Anonymous vs. Confidential

When conducting human subjects research, the level of risk to which subjects are exposed is an important factor that the IRB must consider in the course of its review. It is the investigator’s responsibility to minimize the potential for harm to research subjects.

The distinction between anonymous and confidential data relates to this level of risk to subjects, and the investigator must clearly define the activity as anonymous or confidential in their IRB application.

No single activity or type of data can be both anonymous and confidential. A single research study may, however, use several activities or methods of data collection, some of which yield anonymous data and some of which yield confidential data.

How to Submit an Application for an Anonymous Survey

If the survey is anonymous, it is eligible for expedited review and may be submitted via the abbreviated anonymous survey application in Mentor. Expedited reviews for anonymous surveys are conducted on a rolling basis; you do not need to wait for a regular IRB meeting date. Anonymous survey submissions must include an application and a copy of the survey instrument. All forms and instructions for these applications are found on the Mentor IRB website.

As part of your anonymous survey application, you will be asked how your instrument will be administered. If the survey will be conducted electronically, Clark’s web-based survey software, Qualtrics, must be used. (Using non-Qualtrics software for anonymous surveys is acceptable in general, but those surveys are not eligible for the abbreviated anonymous survey application.) If your methodology requires functionality that is not available in Qualtrics, ITS staff may assist in evaluating the availability of the feature and alternative survey solutions. Collecting data over the web poses unique challenges to ensuring respondent anonymity; ITS staff will help ensure Qualtrics is configured in accordance with IRB standard protocol. To determine whether Qualtrics will support your research methodology, visit the Survey Support section of the ITS website.

Anonymous data

Anonymous refers to data that can in no way be linked to information that could potentially be used to identify or trace a specific subject. When an investigator promises anonymity, even the investigator cannot link the research data collected with the individual from whom it was collected. Even a reasonable probability that someone could guess the identity of a research subject means that the data are not anonymous.

Example: Surveys are often conducted anonymously, particularly when completed online. However, conducting a survey online does not by itself guarantee anonymity. Computers have IP (internet protocol) addresses by which a user may be identified. The investigator, therefore, must either use survey software that does not track IP addresses or turn this tracking off in the survey settings.

It is commonly believed that data is anonymous if the investigator has not collected direct identifiers, such as name, Social Security number or student ID number. It should be understood, however, that indirect and demographic variables, such as age, race or sex, could, in some circumstances, be used to identify subjects, particularly when a number of them are being collected together.

Therefore, if the investigator finds it necessary for their research to collect specific identifiers, they should collect only the identifiers necessary for the research objectives. In addition, the anonymity of the data may be invalidated by a small sample size, a sample that is not diverse, or both. If precautions are not taken, it may be difficult to conceal the identity of the subjects and relatively easy to link the subjects to their data.

Example: An investigator collects the religious affiliation, sex, grade level and academic major of a sample of 400 students. The sample includes one Christian female freshman philosophy major.
Example: An investigator collects the marital status of 50 recent college graduates. The sample includes one widow/widower.
Example: An investigator collects the income level (in ranges) of 100 recent high school graduates. The sample includes one graduate earning between $100,000 and $150,000.

In all of these cases, the study is not anonymous because the data collected can be linked to specific individuals.
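One way to screen for the situations in the examples above is to count how many respondents share each combination of indirect identifiers before describing a data set as anonymous. The following Python sketch is illustrative only and is not an IRB-prescribed procedure; the function name `find_small_groups`, the field name `marital_status`, and the threshold of 5 are assumptions chosen for the example.

```python
from collections import Counter

def find_small_groups(records, fields, min_group_size=5):
    """Return each combination of `fields` values that is shared by fewer
    than `min_group_size` records, mapped to the number of records."""
    counts = Counter(tuple(r[f] for f in fields) for r in records)
    return {combo: n for combo, n in counts.items() if n < min_group_size}

# Hypothetical sample modeled on the marital-status example:
# 49 married graduates and one widowed graduate.
sample = [{"marital_status": "married"} for _ in range(49)]
sample.append({"marital_status": "widowed"})

risky = find_small_groups(sample, ["marital_status"])
print(risky)  # {('widowed',): 1}
```

Any combination reported by such a check marks respondents who could plausibly be singled out. An empty result does not by itself establish anonymity, since other information about the subjects may exist outside the data set.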

Anonymous data collection involves the lowest level of risk or potential for harm to the subjects because it is not possible to determine who provided the data.

Confidential data

Confidential refers to private information that a subject discloses with the expectation that it will not be divulged to others without that subject’s permission. When an investigator promises confidentiality, the subject is asked to supply information that could potentially identify that subject, and that information is linked to the research data collected from the subject. The subject supplies it with the understanding that the investigator will not disclose the information to anyone other than those with whom the subject has given explicit consent to share it (i.e., the research team).

Example: Surveys and interviews conducted in-person are considered confidential rather than anonymous, as the investigator can identify the subjects, even if the investigator has collected no other identifying information.

It is important to note the use of the terms divulge and disclose, as they point to an important aspect of confidentiality that the investigator must always keep in mind: privacy cannot be guaranteed. While the investigator may promise not to share the subjects’ private information, it may still be discoverable by outside parties. When dealing with confidential information, then, the investigator must ensure the information is collected and stored in such a way as to minimize the risk of discovery by outside parties.

Example: One-to-one interviews that are conducted in a public place may be overheard by others.
Example: The investigator stores identifiable data on their computer unencrypted. The computer is left unattended and found by an outside party who traces the data to the subjects.

Confidential data collection involves a higher level of risk or potential for harm to the subjects than does anonymous data collection. It should be noted that there are multiple levels of possible risk in confidential data collection and storage.

Example: The fewer the number of individuals who have access to the data, the lower the level of risk. Focus groups involve a higher level of risk than do one-to-one interviews, as the subjects must rely on the ability of all other subjects as well as the investigator to maintain the confidentiality of the information shared.
Example: The more securely the data is stored, the lower the level of risk. Encrypted computers are more secure than locked file cabinets, and encrypted servers are more secure than personal computers.

Protecting Confidentiality

Prior to a subject’s participation in research, they must be told whether their involvement and the data collected will be anonymous or confidential. If the data are to remain confidential, it is also important for the investigator to discuss with the subject during the process of informed consent the level of confidentiality that can be offered and the potential for breach of confidentiality. This should, as well, be noted on the informed consent form.

Note that there are times when breaking confidentiality may be required. Investigators who are mandated reporters, for example, must disclose to subjects that they are legally obligated to report suspected child or elder abuse, as well as situations in which the participant or others are at immediate risk of harm. Possibilities of this type must be noted in the consent form, as applicable to each project.

An important consideration in the use of confidential data is the investigator’s responsibility to keep the data as safe and secure as possible. The investigator can do this in a variety of ways (note the following list is in no way exhaustive):

Limit access to the data to as few individuals as possible.
Code the data whenever feasible.
Store hard copies of the data in locked cabinets in locked rooms.
Store the data, master code list and informed consent forms in separate locations.
Transfer (from person to person, place to place) the data (field notes, recorded interviews, informed consent forms) promptly and securely.
Transcribe recorded data as soon as possible and destroy original recordings.
Store data on an encrypted computer or server.
Upload data to an encrypted server promptly (do not wait until all data is collected).
Delete identifiers (or de-identify) as soon as is feasible.

It is imperative that investigators keep in mind at all times the potential harm (social, legal, economic, physical) to subjects that may result from a breach in confidentiality. Plans for data security must be outlined in the research protocol when discussing the provisions for managing risk, and approved by the IRB.

Coded Data

A common practice for reducing the risk of a breach of confidentiality is for the investigator to code the information and data collected from the subject. When data is coded, a subject’s identifying information is separated from the subject’s research data and replaced with a code. The investigator then keeps a “master list” of the subjects’ names and identifying codes. For security purposes, the master list is kept separately from the subjects’ data.

Example: Studies that involve interviews often use pseudonyms to mask subjects’ identities. In such cases, the investigator assigns each subject a code name, which is used in all interview notes in lieu of the subject’s name. A master list linking the subjects’ names to their pseudonyms is developed to keep track of the data and is secured in a locked file cabinet. The interview notes are secured separately on an encrypted server.
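The coding practice described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a required procedure: the function name `code_subjects`, the record keys `name` and `notes`, and the `S-` code format are all hypothetical, and the sketch assumes each subject's name is unique within the study.

```python
import secrets

def code_subjects(interviews):
    """Split identifiable records into coded data and a master list.

    `interviews` is a list of dicts with the hypothetical keys "name"
    and "notes". Returns (coded_data, master_list). The two results are
    meant to be stored in separate, secured locations.
    """
    master_list = {}    # code -> subject name
    name_to_code = {}   # subject name -> code, so repeat interviews reuse a code
    coded_data = []
    for row in interviews:
        name = row["name"]
        if name not in name_to_code:
            code = "S-" + secrets.token_hex(4)
            while code in master_list:  # regenerate on the unlikely collision
                code = "S-" + secrets.token_hex(4)
            name_to_code[name] = code
            master_list[code] = name
        # Only the code, never the name, travels with the research data.
        coded_data.append({"code": name_to_code[name], "notes": row["notes"]})
    return coded_data, master_list

# Hypothetical usage with made-up subjects.
coded, master = code_subjects([
    {"name": "Subject One", "notes": "first interview"},
    {"name": "Subject Two", "notes": "first interview"},
    {"name": "Subject One", "notes": "follow-up interview"},
])
```

Random codes are used here rather than initials or sequential numbers so that a code reveals nothing about the subject or the order of enrollment. The data remain confidential, not anonymous, for as long as the master list exists.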

De-identification

Another common practice, and one that often leads to confusion between anonymous and confidential, is for the investigator (or the individual/organization from which the data originates) to de-identify the data collected from the subject. De-identification is the process by which all links between the subjects’ personally identifying information and their research data are severed, so that the investigator has no code by which to re-identify them. Data that will be de-identified is still considered confidential data, because the researcher is able to identify the subjects prior to de-identification.

Example: Often when an investigator conducts secondary data analysis, all identifying information has been scrubbed from the data before the investigator receives it. (Note that in this case, the individual/organization from which the data originated may retain the identifying information with the data.)
Example: An investigator removes all identifying information from their research data and maintains no code with which to re-identify the data.
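As a rough illustration of the second example, the Python sketch below strips the linking fields from each record and shuffles the order so that row position cannot serve as a link. The function name `deidentify` and the field names `code` and `notes` are assumptions for the example. The sketch covers only the data file itself: any master list must be destroyed separately, and indirect identifiers left in the records can still allow re-identification.

```python
import random

def deidentify(records, drop_fields=("code",)):
    """Return copies of `records` without the fields in `drop_fields`,
    in shuffled order. The input list is left unchanged."""
    stripped = [
        {key: value for key, value in row.items() if key not in drop_fields}
        for row in records
    ]
    random.shuffle(stripped)  # remove any link carried by row order
    return stripped

# Hypothetical coded records.
coded_records = [
    {"code": "S-1", "notes": "interview A"},
    {"code": "S-2", "notes": "interview B"},
]
released = deidentify(coded_records)
```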

It is important to remember that the more indirect identifiers investigators collect, the higher the risk of re-identification. In addition, the investigator must keep in mind that other information about the subjects may be available that is unconnected to the data collected for the specific research in question, and that, taken together, this information may be used to re-identify the subjects.

We acknowledge the Brandeis University IRB as the source for some of the information and examples provided above.
