Data Handling Best Practices
While a lot of our work focuses on bringing human-centered approaches to privacy and security projects, we also try to incorporate privacy and security best practices in our human-centered research on a daily basis. In previous posts, we have outlined how to supplement your research with a participant bill of rights or a model release form. We also shot a video about doing research with at-risk participants. This post outlines some of our best practices in data handling for user research.
Handling data with privacy and security in mind means taking the time to identify the threats you face, capturing them in a threat model, and choosing strategies that fit that model. We’ve compiled some best practices for both risk assessment and security strategies.
Any risk assessment begins with considering what data you will be collecting for the research. Some guiding questions are:
- What is the most sensitive data you are collecting?
- Is there a copy of this data anywhere else?
- Who else knows where it is?
- Is the data easy to relate to an individual?
- Where do you store data, and who else has access to it?
- Who knows how much of the research, and when? (Think about team, organization, client, public.) Are there policies about what you can disclose, and to whom?
Next, evaluate the risks your data collection would pose to the participants. Risks can mean (but aren’t limited to):
- Risk of personal data leaking
- Risk of research content leaking
- Risk of association with this tool/research
- Risk of association with stakeholders involved
- Risk of participants knowing about each other
Write down a list of risks based on stakeholders in the research you are conducting, and consider both the severity (how bad the situation would be) and the likelihood of that situation occurring. (We are attaching an example at the end of this post.)
risk = severity × likelihood
(To read more about this specifically, check out our usable security methodology.)
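As a rough illustration of the formula above, here is a minimal Python sketch that scores and ranks a list of risks. The 1 to 3 scale, the `Risk` class, the `rank` helper, and the example ratings are all assumptions made for this sketch; use whatever scale and scenarios fit your own research.

```python
from dataclasses import dataclass

# Assumed 1-3 ordinal scale; the numeric values are illustrative only.
LEVELS = {"low": 1, "medium": 2, "high": 3}


@dataclass
class Risk:
    scenario: str
    severity: str    # how bad the situation would be
    likelihood: str  # how likely the situation is to occur

    @property
    def score(self) -> int:
        # risk = severity x likelihood
        return LEVELS[self.severity] * LEVELS[self.likelihood]


def rank(risks):
    """Return the risks sorted from highest to lowest score."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


# Hypothetical ratings, not taken from any real study.
risks = [
    Risk("Research content leaking", severity="medium", likelihood="low"),
    Risk("Personal data leaking", severity="high", likelihood="medium"),
    Risk("Participants knowing about each other", severity="medium", likelihood="medium"),
]

for r in rank(risks):
    print(f"{r.score}  {r.scenario}")
```

The scores are only a way to order the conversation. Two risks with the same score can still call for very different mitigations, so treat the ranking as a starting point for discussion with your team.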
Security strategies for mitigating risks
Depending on the output of your risk assessment, you will want to alter the process and tooling of your research. Here are, roughly, the strategies you might choose to mitigate the risks.
- Treat different aspects of your research differently – personal notes from your interview should be treated differently from a contact list. Knowing how sensitive different data is can help you differentiate your strategies.
- Minimise data
  - Reflect on what data you actually need to complete the research, and only collect what you need. For personal information, ask yourself: does it matter how old participants are, what gender they identify with, or where they live? Do you need their email address or phone number? If you are offering compensation, can you do that without their bank account information?
  - What tools are you using to coordinate the work? What information do your tools collect? Assess what data the tools themselves collect and how that intersects with the risks you’ve identified. Choose privacy-friendly tools for online surveys, scheduling, and click-testing (this means no tracking by default).
- Go local
  - Does your data need to live on a cloud? Do you need remote access to the data? Or can access to the data be limited to a physical location? If you and your team are working from the same location, there’s no need to use cloud services to facilitate access to the data. Even if you have remote team members, they might not need direct access to the data.
- Go analogue
  - Does your data need to be on a digital medium? Consider using pen and paper to document your research. In many cases, we found that it doesn’t make sense to have video or audio recordings.
- Encrypt hard drives as well as external drives for sharing, and only give out passwords to team members who need access to the data.
- Encrypt communication with participants and team members alike.
- Use pseudonyms or handles for participants from the beginning. (We like using memorable animal names.)
- Anonymize how you present the information. For example, you should redact or edit quotes that could potentially identify individuals. If you take pictures, focus on people’s hands and feet rather than their faces.
- Set up infrastructure that is easy to remove later. For example, a separate calendar and email account for each research project means it’s easy to remove the data afterwards.
- Physical security matters just as much: who has access to your office, and how easy is it to get information from your office? (Also consider “sticky note walls” as potentially leaking information to visitors.)
- Back up a copy of your most important data at a trusted location in case your office gets searched by police or other adversaries.
- Be transparent about your security decisions with all participants. Ask for their consent before you begin the research.
- Clarify with your client, team members, and participants what information is OK to share, with whom, and when. Where necessary, supplement this agreement with an NDA.
- Set a data retention time period. We recommend deleting all research data 6 months after the research is done (and published).
- Schedule a regular “data review” to check whether there is any unnecessary information on your devices or in your cloud storage.
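A retention period is easier to honour when the deletion date is computed rather than remembered. Here is a minimal sketch, assuming you record the date each project was finished. The helper names `add_months`, `deletion_due`, and `is_overdue` are hypothetical, and the six-month default simply mirrors the recommendation in the list above.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`.

    If the target month is shorter, the day is clamped to its last day
    (for example 31 August + 6 months falls on the last day of February).
    """
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def deletion_due(research_done: date, retention_months: int = 6) -> date:
    """Date by which the research data should be deleted."""
    return add_months(research_done, retention_months)


def is_overdue(research_done: date, today: date, retention_months: int = 6) -> bool:
    """True once the retention period has run out."""
    return today >= deletion_due(research_done, retention_months)


# Hypothetical project end date, for illustration only.
finished = date(2024, 1, 15)
print("Delete research data by:", deletion_due(finished))
```

A check like this could run as part of the regular data review, so that overdue projects surface without anyone having to keep the dates in their head.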
To see how these principles interact, let’s look at a case from some of our recent research on the grantee experience with a funder. First and foremost, we wanted to protect the confidentiality of our participants: organizations and businesses that receive funding from the funder and were willing to speak to us about their funding relationship. Participants were putting a lot of trust in us, and we wanted to be extra sure that our research was not putting their funding relationship at risk. Based on what we knew, we drafted a simple matrix:
| Adversarial scenario | Level of risk | Reasoning |
|---|---|---|
| Funder finding out participants’ identities | HIGH | Given that we are incorporating the funder at every stage of the research, it’s conceivable that through accidentally sharing email invites or not properly redacting interview transcripts we could reveal who said what, and that may have negative repercussions for participants. |
| Participants finding out about each other’s participation | MEDIUM | Even if Simply Secure anonymized the information well, it is possible that participants who know each other would talk about this research. This information could A) get back to the funder, or B) influence participation in the research, e.g. who participates in the research, what they would tell us, and whether they might try to intentionally influence the research in some way. |
| Public finding out content of the interviews | LOW | Even though the content will go through synthesis and eventually be published, we wouldn’t want sensitive issues to be disclosed to the public without the funder’s consent. However, we don’t expect people interested in this data to engage in criminal activities. |
| Governments finding out participants’ funding sources | LOW | Most participants openly list their funding sources anyway. |
We assigned pseudonyms to everyone who answered our call for participation. This list of assignments was the most sensitive piece of data, and arguably more important to keep safe than the content of the interviews. We kept it on an encrypted external drive that only two team members had access to. The content of the interviews was in turn documented on paper, and we did the synthesis on paper in our office. We then destroyed these paper records.
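For illustration, pseudonym assignment along these lines can be sketched in a few lines of Python. This is not the tooling we used in the study: the animal list, the `assign_pseudonyms` helper, and the placeholder identifiers are assumptions made for this example.

```python
import secrets

# Hypothetical pool of memorable animal names; extend it to fit your study.
ANIMALS = ["Otter", "Heron", "Lynx", "Badger", "Finch", "Marmot", "Gecko", "Puffin"]


def assign_pseudonyms(participants, pool=ANIMALS):
    """Map each participant identifier to a unique, randomly chosen pseudonym."""
    if len(set(participants)) != len(participants):
        raise ValueError("participant identifiers must be unique")
    if len(participants) > len(pool):
        raise ValueError("not enough pseudonyms for all participants")
    names = list(pool)
    # Shuffle with OS-level randomness so the order cannot be replayed from a seed.
    secrets.SystemRandom().shuffle(names)
    return dict(zip(participants, names))


# Placeholder identifiers, not real participants.
mapping = assign_pseudonyms(["participant-a@example.org", "participant-b@example.org"])
```

The resulting `mapping` is exactly the sensitive list described above, so it should only ever be written to the encrypted drive. Notes, calendars, and transcripts would then refer to the pseudonyms alone.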
Based on the risk assessment, we decided that the channel of communication did not have to be the most secure one available (though we would always prefer end-to-end encrypted video chats), but that the calendaring tool, which details who we are talking to, was more important to keep safe. We therefore used pseudonyms on our calendar invitations and sent out calendar files that did not link email addresses.
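One way to produce such a calendar file is to write the iCalendar text yourself and leave out the fields that carry email addresses. The sketch below is an assumption about how this could be done, not a description of the tool we used. The `build_invite` helper and the `PRODID` value are hypothetical, and the pseudonym is assumed to be a simple word without commas or semicolons, which would need escaping.

```python
import uuid
from datetime import datetime, timedelta, timezone


def build_invite(pseudonym: str, start: datetime, minutes: int = 60) -> str:
    """Return a minimal iCalendar event that names only the pseudonym.

    `start` should be timezone-aware; it is converted to UTC for the file.
    """
    fmt = "%Y%m%dT%H%M%SZ"
    start_utc = start.astimezone(timezone.utc)
    end_utc = start_utc + timedelta(minutes=minutes)
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//research-invite//EN",  # hypothetical product id
        "BEGIN:VEVENT",
        f"UID:{uuid.uuid4()}",  # random UID, not derived from any address
        f"DTSTAMP:{datetime.now(timezone.utc).strftime(fmt)}",
        f"DTSTART:{start_utc.strftime(fmt)}",
        f"DTEND:{end_utc.strftime(fmt)}",
        f"SUMMARY:Interview with {pseudonym}",
        # Deliberately no ORGANIZER or ATTENDEE lines: those carry email addresses.
        "END:VEVENT",
        "END:VCALENDAR",
    ]
    # iCalendar content lines are separated by CRLF.
    return "\r\n".join(lines) + "\r\n"


ics = build_invite("Otter", datetime(2024, 3, 1, 14, 0, tzinfo=timezone.utc), minutes=45)
```

Saving `ics` to a `.ics` file gives each side an event they can import by hand. Calendar applications often add attendee details when you use their built-in invitation feature, so it is worth opening the exported file in a text editor to check what it contains before sending it.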
Data handling statement
Once you have decided on a data handling strategy, it’s time to write up a statement that you should attach to any invitation for participation. We have drafted a data handling template for you to use.
This is a rough outline of our data handling practices. Of course, trust is something that has to be earned, and merely safeguarding your research with consent forms and data handling statements isn’t enough. You earn trust by following through with these policies, and by listening to your participants and being receptive to their privacy and security needs during the entire research period.