E2E-Encryption for Roam Research - Part 2
Roam Research recently added a significant new feature, end-to-end encryption for the content blocks and the uploaded media. In this article, we take a look into how encryption is implemented.
Hi,
And welcome to this series of articles about a feature Roam Research released this weekend: end-to-end encryption. I have already explored why encryption is so important. Now we talk about how it works, how secure it is (as far as I can judge at the moment), how you encrypt your graph, how you verify that the encryption is working, and, last but not least, whether the encryption has an impact on performance.
How does the encryption work?
As far as I know, Roam Research uses Penumbra for its encryption (if you analyze their JS files closely, you will see why I assume this). If they use the default configuration, the encryption algorithm would be AES-256-GCM, an excellent choice because of its security and speed.
Roam Research does not explain their algorithm in detail yet, but you can read a few things about their implementation here.
As far as they say (and this is also the idea transcend.io promoted when open-sourcing their algorithm), the encryption password never travels over the wire, which is good security practice. So the encrypted content is loaded onto the client and decrypted only there. Roam Research does not need to store your encryption key on their server or on the client (and I hope they don't do this either).
As Josh explains in this video, and as you can see yourself in your browser’s developer tools, they don’t encrypt all the block metadata, but the most critical part - the content you put in it - is encrypted. So while the graph in your computer's memory is decrypted once you have started Roam and entered your key, every transaction that is persisted is encrypted.
Whenever you upload a file, the content will be encrypted first using your personal key and then uploaded to Firebase Storage with an additional .enc file extension. As you can see later in the videos, I used two different files to evaluate what exactly is happening: a bigger video file (about 80 MB) and a tiny text file (“Hello World!”).
If you access the file within Roam Research, it will be transparently decrypted for you and can be viewed and even downloaded. You only get the encrypted file if you access the URL outside Roam. It would be interesting to see what is needed to decrypt such a file outside Roam Research (maybe OpenSSL executable + encryption key + salt is already enough?). Hopefully, the developers will give us some more insight here.
So let us now look into how this encryption might mitigate the threats I’ve outlined in the first article.
Intercept the URLs
Because all the URLs will download only encrypted files, it is now a lot less risky if a malicious person gets them, whether by intercepting network traffic, by guessing them, or because you shared them accidentally. Without the key, the attacker has just a bunch of bytes.
To decrypt a file, the attacker would need to guess your key, so the security is only as strong as the password you have chosen. The salt that is most probably used in combination with the password seems to be stored within the metadata of the encrypted files and is different for every file.
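To illustrate why a per-file salt matters, here is a minimal sketch of password-based key derivation. Please note the assumptions: I do not know which key-derivation function Roam Research actually uses, so PBKDF2-HMAC-SHA256, the iteration count, the salt length, and the names `derive_key` and `ITERATIONS` are only stand-ins chosen for this example.

```python
import hashlib
import os

# Assumption: PBKDF2-HMAC-SHA256 with 100,000 iterations stands in for
# whatever key-derivation function Roam Research really uses.
ITERATIONS = 100_000


def derive_key(password: str, salt: bytes) -> bytes:
    """Derive a 32-byte (256-bit) key from a password and a salt."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS, dklen=32
    )


password = "correct horse battery staple"  # example password, do not reuse

# Two files get two independent random salts (16 bytes is an assumption).
salt_a = os.urandom(16)
salt_b = os.urandom(16)

key_a = derive_key(password, salt_a)
key_b = derive_key(password, salt_b)

# Same password, different salts: the derived keys differ.
keys_differ = key_a != key_b
# Same password, same salt: the derivation is repeatable, which is what
# makes decryption possible when the salt is stored next to the file.
repeatable = key_a == derive_key(password, salt_a)
print(len(key_a) * 8, keys_differ, repeatable)
```

The salt does not need to be secret. It only ensures that an attacker cannot reuse precomputed guesses across files, which is why storing it in the file metadata would be fine.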
Be aware that the metadata still contains the original file name, which can be sensitive. So choose your file names wisely.
Access to browser data
If the implementation is done well, using a public computer should no longer be a significant risk. The critical content is only stored encrypted in the local IndexedDB, so even if someone gets their hands on the database files, they can’t look into them. And even if someone records all the URLs you visited, they can get the files you uploaded but can’t decrypt them.
Someone attacking your computer
This is still a significant risk. While recording the URLs you visit or accessing your browser’s data won’t harm you much when the content is encrypted, a keylogger will instantly render all your security measures obsolete.
Someone attacking Roam Research’s infrastructure
The encryption reduces this risk (if implemented well). Because Roam Research does not know your encryption key, gaining access to their (or Google’s) infrastructure won’t reveal your data. Attackers may have the files or your blocks, but they can’t decrypt them.
If they can replace Roam Research’s application code, they might record your encryption password on the client side, but this is a very special scenario.
Someone guessing URLs
The risk of guessing URLs is already low with unencrypted files. The encryption adds another layer and dramatically increases the time needed to get to your content by brute force.
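To get a feeling for what "dramatically increases the time" means, here is a rough back-of-the-envelope sketch. It assumes the password is chosen uniformly at random from a fixed alphabet, and the guessing rate of one billion attempts per second is a purely hypothetical number, not a measurement. The function names are made up for this example.

```python
import math

SECONDS_PER_YEAR = 365 * 24 * 3600


def entropy_bits(alphabet_size: int, length: int) -> float:
    """Entropy of a password drawn uniformly at random from an alphabet."""
    return length * math.log2(alphabet_size)


def years_to_exhaust(bits: float, guesses_per_second: float) -> float:
    """Years needed to try every password of the given entropy."""
    return (2 ** bits) / guesses_per_second / SECONDS_PER_YEAR


# Hypothetical attacker speed; real numbers depend on hardware and on the
# key-derivation function, which is not known here.
RATE = 1e9

weak = entropy_bits(26, 8)     # 8 lowercase letters
strong = entropy_bits(62, 20)  # 20 letters and digits

print(f"weak:   {weak:.1f} bits, {years_to_exhaust(weak, RATE):.2e} years")
print(f"strong: {strong:.1f} bits, {years_to_exhaust(strong, RATE):.2e} years")
```

Under these assumptions, the eight-letter password falls within minutes, while the 20-character one would take far longer than the age of the universe. That is the whole difference between a guessable and a strong password.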
How to encrypt your graph
This is pretty easy. Create a new graph and tick the “Encrypted” checkbox. Choose a secure password, or even better, let your password manager choose one.
You are ready to go!
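If you have no password manager at hand, a random password can also be generated with a few lines of code. This is only a sketch: the alphabet, the default length of 24 characters, and the name `generate_password` are my own choices, not anything Roam Research prescribes.

```python
import secrets
import string

# Letters and digits only, to avoid characters that are awkward to type.
ALPHABET = string.ascii_letters + string.digits


def generate_password(length: int = 24) -> str:
    """Return a random password using a cryptographically secure generator."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


password = generate_password()
print(password)
```

With 62 possible characters and 24 positions, this gives roughly 143 bits of entropy. Store the result somewhere safe: as far as I can tell, nobody can recover an encrypted graph for you if the password is lost.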
Please be aware that there are a few restrictions when using encrypted graphs.
If you already have a graph, you must export it as EDN and re-import it into a newly created encrypted graph. Be aware that your already uploaded attachments won’t be encrypted that way. Either wait until someone provides a solution for this (e.g., a script), or re-upload your sensitive content and delete the old uploads.
How to make sure your encryption works
The most straightforward way to ensure that the encryption is working is by uploading a file (pdf, movie, text), copying the URL from the newly created block, and opening this in a new browser window or tab.
If your encryption works, the file will still load, but nothing useful will be shown.
To ensure your database is encrypted, you can use the method explained above, using your browser’s developer tools or searching for strings in your database.
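The string search can also be automated. The following sketch checks whether a known marker, such as the “Hello World!” from my test file, still appears in a blob of bytes. The function name `contains_plaintext` is hypothetical, and the random bytes below are only a stand-in for a real downloaded .enc file.

```python
import os


def contains_plaintext(blob: bytes, marker: str) -> bool:
    """Return True if the marker occurs in the blob as UTF-8 or UTF-16-LE."""
    candidates = (marker.encode("utf-8"), marker.encode("utf-16-le"))
    return any(candidate in blob for candidate in candidates)


marker = "Hello World!"

# An unencrypted upload would contain the marker verbatim.
unencrypted = b"some header " + marker.encode("utf-8") + b" some trailer"

# Stand-in for ciphertext: random bytes. With a real file you would read
# the downloaded bytes instead, e.g. open(path, "rb").read().
looks_encrypted = os.urandom(64)

print(contains_plaintext(unencrypted, marker))
print(contains_plaintext(looks_encrypted, marker))
```

A negative result is a good sign, but not a proof. It only shows that this particular marker is not readable in this particular file or database dump.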
Conclusion
Encryption helps increase the security of sensitive data a lot if implemented well.
Please be aware that my observations are by no means a security audit (maybe Roam Research will commission one?), so don’t rely on them too much. Much more effort has to be invested to make sure that the overall architecture is really safe. If security is a massive issue for you, maybe your data does not belong in any cloud service at all.
In Part 3, we will analyze how encryption affects performance. And even my famous 10,000-pages graph will be used again to better illustrate this 😂
If you have any questions or suggestions, please leave a comment.
If you want to support my work, you can do this by becoming a paid member:
Or you can buy me a coffee ☕️. Thank you so much for your attention and participation.
Thanks!