E2E-Encryption for Roam Research - Part 1
Roam Research recently added a big new feature: end-to-end encryption for content blocks and uploaded media. In this article, we look at why encryption is needed at all.
Hej,
And welcome to this series of articles about a feature Roam Research released this weekend: end-to-end encryption. I will explore why encryption is so important, how it works, how secure it is (as far as I can judge at the moment), how you encrypt your graph, how you verify that the encryption is working, and last but not least, whether the encryption has an impact on performance.
Why is encryption so important?
Let’s start with a short explanation of how Roam Research stores your data. All the blocks you write or import are stored in an indexed database in your browser, and if you are using a remote graph, this will be periodically synced to Roam Research’s servers.
When you upload media such as PDFs, images, videos, or other files, they are uploaded to Google’s Firebase Storage servers under a randomized, ten-character filename. The extension stays the same. The file has no other protection. If an attacker knows or guesses the URL, they can access the uploaded file. Furthermore, with the URL they can also get the metadata containing the original file name (which could help to quickly find important stuff in a big bunch of files, e.g., “invoice.pdf,” “proposal.doc”).
If you want to check this, upload a file in your Graph and check the URL:
The URL is composed of the following components (%2F equals /):
https://firebasestorage.googleapis.com/v0/b/firescript-577a2.appspot.com/o/imgs/app/
This is the same for all Roam Research users.
The next part is the name of the graph:
decrypted/
Followed by the randomized filename and its standard file extension:
vd7kqSrFQN.mp4
Then two parameters follow:
alt=media
indicates that we want the actual media file and not the metadata. Finally, the token:
token=8d6b309b-3fc1-45da-b6c8-07d0fb43fe27
The token is generated on upload and is unique to this file. Usually, it would be used to restrict access to authorized users, but Roam Research seems to have disabled these rules to allow access without the token. This reduces security, because with the token enforced an attacker would additionally have to guess the 36-character, 128-bit UUID, which would take an enormous number of tries. I am not sure why Roam Research has removed this security layer, but it has no significant impact because of the randomized file names, as we will see later.
The only thing you need to access the file is the name of the graph and the randomized filename.
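To make the breakdown above concrete, here is a minimal Python sketch that splits such a URL into its components. The full URL is my own assembly of the pieces listed above (joined with %2F, ? and &), and the function name split_storage_url is hypothetical, so treat this as an illustration rather than an exact description of what Roam Research generates.

```python
from urllib.parse import urlsplit, unquote, parse_qs


def split_storage_url(url):
    """Split a Firebase Storage download URL into the parts described above."""
    parts = urlsplit(url)
    # Everything after "/o/" is the object path; %2F decodes to "/".
    prefix, _, encoded_object = parts.path.partition("/o/")
    object_path = unquote(encoded_object)
    # The last two path segments are the graph name and the random filename.
    graph, filename = object_path.split("/")[-2:]
    name, _, extension = filename.rpartition(".")
    query = parse_qs(parts.query)
    return {
        "bucket": prefix.split("/")[-1],
        "graph": graph,
        "random_name": name,
        "extension": extension,
        "alt": query.get("alt", [None])[0],
        "token": query.get("token", [None])[0],
    }


# Assumption: the example pieces from this article, joined into one URL.
example = (
    "https://firebasestorage.googleapis.com/v0/b/firescript-577a2.appspot.com"
    "/o/imgs%2Fapp%2Fdecrypted%2Fvd7kqSrFQN.mp4"
    "?alt=media&token=8d6b309b-3fc1-45da-b6c8-07d0fb43fe27"
)
info = split_storage_url(example)
print(info["graph"], info["random_name"], info["extension"])
```

Only the graph and random_name values differ from user to user and file to file; everything else follows from the fixed pattern.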
There are five essential threat vectors on your data:
Someone can read the network traffic (public or semi-public WLAN/LAN, e.g., in a hotel, cafe, conference center). It’s enough to know what URLs you are connecting to. There is no need to break the HTTPS encryption between your computer and the servers. Update: I have been corrected here. The part of the URL after the domain name is encrypted. Therefore, the HTTPS encryption would have to be broken, for example by an SSL proxy (not unusual in companies), before the full URLs could be viewed.
Someone gets access to your browser data (e.g., on a public computer in a library or an internet café, or administrators in your company)
Someone attacks your computer and gets your browser data
Someone attacks Roam Research’s server infrastructure and gets your graph data
Someone just guesses URLs
Let’s take a quick dive into each of these threats.
Intercept the URLs
Update: I have been corrected. With HTTPS, the part of the URL after the domain name is encrypted. An attacker would therefore have to break the HTTPS encryption, for example with an SSL proxy, to see the full URLs.
Whenever you load a Roam Page with an embedded file, your browser will request its URL. This request will be transported to the router/switch you are connected to. The router/switch has to read it to get you the necessary data. While these data will probably be encrypted between your browser and the server, the URL itself is not.
So if a malicious person collected Firebase Storage URLs for some time in a public or semi-public network, they would end up with a pretty interesting list of files from Roam users that can be downloaded automatically. A simple script could then fetch the metadata of these files and look for interesting names (e.g., “proposal,” “invoice,” “medical record,” “nude”).
You could reduce this risk by using a VPN provider in public/semi-public networks (but you must trust the VPN provider).
Access to browser data
When you use your browser, a lot of data is stored on the computer the browser runs on. Roam Research uses an IndexedDB database whose content can be found in the application data directories (e.g., /Users/alex/Library/Application\ Support/Google/Chrome/Default/IndexedDB/).
Once you find the directory, getting Firebase Storage URLs from such a database is easy and doesn’t require any special tools or knowledge.
Again, the files behind these URLs are now accessible without any other protection. You can also read all the blocks in the graph because their content is stored in cleartext.
Therefore, it is always good to use incognito mode on public computers or clear the browser cache/data when leaving.
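If you want to check what your own browser profile exposes, a small script can do it. The following is a minimal Python sketch that searches the raw bytes of every file in a directory for Firebase Storage URLs. The function names and the regular expression are my own assumptions, and because the browser may store parts of the database in compressed form, such a raw scan may miss some entries.

```python
import re
from pathlib import Path

# Matches Firebase Storage URLs in raw bytes. The match stops at whitespace,
# quotes, control bytes, or any non-ASCII byte.
URL_PATTERN = re.compile(
    rb"https://firebasestorage\.googleapis\.com/[\x21\x23-\x26\x28-\x7e]+"
)


def extract_storage_urls(data):
    """Return all Firebase Storage URLs found in a bytes object."""
    return [match.decode("ascii") for match in URL_PATTERN.findall(data)]


def find_storage_urls(directory):
    """Scan every readable file below `directory` and collect unique URLs."""
    found = set()
    for path in Path(directory).rglob("*"):
        if not path.is_file():
            continue
        try:
            data = path.read_bytes()
        except OSError:
            continue  # skip locked or unreadable files
        found.update(extract_storage_urls(data))
    return sorted(found)


# Synthetic sample standing in for the raw content of a database file.
sample = (
    b"\x00\x01junk https://firebasestorage.googleapis.com/v0/b/x/o/"
    b"imgs%2Fapp%2Fdemo%2Fabc.png?alt=media\x00more"
)
print(extract_storage_urls(sample))
```

Calling find_storage_urls with the path of your own IndexedDB directory lists the URLs that anyone with access to that directory could collect in the same way.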
Someone attacking your computer
If someone is attacking your computer, you are probably in big trouble, and whether Roam Research took additional security measures will not make a difference. A trojan, for example, could record the URLs you visit (first threat), inspect your browser’s data (second threat), or log the passwords you enter (even the one you chose for encryption).
There are a few measures you can take to prevent this scenario: regularly update your system, don’t trust every piece of software, use restricted accounts, and maybe use some protection software.
Someone attacking Roam Research’s infrastructure
This again is a worst-case scenario, and the outcome depends on how solid the security measures of Roam Research are. If you look at some of the significant breaches of recent years, many huge companies were in trouble: Adobe, Dropbox, Facebook, MySpace, Wattpad, to name a few.
If an attacker gains access to the databases of the Roam Research users, we have something like the second threat. Without encryption, they can read your notes, search for URLs, and download the files. They most probably get your account information, maybe your passwords (hopefully hashed and salted), and they might be able to record your password input in cleartext (depending on the chosen architecture).
The best way to prevent this scenario would be to use local storage only.
Someone guessing URLs
As I explained above, the URLs for the uploaded files share a common pattern. There are only two parts an attacker does not know: the name of the graph and the random file name. The graph name can be guessed (most probably, many graphs will have a common name), or you may have accidentally exposed it in a screenshot, a video, or a shared URL.
An attacker would then have to brute-force the possible file names. There are a lot of combinations, even with just ten characters. We have 26 lowercase letters, 26 uppercase letters, and ten digits. That yields a whopping 839,299,365,868,340,224 (62^10) possible combinations. This has to be multiplied by the number of file extensions you are looking for (somewhere between 10 and 50). You will get quick feedback from Google’s servers if a file does not exist (404).
But even if you were able to fire 10,000 requests a second and neither Google nor Roam Research cut your connection because of this unusual traffic pattern, you would need about 2.7 million years per file extension, so more than 25 million years with just ten extensions, to iterate through all names for each graph. So this attack pattern is not very practical.
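The estimate above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: the constant and function names are mine, and the request rate is the assumed 10,000 requests per second from the text.

```python
ALPHABET_SIZE = 26 + 26 + 10        # lowercase + uppercase + digits
NAME_LENGTH = 10                    # characters in the random file name
REQUESTS_PER_SECOND = 10_000        # assumed attacker throughput
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def years_to_exhaust(extensions=1):
    """Years needed to try every file name for the given number of extensions."""
    candidates = ALPHABET_SIZE ** NAME_LENGTH * extensions
    return candidates / REQUESTS_PER_SECOND / SECONDS_PER_YEAR


print(f"{ALPHABET_SIZE ** NAME_LENGTH:,} possible names")
print(f"{years_to_exhaust(1):,.0f} years for one extension")
print(f"{years_to_exhaust(10):,.0f} years for ten extensions")
```

On average, an attacker would hit a specific file after about half of that time, which does not change the conclusion.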
Let me know in the comments if I have missed a threat.
Encryption to the rescue
Now that we know the different threats, we can examine whether encryption will help.
In Part 2, I will explain how encryption in Roam Research is implemented (or at least how I think it is done), how you can encrypt your graph, and how you can verify that it is working. After that, in Part 3, we will take an in-depth look at the performance impact this might have.
If you have any questions or suggestions, please leave a comment.
Thank you so much for your attention and participation.