This post links to the article I co-wrote with Databricks.
A while back I was researching another avenue of attack on Databricks. In the process I ended up with a “side quest” discovery: a strange set of default behaviors that most organizations would end up with if they followed the default provisioning wizard (and who would ever use the defaults?). The finding was not specific to Azure as a cloud provider. I initially researched it on Azure because that is the platform I use most often for security research. In other words, the finding was related to the product, not to the cloud provider.
After disclosing these findings, I was introduced to the Databricks security team, who made a very strong impression on me. To this day I have not met a more proactive or knowledgeable product security team. We agreed on a disclosure timeline of roughly 90 days to give adequate time for mitigations and changes to the product.
The attack enabled non-privileged users to gain the full access of a privileged user to the Databricks workspace by intercepting the control plane traffic of privileged users. Depending on how the cluster was used, the compromised access could include various privileged permissions and items bound to that particular Databricks instance.
Before Databricks deployed mitigations, you could simply use the following tcpdump and grep pattern to capture the tokens of more privileged users running on the same default cluster:
%sh sudo tcpdump -i lo tcp port 6061 -Aq | grep "token"
⚠️ This pattern still works on any previously provisioned default (standard) cluster. Luckily, mitigations are available; they are described below under “Protection flags”.
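To make the filtering half of that command concrete, here is a minimal Python sketch of what `grep "token"` does to the ASCII output of tcpdump: it keeps only the lines that contain the string. It can be handy for checking a capture taken on a cluster you own. The function name `find_token_lines` and the sample lines are made up for illustration; the real payload format on port 6061 is not shown in this post, so treat the sample data as an assumption.

```python
from typing import Iterable, List


def find_token_lines(capture_lines: Iterable[str], needle: str = "token") -> List[str]:
    """Return the lines containing `needle` (case-sensitive substring match,
    the same behavior as a plain `grep "token"`)."""
    return [line.rstrip("\n") for line in capture_lines if needle in line]


# Made-up lines standing in for `tcpdump -A` output. This is NOT real traffic
# and the JSON-like payload shape is an assumption for illustration only.
sample_capture = [
    "12:00:01.000000 IP localhost.40000 > localhost.6061: tcp 120",
    '{"command": "run", "token": "dapiEXAMPLE-NOT-A-REAL-TOKEN"}',
    "12:00:02.000000 IP localhost.6061 > localhost.40000: tcp 0",
]

hits = find_token_lines(sample_capture)
print(hits)
```

If the list comes back empty for a capture taken while an admin was working on the cluster, that suggests the token is no longer visible in clear text on the loopback interface, although it does not prove it.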
Based on these findings, Databricks made changes in three categories:
|New provisioning workflow|The new provisioning workflow defaults to a single-user cluster, which prevents the behavior described in ‘Attack’.|
|UX changes and more secure defaults|The former standard (default) cluster is now called a “No Isolation Shared” cluster. The documentation now discourages use of the previously default cluster mode. To create a cluster in that mode, you have to deliberately remove a number of “guardrails” by confirming options in the UX.|
|Protection flags|Perhaps the biggest change is a flag that mitigates this malicious behavior in both existing and new clusters. When enabled, it detects that an admin user is interacting with the cluster and prevents the API token from leaking on a channel shared with possibly malicious users on the same cluster.|
Timeline of disclosure:
- June – Initial submission shortly after meeting with MSRC
- July – September – Various communications and testing of proposed mitigations
- October – Mitigations deployed
Researcher credits/shout-out: Secureworks, MSRC & MS Adversary Tradecraft Group – Nixu, DataBlinc