Unauthenticated RCE via SAML and XML (a presentation)

Context

There was a high-severity security incident at Zoho in 2022: attackers could run any code on affected machines without having to log in.
More than 20 products were affected.

SAML is a foreign concept to most devs, but XML is widely known.
What’s less widely known is the set of features XML comes with and how they can be abused, as in this case.

The only way to truly improve security is for the actual developers to understand risks - not just “experts”, not just a security team.

This was my attempt at starting at the basics and sharing the knowledge required to truly understand what could (and did) go wrong.

Here are the slides from my talk ↓


say you want to display this on your website
this is what the html for that table looks like
html has both styling needed + data
if you had to write the data down, how would you write it?
in computers, we generally store it in a format called JSON (JavaScript Object Notation)
there's also an alternate format to store data, called XML
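To make that concrete, here's a minimal sketch in Python that stores the same record in both formats and reads it back. The user record and its field names are made up for illustration; they are not from the slides.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical user record, written once as JSON and once as XML.
as_json = '{"user": {"name": "Alice", "id": "123"}}'
as_xml = "<user><name>Alice</name><id>123</id></user>"

# JSON maps directly onto dicts and lists.
from_json = json.loads(as_json)["user"]

# XML gives us a tree of elements; here each child becomes a key/value pair.
root = ET.fromstring(as_xml)
from_xml = {child.tag: child.text for child in root}

print(from_json)
print(from_xml)
print(from_json == from_xml)
```

Both formats carry the same data here; the difference is mostly in everything else XML lets you do.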
XML comes with a friend - a complementary format, '.xslt' (style transforms), that can be used to store the styling for that data
If you look closely, you can see where the xslt refers to the 'user' entry in the XML
Overall, xslt picks data up from xml and tries to TRANSFORM it into HTML
this gives us a happy family: xml to store data, xslt to convert it into HTML
buuttt JS is much more powerful than xslt at transformations (yay JSON?)
but xslt said 'screw you!' and developed a plugin where you can use JS inside xslt
you can see an example here, where the JS function on top is being called inside xslt below it
here it takes the date from XML and converts it to the user's timezone
but why stop at JS? Xalan (a widely used XSLT processor) can support other languages like Java too
What have we learnt, kids?
xslt is capable of transforming XML into any other format, not only HTML - it can even transform it into JSON!
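Python's standard library has no XSLT engine, so the sketch below does not use XSLT. It does by hand the same job a stylesheet would do (pick values out of the XML, emit another format), just to show that one XML source can feed both an HTML table and a JSON document. The sample data and the function names are made up for illustration.

```python
import json
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical input data.
xml_data = (
    "<users>"
    "<user><name>Alice</name><id>123</id></user>"
    "<user><name>Bob</name><id>456</id></user>"
    "</users>"
)

def to_records(xml_text):
    """Pull each <user> entry out of the XML as a plain dict."""
    root = ET.fromstring(xml_text)
    return [{field.tag: field.text for field in user} for user in root.findall("user")]

def to_json(xml_text):
    """Same data, emitted as JSON."""
    return json.dumps(to_records(xml_text))

def to_html(xml_text):
    """Same data, emitted as an HTML table (data + presentation)."""
    rows = "".join(
        "<tr><td>{}</td><td>{}</td></tr>".format(escape(r["name"]), escape(r["id"]))
        for r in to_records(xml_text)
    )
    return "<table>" + rows + "</table>"

print(to_json(xml_data))
print(to_html(xml_data))
```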
So we realize that there are a lot of cool things going on in the XML ecosystem.
Transferring data is just a teeny tiny part of it.
let's take a quick detour and talk about some other stuff
a crash-course on signing
take a minute to check this out: we can 'sign' data in XML by adding it to <SignedInfo>, and including the encrypted version of that data and a decryption key
we could sign all the original data if we want to (although that means including the same data in an encrypted format too, so 3x the size)
...or we could just encrypt the hash of the original data
So here are the updated steps: instead of directly comparing the original data with the decrypted data, we hash the original data and compare it with the hash we get by decrypting the encrypted hash
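Here's a toy sketch of those steps in Python, under loud assumptions: it uses textbook RSA with a tiny key built from two well-known Mersenne primes and no padding, which is fine for seeing the idea and useless for real security. The function names and the sample data are made up for illustration.

```python
import hashlib

# Toy RSA key. Assumption for illustration only: far too small for real use, no padding.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                              # public modulus
E = 65537                              # public exponent (the "decryption key" shipped with the data)
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent, kept secret by the signer (Python 3.8+)

def hash_as_int(data: bytes) -> int:
    """SHA-256 of the data, reduced mod N so it fits the toy key."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N

def sign(data: bytes) -> int:
    """'Encrypt' the hash with the private key."""
    return pow(hash_as_int(data), D, N)

def verify(data: bytes, signature: int) -> bool:
    """'Decrypt' the signature with the public key and compare it with a fresh hash."""
    return pow(signature, E, N) == hash_as_int(data)

original = b"<user><name>Alice</name><id>123</id></user>"
signature = sign(original)
tampered = original.replace(b"123", b"999")

print(verify(original, signature))   # data untouched
print(verify(tampered, signature))   # data changed after signing
```

The signature stays the same size no matter how big the data is, which is the whole point of signing the hash instead of the data.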
Instead of copying the original data again inside <SignedInfo>, we can just add a reference to it
a quick recap: what we want to sign is inside <SignedInfo>, encrypted hash of <SignedInfo> is inside <SignatureValue>.
<KeyInfo> has info required to decrypt <SignatureValue> and check if <SignedInfo> is legit.
what if we want to sign not only user-details, but also last-modified? Easy, just add that reference too
oh I forgot: since we want to decrypt the encrypted data with the key to check signature, we also need to include the algo used to encrypt it.
we have another problem: hashes are unforgiving. 
In the app, all 3 formats in the image will give the same value, '123' - but hashes will be different.
Another case: for the app, it doesn't matter if user-details comes on top or last-modified does. But again, the hash will differ.
But we want the same value to give the same hash.
So we also include the algorithm to be used to normalize data (sort/remove whitespace), and call it Canonicalization Method
XML has comments etc. too, which actually need to be ignored while hashing - because to the app it'd all be the same value
So we specify any transformations to be done before hashing it as well, like removing comments
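A small sketch of why normalizing before hashing matters, using `xml.etree.ElementTree.canonicalize` from Python's standard library (Python 3.8+). The three sample documents are made up for illustration: they differ in attribute order, spacing inside the tag, empty-element syntax, and a comment. This shows only a slice of what canonicalization covers.

```python
import hashlib
import xml.etree.ElementTree as ET

# Three ways of writing what an app would treat as the same element (hypothetical data).
variants = [
    '<user id="123" role="admin"></user>',
    '<user role="admin" id="123"/>',
    '<user   role="admin"   id="123"><!-- a comment --></user>',
]

def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Hashing the raw text: every variant gets a different hash.
raw_hashes = {sha256_hex(v) for v in variants}

# Canonicalize first (comments are dropped by default), then hash.
canonical_forms = {ET.canonicalize(v) for v in variants}
canonical_hashes = {sha256_hex(c) for c in canonical_forms}

print(len(raw_hashes), "distinct hashes before canonicalization")
print(len(canonical_hashes), "distinct hash(es) after canonicalization")
```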
A summary of the full process
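Roughly, the parts named so far fit together like the skeleton below. It is simplified on purpose: real XML signatures use namespaces and real algorithm URIs, while the `example-...` values, the `#user-details` / `#last-modified` references and the `...` placeholders here are stand-ins. The Python around it just lists what a verifier would have to look at.

```python
import xml.etree.ElementTree as ET

# Simplified skeleton (assumption: no namespaces, placeholder algorithm names).
signature_xml = """
<Signature>
  <SignedInfo>
    <CanonicalizationMethod Algorithm="example-c14n"/>
    <SignatureMethod Algorithm="example-rsa-sha256"/>
    <Reference URI="#user-details">
      <Transforms><Transform Algorithm="example-remove-comments"/></Transforms>
      <DigestMethod Algorithm="example-sha256"/>
      <DigestValue>...</DigestValue>
    </Reference>
    <Reference URI="#last-modified">
      <DigestMethod Algorithm="example-sha256"/>
      <DigestValue>...</DigestValue>
    </Reference>
  </SignedInfo>
  <SignatureValue>...</SignatureValue>
  <KeyInfo>...</KeyInfo>
</Signature>
"""

sig = ET.fromstring(signature_xml)
signed_info = sig.find("SignedInfo")

# Which pieces of data are covered by the signature?
referenced = [ref.get("URI") for ref in signed_info.findall("Reference")]
# Which transforms must run on that data before hashing it?
transforms = [t.get("Algorithm") for t in signed_info.iter("Transform")]

print(referenced)
print(transforms)
```

Note that the transforms are instructions the verifier is expected to execute, and they arrive inside the document being verified.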
it gets interesting now!
yep
yepp (from the first part)
dayumn sonn
let's get to the issue now
aight aight back up
SAML is a protocol used to log in via a 3rd-party identity provider (the same idea as 'log in with FB' or 'log in with Google')
SAML uses XML to transfer data between these apps and calls it a 'SAML assertion'
looks all complicated, but it's basically what we saw before with lots of data.
You can see the <SignedInfo>, the data outside it, the transform, the canonicalization method etc
this is what the attacker's modified XML looked like.
They got a legit copy of the SAML XML and modified it to run Java code via Xalan inside the <Transform> part!
This is how we process the XML: we verify hashes of data in <SignedInfo>, and then verify <SignedInfo> itself using <KeyInfo>.
The problem is that by the time it got to the <KeyInfo> part, the attack was over!
The malicious code runs during verification of hashes in the first step.
Then the KeyInfo verification happens, which fails and shows an error - but the damage is already done.
the obvious question
and that was the obvious fix: check that <SignedInfo> is valid using the <SignatureValue> first, and only then process what's inside it
the code for the obvious fix: these are the actual changes from the library Zoho used, not Zoho's own code
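The library's real change is the one on the slide. Purely to illustrate why the order matters, here's a toy model in Python. Everything in it is an assumption: an HMAC with a shared demo key stands in for the real public-key check, a dict stands in for the XML, and `run_transform` only records its name where a real Xalan transform could execute attacker-supplied Java.

```python
import hashlib
import hmac

KEY = b"demo-key"   # stand-in for the real key material in <KeyInfo>
ran = []            # records every transform that actually executed

def run_transform(name: str, data: str) -> str:
    ran.append(name)            # side effect happens here, valid signature or not
    return data.strip()

def digest(data: str) -> str:
    return hashlib.sha256(data.encode()).hexdigest()

def signature_value(doc: dict) -> str:
    """Sign the <SignedInfo> stand-in: the transform name plus the expected digest."""
    signed_info = doc["transform"] + "|" + doc["digest"]
    return hmac.new(KEY, signed_info.encode(), hashlib.sha256).hexdigest()

def references_ok(doc: dict) -> bool:
    return digest(run_transform(doc["transform"], doc["data"])) == doc["digest"]

def signed_info_ok(doc: dict) -> bool:
    return hmac.compare_digest(signature_value(doc), doc["signature"])

def verify_vulnerable(doc: dict) -> bool:
    return references_ok(doc) and signed_info_ok(doc)   # transforms run first

def verify_fixed(doc: dict) -> bool:
    return signed_info_ok(doc) and references_ok(doc)   # signature checked first

legit = {"data": " alice ", "transform": "trim-whitespace", "digest": digest("alice")}
legit["signature"] = signature_value(legit)
attack = dict(legit, transform="run-evil-java")          # tampered copy of a legit document

ran.clear()
vulnerable_result = verify_vulnerable(attack)
ran_when_vulnerable = list(ran)

ran.clear()
fixed_result = verify_fixed(attack)
ran_when_fixed = list(ran)

print(vulnerable_result, ran_when_vulnerable)
print(fixed_result, ran_when_fixed)
```

Both orders reject the tampered document. The only difference is whether the attacker's transform got to run before the rejection.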
don't worry, there are some more footguns
some apps use one library to validate the XML and a different library to handle the SAML process.
If the same XML includes multiple assertions (one legit and one malicious), validation could happen on one and the SAML login on the other
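A tiny sketch of that mismatch in Python. The two functions are hypothetical stand-ins for two different libraries, the element names are simplified, and the `signed` attribute is a placeholder for a real signature check.

```python
import xml.etree.ElementTree as ET

# One response carrying two assertions: a legit one and an injected one (made-up data).
response = """<Response>
  <Assertion signed="true"><User>alice</User></Assertion>
  <Assertion signed="false"><User>admin</User></Assertion>
</Response>"""

def validator_checks(xml_text: str) -> bool:
    """Hypothetical validation library: only looks at the first assertion."""
    first = ET.fromstring(xml_text).find("Assertion")
    return first.get("signed") == "true"

def login_reads(xml_text: str) -> str:
    """Hypothetical SAML library: logs in whoever the last assertion names."""
    return ET.fromstring(xml_text).findall("Assertion")[-1].findtext("User")

if validator_checks(response):
    print("validated, logging in as:", login_reads(response))
```

Each function is reasonable on its own; the hole only appears when they disagree about which assertion counts.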
there's also the problem of the XML spec not helping out. For example, this order of validation (verify <SignedInfo> using <KeyInfo> first, then verify the hashes of the data it references) should have been in the spec/docs, but is mentioned nowhere.
because the spec isn't clear, different libraries behave differently
not to mention, just like signing there's also encryption ._.
andd you can include external files as well in the XML
apart from this, there's also another format in the XML family: DTD
did I mention you can use variables too? Search for 'billion laughs'
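If you'd rather see the numbers than search, here's a sketch that builds the classic 'billion laughs' shape (each variable is defined as ten copies of the previous one) and works out how big it would get. It only builds the text and does the arithmetic; it deliberately never feeds the payload to a parser. The function names are made up for illustration.

```python
def build_payload(levels: int = 9, fanout: int = 10) -> str:
    """Build the nested-entity document as plain text (never parsed here)."""
    lines = ['<?xml version="1.0"?>', "<!DOCTYPE lolz [", ' <!ENTITY lol "lol">']
    prev = "lol"
    for i in range(1, levels + 1):
        name = "lol" + str(i)
        refs = ("&" + prev + ";") * fanout      # ten references to the previous entity
        lines.append(' <!ENTITY ' + name + ' "' + refs + '">')
        prev = name
    lines.append("]>")
    lines.append("<lolz>&" + prev + ";</lolz>")
    return "\n".join(lines)

def expanded_size(levels: int = 9, fanout: int = 10, base: str = "lol") -> int:
    """Characters produced if a parser fully expands the top-level entity."""
    return len(base) * fanout ** levels

payload = build_payload()
print(len(payload), "characters on the wire")
print(expanded_size(), "characters after expansion")
```

With the defaults that is under a kilobyte of input turning into 3,000,000,000 characters in memory, if the parser expands everything.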
and tons of tiny 'features' that are waiting to bite you in your plumpy backside
even worse: each library handles each feature differently, requires you to manually disable features, has different defaults...
while the average user just wants to use XML to store some data!
and there's that! I encourage you to put in the effort to learn things in depth and explore internals.
I hope you do - and that you share that knowledge with me too!

Don't be a skiddy ;)
