
How I found (and fixed) a vulnerability in Python


Following James Kettle's research on web cache poisoning, I decided to deepen my knowledge in this field and look for these vulnerabilities in open source software. I focused my research on popular Python web frameworks, such as Flask, Bottle, and Tornado. I couldn't have imagined that this research would end with me fixing a security vulnerability in Python 3.9.

But wait - let's start at the beginning. As part of my research, I set up local instances of these frameworks so I could try to exploit them. Many of them turned out to be vulnerable, but Tornado caught my attention, because its maintainer told me that they were using Python’s standard library to parse the URL.

Python’s source code

When I looked at Python’s source code, it became clear to me that the vulnerability was much more critical and far-reaching than I had thought - every package that relied on Python’s standard library to parse query strings was vulnerable.

The urllib.parse module (urlparse in Python 2) treated the semicolon as a query parameter separator in addition to the ampersand - whereas most proxies only accepted ampersands as separators. That meant an attacker who separated query parameters with a semicolon (;) could cause the proxy (running with its default configuration) and the server to interpret the same request differently, resulting in malicious requests being cached as safe ones.
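The mismatch can be illustrated with a minimal sketch. The helper split_query below is hypothetical: it is not the standard library's code, it only emulates the two splitting rules described above so that the difference is visible on any Python version. The sample query string is also made up for illustration.

```python
def split_query(query, separators):
    """Hypothetical helper: split a raw query string on every character in `separators`."""
    fields = [query]
    for sep in separators:
        fields = [part for field in fields for part in field.split(sep)]
    pairs = []
    for field in fields:
        if field:
            name, _, value = field.partition("=")
            pairs.append((name, value))
    return pairs


query = "a=1&b=2;c=3"

# Server-side view, emulating the old behaviour: both '&' and ';' separate parameters.
server_view = split_query(query, "&;")

# Proxy-side view, assuming a default configuration: only '&' separates parameters.
proxy_view = split_query(query, "&")

print(server_view)  # [('a', '1'), ('b', '2'), ('c', '3')]
print(proxy_view)   # [('a', '1'), ('b', '2;c=3')]
```

The server sees a parameter c that, from the proxy's point of view, does not exist. That invisible parameter is what an attacker can abuse.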

Exploitation example

GET /?link=&utm_content=1;link='><script>alert(1)</script> HTTP/1.1
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9
Connection: close

Python’s parser saw three parameters here: link, utm_content, and then link again. The proxy, on the other hand, treated the full string 1;link='><script>alert(1)</script> as the value of utm_content, which is why the cache key would have contained only the first link parameter.
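To make the poisoning step concrete, here is a rough simulation of both sides. It rests on several assumptions that are not taken from any real proxy or framework: the proxy leaves utm_content out of its cache key, the backend uses the last link value it sees, and the names split_query, proxy_cache_key and backend_link are hypothetical. Percent-decoding is skipped to keep the sketch short.

```python
def split_query(query, separators):
    """Hypothetical helper: split a raw query string on every character in `separators`."""
    fields = [query]
    for sep in separators:
        fields = [part for field in fields for part in field.split(sep)]
    return [tuple(f.partition("=")[::2]) for f in fields if f]


# Assumption: the proxy treats utm_content as irrelevant for caching.
UNKEYED = {"utm_content"}


def proxy_cache_key(path, query):
    """Proxy view: split on '&' only and drop unkeyed parameters."""
    keyed = [(n, v) for n, v in split_query(query, "&") if n not in UNKEYED]
    return path + "?" + "&".join(n + "=" + v for n, v in keyed)


def backend_link(query):
    """Backend view: split on '&' and ';' and keep the last 'link' (assumption)."""
    values = [v for n, v in split_query(query, "&;") if n == "link"]
    return values[-1] if values else None


payload = "'><script>alert(1)</script>"
poisoned = "link=&utm_content=1;link=" + payload
clean = "link="

# Both requests map to the same cache entry...
print(proxy_cache_key("/", poisoned) == proxy_cache_key("/", clean))  # True
# ...but the backend renders the attacker's value for the poisoned one.
print(backend_link(poisoned) == payload)  # True
```

Under these assumptions, the response generated for the poisoned request is stored under the key of the clean one, so later visitors requesting the clean URL receive the attacker's payload.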

I immediately contacted the Python security team and opened a bug ticket. I also created a pull request on the CPython repository. It took about a month of going back and forth on the PR, during which I learned to follow Python’s contributor guidelines - and it got merged 🎉 on Feb 15 and released on Feb 19. The fix was backported to older versions of Python as well.
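For readers who want to see the patched behaviour, the short check below assumes an interpreter that already contains the fix (Python 3.9.2 or later, or an older release with the backport). On such versions parse_qs and parse_qsl split on the ampersand only by default and accept a separator argument for applications that really need semicolons. The query strings are made up for illustration.

```python
from urllib.parse import parse_qs, parse_qsl

# Default behaviour after the fix: ';' is ordinary data, only '&' separates.
default_view = parse_qs("a=1&b=2;c=3")
print(default_view)  # {'a': ['1'], 'b': ['2;c=3']}

# Opting in to semicolons explicitly; only one separator can be active at a time.
semicolon_view = parse_qsl("a=1;b=2", separator=";")
print(semicolon_view)  # [('a', '1'), ('b', '2')]
```

Because a single separator is used per call, the server and a proxy configured the same way can no longer disagree about where one parameter ends and the next begins.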

The moral of the story is to always strive to dig deeper. Think you found something interesting? Challenge your hypothesis, think about the root cause, and try to trace it further along the chain. It might lead to even more fascinating results.