If you have been near Twitter in the last couple of hours, you have probably noticed a huge number of strange tweets. And if you moved your mouse over one of them, all kinds of weird things started to happen - everything from innocent popups, to suddenly retweeting the message, to being redirected to external "adult-oriented" sites.
Twitter has now fixed their site, but the problem wasn't specific to Twitter's website - any web app using Twitter is potentially affected. Anything from websites listing tweets, to widgets that you add to your blog.
Most of these work just fine and were never affected, but every web developer using Twitter needs to check their code.
Also, none of the Twitter apps (desktop, iPhone, iPad, Android) were affected, because they are not "web" based. Instead they just output gibberish into your stream.
It's simple. When you use the Twitter API, you get XML or JSON output with the tweet as clear text. Here is one of the many examples.
Note: This one was the one responsible for most of the retweeting going on (not harmful, but really annoying). There were many others, some much worse than this.
Every web app will then try to find any link in the text, and convert that into something you can actually click on. It is a very simple operation, done all the time, pretty much everywhere.
In the above case, the problem was with the quotation mark, but it could be other things too.
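To make the quotation mark problem concrete, here is a minimal sketch of a naive link converter. The function name, the pattern, and the harmless `alert(1)` payload are all made up for illustration; this is not the actual exploit tweet or any particular app's code.

```python
import re

# Deliberately naive: everything up to the next whitespace counts as "the link".
NAIVE_URL_RE = re.compile(r"https?://\S+")


def naive_linkify(text: str) -> str:
    """Wrap each match in an anchor tag without escaping anything (unsafe)."""
    return NAIVE_URL_RE.sub(
        lambda m: f'<a href="{m.group(0)}">{m.group(0)}</a>', text
    )


# Hypothetical, harmless stand-in for a malicious tweet.
tweet = 'http://example.com/@"onmouseover="alert(1)"/'
print(naive_linkify(tweet))
```

The first quotation mark in the tweet closes the href attribute early, so the browser reads onmouseover="alert(1)" as a separate attribute of the link, and the script runs as soon as the mouse moves over it.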
The "safe" characters are generally (when converting raw text to links): a-zA-Z0-9;/?:@&=+$,-_.!~*() Anything else isn't part of the link.
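Note: For developers, a regex like this (https?|ftp|file)://[a-zA-Z0-9;/?:@&=+$,_.!~*()-]+ works. The hyphen goes last inside the brackets so it is read as a literal hyphen; written as ,-_ it would be treated as a range from comma to underscore, which also lets < and > through. This is the one I use for all my twitter apps.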
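As a rough sketch of how that could look in practice, the snippet below uses the safe-character pattern to find links and HTML-escapes everything else. The function name is hypothetical, and the hyphen is placed last in the character class so that it is matched literally instead of forming a range. Treat it as one possible approach, not a complete sanitizer.

```python
import html
import re

# Only the "safe" characters may be part of a link; the hyphen is last so it is literal.
URL_RE = re.compile(r"(?:https?|ftp|file)://[a-zA-Z0-9;/?:@&=+$,_.!~*()-]+")


def linkify(text: str) -> str:
    """Turn links into anchors and HTML-escape all surrounding text."""
    parts = []
    pos = 0
    for match in URL_RE.finditer(text):
        # Text between links is escaped, never emitted raw.
        parts.append(html.escape(text[pos:match.start()]))
        # The link itself is escaped too, because "&" is a safe URL character.
        url = html.escape(match.group(0), quote=True)
        parts.append(f'<a href="{url}">{url}</a>')
        pos = match.end()
    parts.append(html.escape(text[pos:]))
    return "".join(parts)


# Hypothetical, harmless stand-in for a malicious tweet.
print(linkify('see http://example.com/@"onmouseover="alert(1)"/'))
```

Here the link stops at the first quotation mark, so only http://example.com/@ ends up inside the href. The rest of the tweet is shown as plain, escaped text.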
Most web apps actually do this the right way. Seesmic Web, for example, didn't have the problem, because they had done it the right way to begin with.
It is really up to each individual developer. Every web app is vulnerable by default. It's your job as a developer to make sure you are not affected.
Twitter has solved the problem with their site, but in a rather curious way. Instead of fixing the matching algorithm, they are now simply converting " into &quot;. It works, but it is not really the right way to do it.
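To see why escaping alone works but is not ideal, here is a small sketch that escapes the quotation marks first and then still uses a naive matcher. This is only my guess at the general idea, with made-up names and a harmless payload; it is not Twitter's actual code.

```python
import html
import re

# Same naive matcher as before: everything up to the next whitespace.
NAIVE_URL_RE = re.compile(r"https?://\S+")


def escape_then_linkify(text: str) -> str:
    """Escape HTML special characters first, then linkify naively."""
    escaped = html.escape(text, quote=True)  # turns " into &quot;
    return NAIVE_URL_RE.sub(
        lambda m: f'<a href="{m.group(0)}">{m.group(0)}</a>', escaped
    )


# Hypothetical, harmless stand-in for a malicious tweet.
tweet = 'http://example.com/@"onmouseover="alert(1)"/'
result = escape_then_linkify(tweet)
print(result)
```

The script can no longer break out of the href, so nothing runs. But the whole payload is still treated as part of the link, so the user gets a clickable link to a garbage address instead of a link that ends where the real URL ends.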
Update: Video of the exploit in action (via Sophos)