Recently there has been discussion of crafting malicious URLs using the soft hyphen character (U+00AD). A soft hyphen is only meant to be rendered when the text breaks onto a new line at that point, which is almost never the case with URLs, so in a URL it is effectively invisible. The problem is not so much a security risk to individual users; rather, incorporating soft hyphens in a URL allows it to slip past some spam-catching software.
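To make the bypass concrete, here is a minimal sketch of how a filter that does plain substring matching could miss a URL containing a soft hyphen. The blocklist, the domain `evil.example` and the function names are all made up for illustration; real spam filters are more involved than this.

```python
SOFT_HYPHEN = "\u00ad"

# Hypothetical blocklist, for illustration only.
BLOCKLIST = {"evil.example"}


def naive_is_blocked(url):
    """Plain substring match, with no normalisation of the URL."""
    return any(domain in url for domain in BLOCKLIST)


def strip_soft_hyphens(url):
    """Remove every soft hyphen before the URL is inspected."""
    return url.replace(SOFT_HYPHEN, "")


def is_blocked(url):
    """Same substring match, applied after stripping soft hyphens."""
    return naive_is_blocked(strip_soft_hyphens(url))


# Looks like "http://evil.example/login" when displayed on one line.
obfuscated = "http://ev" + SOFT_HYPHEN + "il.example/login"

print(naive_is_blocked(obfuscated))  # False: the soft hyphen splits the match
print(is_blocked(obfuscated))        # True: the stripped URL matches
```

Stripping the character before matching is only one possible fix; a filter could just as well reject any URL that contains it.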
I think the real problem this issue highlights is that it is still unsafe in 2010 to trust website links. It reminded me of the Unicode URL attack that came to light in 2005, where it was possible to register a domain that looked like a well-known domain by substituting visually similar characters from another script. The soft hyphen trick could allow some of these malicious Unicode domains to be treated as legitimate.
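As a rough illustration of the lookalike problem, the sketch below compares an ordinary domain with one whose first letter is a Cyrillic character, then converts both to their ASCII form with Python's built-in `idna` codec (which implements the older IDNA 2003 rules). The domain names and the helper `ascii_form` are assumptions chosen for the example.

```python
REAL = "example.com"
# First letter is U+0435 CYRILLIC SMALL LETTER IE, not Latin "e".
LOOKALIKE = "\u0435xample.com"


def ascii_form(host):
    """Encode a host name with Python's built-in IDNA (2003) codec."""
    return host.encode("idna")


print(REAL == LOOKALIKE)      # the two strings differ despite looking alike
print(ascii_form(REAL))       # already ASCII, so it is unchanged
print(ascii_form(LOOKALIKE))  # becomes a Punycode label starting with "xn--"
```

Showing the ASCII form instead of the Unicode form is one way a program can make the difference visible to the user.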
Perhaps the first step is to educate people about SSL certificates and have them check them. But it isn’t enough for people simply to check that a site's certificate is trusted, as it can be easy to get a certificate for a domain that most browsers trust automatically. Instead, we would have to get users to examine the certificate details for every important site they visit. This is unlikely to happen, and since it shifts responsibility to the user, it is not a great solution.
An easy solution may be to allow only a very restrictive set of characters in URLs. At present, a domain with soft hyphens encoded in it appears as a normal domain in Firefox 4.0b6, Opera 10.62, IE 9 and Chrome 6.0.472.63. This could easily be solved by forcibly rendering the soft hyphen character, or by indicating in some way that the URL contains special characters. Likewise, there should be an indicator when a URL combines different character sets.
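The two indicators suggested above could be sketched roughly as follows. This is not how any browser does it: the function names are hypothetical, "invisible" is taken to mean Unicode category `Cf` (format characters, which includes U+00AD), and the script of a letter is guessed from the first word of its Unicode character name, which is a crude stand-in for the real Unicode script property.

```python
import unicodedata


def invisible_characters(host):
    """List format characters (category Cf), such as the soft hyphen."""
    return ["U+%04X" % ord(ch) for ch in host
            if unicodedata.category(ch) == "Cf"]


def render_visibly(host):
    """Replace each format character with a visible <U+XXXX> marker."""
    return "".join(
        "<U+%04X>" % ord(ch) if unicodedata.category(ch) == "Cf" else ch
        for ch in host
    )


def scripts_used(host):
    """Approximate the scripts in use from Unicode character names."""
    scripts = set()
    for ch in host:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                # e.g. "CYRILLIC SMALL LETTER IE" -> "CYRILLIC"
                scripts.add(name.split()[0])
    return scripts


def warnings_for(host):
    """Collect human-readable warnings for a host name."""
    warnings = []
    hidden = invisible_characters(host)
    if hidden:
        warnings.append("invisible characters: " + ", ".join(hidden))
    scripts = scripts_used(host)
    if len(scripts) > 1:
        warnings.append("mixed scripts: " + ", ".join(sorted(scripts)))
    return warnings


suspicious = "\u0435xam\u00adple.com"  # Cyrillic first letter plus a soft hyphen
print(render_visibly(suspicious))
for warning in warnings_for(suspicious):
    print(warning)
```

A restrictive allow-list, as proposed above, would be simpler still: reject any host name for which either check returns something.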
These types of simple exploits will continue because there is so much to work with and security is not considered until too late. Browsers (and any internet-aware program) should be designed with security in mind from the ground up. Had they been, they would have implemented something like a restricted character set or a warning, and neither the soft hyphen exploit nor the Unicode attack would have been possible.