What Is a Homograph Attack and How to Protect Yourself Against It?

Security news website The Register recently reported on a new zero-day vulnerability that could enable what is known as a 'homograph attack.' Shaun Nichols, the report's author, was surprised and a bit disappointed that this sort of threat still exists in 2020. To find out why this is the case, we need to look into what a homograph attack is and how it has affected people over the years.

What is a homograph attack?

As is often the case in the infosec world, explaining the terminology isn't as straightforward as you'd expect.

From a linguistic perspective, homographs are words that have identical spelling but different meanings. An example of a pair of homographs would be "bat" as in a piece of sports equipment and "bat" as in a flying mammal.

In the infosec context, a homograph attack means tricking a user into visiting a malicious domain name that looks identical (or near-identical) to a legitimate domain. For instance, a hacker sends you a spam message that uses social engineering to fool you into clicking on "exаmple.com" (spelled with a Cyrillic "а") instead of "example.com" (spelled with Latin letters only). You think that you are headed to a completely legitimate website, but you end up on an attacker-controlled page that can, among other things, spread fake news, infect your system with malware, or trick you into giving away your credentials.

The confusion comes from the fact that words spelled with a mixture of Latin and Cyrillic letters don't really have any meaning and can't be classified as "homographs." A homograph attack relies on the similar appearance of two (or more) different characters (in our example, the Cyrillic "а" and the Latin "a"), and in linguistics, these characters are called "homoglyphs," not "homographs."
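To make the homoglyph idea concrete, here is a minimal Python sketch that compares the two look-alike domains from the example above. The `describe` helper is a hypothetical name introduced only for illustration, and the Cyrillic letter is written as an escape sequence so that it cannot be confused with its Latin twin in the code itself.

```python
import unicodedata

latin = "example.com"
spoofed = "ex\u0430mple.com"  # third character is CYRILLIC SMALL LETTER A

def describe(text):
    """Return (character, code point, Unicode name) for each non-ASCII character."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch))
        for ch in text
        if ord(ch) > 127
    ]

# The two strings render almost identically but are not equal
print(latin == spoofed)   # False
print(describe(latin))    # []
print(describe(spoofed))  # one entry, for the Cyrillic letter
```

To a computer the two names are simply different strings, which is why they can point to different servers even though a person reading them sees the same word.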

It's unclear why security experts still haven't adopted the correct terminology, but instead of getting bogged down in semantics, we might as well accept the established name and see how long people have had to deal with this threat.

The history of homograph attacks

Up until the beginning of the century, people who registered internet domain names were restricted to ASCII characters only, which was a problem for users in countries that don't use the Latin script or have a more complicated alphabet. In 2003, the Internet Corporation for Assigned Names and Numbers (ICANN), the organization that is in charge of ensuring the internet's stable operation, introduced a mechanism that allowed the use of most Unicode characters in domain names.

This solved one problem – many more people were suddenly able to create domains in their native languages. The introduction of Unicode into domain names also created a couple of new issues, though.

The internet's Domain Name System (DNS), the naming system that translates domain names into the IP addresses of the servers behind them, was designed to work with ASCII strings only, and engineers had to figure out how to make it handle Unicode. Punycode, an encoding that represents Unicode characters as ASCII strings, was used to solve this problem. As you might have guessed already, the other issue that internationalized domain names created was the homograph attack.
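As a rough illustration of the Punycode step, the sketch below uses the "idna" codec from Python's standard library, which implements the 2003 rules for internationalized domain names. The look-alike domain is a hypothetical example, and real browsers and registries may apply newer or stricter rules than this codec does.

```python
# Hypothetical look-alike domain with a Cyrillic "a" (U+0430) as the third letter
unicode_domain = "ex\u0430mple.com"

# Labels that contain non-ASCII characters are Punycode-encoded and
# prefixed with "xn--"; pure ASCII labels such as "com" are left alone.
ascii_domain = unicode_domain.encode("idna")

print(ascii_domain)                 # ASCII-only bytes beginning with b"xn--"
print(ascii_domain.decode("idna"))  # decodes back to the Unicode form

# A domain that is already ASCII passes through unchanged
print("example.com".encode("idna"))
```

The ASCII form is what actually travels through DNS, while the Unicode form is what the user is meant to see.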

Shortly after applications adopted the new system, security researchers started demonstrating how homoglyphs can be used to create some pretty convincing-looking domain names. In-the-wild attacks appeared as well, which meant that browser vendors and domain name registrars had to look for ways of stopping the threat. Registering a domain with a mixture of characters from different scripts became a lot more difficult, and browsers like Google Chrome started showing the Punycode form of such domains in the address bar.
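A mixed-script check of the kind described above could look roughly like the following sketch. It is a heuristic and not how any particular browser or registrar does it: Python's standard library does not expose the Unicode Script property, so the first word of each character's Unicode name (LATIN, CYRILLIC, and so on) stands in for the script. The function names are hypothetical.

```python
import unicodedata

def scripts_in_label(label):
    """Approximate the set of scripts used by the letters in a domain label.

    Assumption: the first word of a character's Unicode name is a usable
    stand-in for its script. Digits and hyphens are ignored.
    """
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in label
        if ch.isalpha()
    }

def is_mixed_script(label):
    """Return True when the letters in a label come from more than one script."""
    return len(scripts_in_label(label)) > 1

print(is_mixed_script("example"))        # False: Latin only
print(is_mixed_script("ex\u0430mple"))   # True: Latin plus one Cyrillic letter
```

A label written entirely in one non-Latin script is not flagged, which matches the goal of letting people use their native alphabets while rejecting suspicious mixtures.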

These mitigations meant that the attack never really took off, but a zero-day vulnerability recently discovered by Matt Hamilton, a security researcher working for Soluble, shows that the threat has not disappeared.

Homograph attacks are still possible

Hamilton found that, with clever use of some Unicode characters, he could register domains that look remarkably similar to the domain names of some popular services. He focused on three homoglyphs:

"ɡ" – voiced velar stop (a.k.a. voiced velar plosive), a character from the International Phonetic Alphabet which looks identical to a lowercase "g" in many fonts
"ɑ" – Latin Alpha which looks similar to a regular lowercase "a"
"ɩ" – Latin Iota which can be confused with a lowercase "l" in some fonts

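The three characters listed above can be inspected with a short Python sketch. All three have Unicode names that begin with LATIN, which suggests why a check that only looks for a mixture of scripts would not flag them; that reading is an inference on our part. The `non_ascii_characters` helper and the sample label are hypothetical.

```python
import unicodedata

# The three look-alikes, written as escapes, mapped to the ASCII letter they resemble
lookalikes = {"\u0261": "g", "\u0251": "a", "\u0269": "l"}

for ch, ascii_twin in lookalikes.items():
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} resembles {ascii_twin!r}")

def non_ascii_characters(label):
    """Return every character in a label that falls outside the ASCII range."""
    return [ch for ch in label if not ch.isascii()]

# A label that looks like "example" but uses the Latin Alpha
suspicious = "ex\u0251mple"
print(non_ascii_characters(suspicious))
print(non_ascii_characters("example"))  # []
```

Flagging any non-ASCII character is far too strict for legitimate internationalized domains, so treat this only as a way to see what is hiding in a name that claims to be plain English.
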
Using a combination of these homoglyphs, Hamilton tried to register some domains that could theoretically be used in homograph attacks against users of popular e-commerce websites, financial institutions, news outlets, and a variety of online services. He notified the registrars about the bug, but he was disappointed to receive little response from the people in charge of fixing the problem. He continued with his experiment, however, and he found out that between 2017 and 2019, some of the domains he was trying to register had been used. There were even SSL certificates issued for some of them.

This prompted him to reclassify the vulnerability as a zero-day, and he once again contacted the registrars. This time, US-CERT (the United States Computer Emergency Readiness Team) was involved, and ultimately the problem was addressed. Amazon removed the possibility of using homoglyphs in subdomains, and Verisign, the authoritative registry for the world's most popular TLDs (.com, .net, etc.), implemented changes that prevent the registration of domains with the homoglyphs outlined above.

It's curious that large organizations like ICANN and Verisign are still struggling to completely eliminate a threat we've known about for so long. It must be said that the risk for regular users is not that huge. Homograph attacks are much more likely to be used in targeted attacks aimed at high-value victims rather than Joe and Joanne Average. This doesn't mean, however, that you should underestimate the threat.

As usual, a bit of attention to detail and some extra caution can make all the difference. Try not to follow random links, and if you do need to click on them, take the extra second to have a good look and ensure that everything is as it should be.

By Duran

March 9, 2020

Computer Security