Have you ever used Twitter, Gmail, or YouTube and noticed odd characters stacking vertically, overlaying other text on the page, or breaking out of UI boundaries? If so, and you have wondered how this happens, join us as we dive into the wonderful world of Unicode that causes this behavior.
For the past few days, Bleeping Computer’s Twitter profile page has looked glitchy, with characters shooting up from another user’s display name. These strange characters were hard to ignore, and I just had to write about it.
In this particular case, the behavior was being caused by certain Unicode characters in security researcher Hylejam’s (defconisov3r) display name, as shown below.
There’s a whole “magic Unicode” list of characters that behave strangely when placed in a typical UI. The list isn’t even exhaustive, and many other characters do not behave as expected!
It turns out multiple areas of Twitter are impacted by these magic characters, including profile pages and the notifications pane.
This behavior is not only affecting Twitter. When receiving Twitter notifications in our email, we noticed that Gmail also exhibited the same behavior when displaying these characters.
Certain Unicode characters don’t render as expected
When asked for the reason behind using these characters in their username, Hylejam told BleepingComputer that it “started when I knew the story of the Telugu character (జ్ఞా) that broke Apple devices.”
“Although I did not start using those characters, if I remember correctly. After looking at the behavior [precisely triggered by] the “ە” character with letters and numbers, [I tried] to understand the bug that caused the WhatsApp crash,” Hylejam explained.
Hylejam refers to a text-rendering bug in WhatsApp that had previously caused the application to crash for users who had received a message containing a specific “black dot” emoji, as illustrated below.
The “ە” character used by Hylejam that causes these glitches is the Arabic letter ‘Ae’, identified by the Unicode code point U+06D5. He shared on Pastebin an older text copy of his Twitter description that also contains the character, and it breaks the UI boundaries there as well.
Note that Arabic is one of the languages written right-to-left, which means the cursor aligns towards the right-hand side when typing, and the backspace key deletes text in the opposite direction from the one familiar to English users.
Hylejam stated that he had filed a bug report with Twitter via HackerOne, but Twitter dismissed it after asking for a proof of concept. Unfortunately, Hylejam could not provide one because he did not fully understand what exactly causes the glitch.
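For readers who want to inspect the character themselves, the following is a minimal sketch using Python's standard unicodedata module. It only looks up what the Unicode database records for U+06D5, including its bidirectional class, and does not attempt to reproduce or explain the rendering glitch itself. The variable names are our own.

```python
import unicodedata

# The character from Hylejam's display name, written as an escape
# so the source file itself stays left-to-right.
ch = "\u06d5"

# Collect a few properties recorded in the Unicode Character Database.
info = {
    "code_point": f"U+{ord(ch):04X}",
    "name": unicodedata.name(ch),
    "category": unicodedata.category(ch),
    # Bidirectional class; "AL" marks an Arabic letter, which is laid out right-to-left.
    "bidi_class": unicodedata.bidirectional(ch),
}

for key, value in info.items():
    print(f"{key}: {value}")
```

The bidirectional class is what a text layout engine consults when deciding the direction in which a run of characters should be drawn.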
“In short, the editions were not carried out, that is, if I cut the image using the application when sending it via DM or sharing it in a tweet, it was not modified. Regarding emojis, [hitting] a backspace key did not work correctly. If you tried to delete something in between, something [else] was deleted from the end of the text,” Hylejam told us.
In addition to the usual struggle with backspace deletions and text editing around these funky characters, we observed that the characters render outside the perceived bounds of a UI or alter pre-existing text within the rendered fields.
For example, just this week, user Loseshape posted on Reddit about how they were able to “reverse” the characters of chat messages being sent to them by simply putting a right-to-left override Unicode character in their YouTube name:
“I put a right to left override character in my name and now any automated messages that are sent to me look like the one circled,” read the post.
The user further explained: “I wrote Cat4-7 backwards and put a right to left override character in the front.”
Another Reddit user, AntisocialWeeb, shared in the same thread a list of these mysterious Unicode characters, which includes the “right-to-left override” character triggering this behavior.
While such characters are becoming common in Twitter profile names, reports of select characters causing unintended behavior are nothing new.
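To illustrate the trick described in the Reddit post, here is a hedged sketch of how an application could detect or strip directional formatting characters from a display name before inserting it into other text. The BIDI_CONTROLS set and both function names are hypothetical and written for this example. Nothing here reflects how YouTube, Twitter, or any other service actually handles names.

```python
# Explicit directional formatting characters, hand-picked for this sketch:
# U+202A..U+202E (embeddings, overrides, pop) and U+2066..U+2069 (isolates).
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",
    "\u2066", "\u2067", "\u2068", "\u2069",
}


def has_bidi_controls(text: str) -> bool:
    """Return True if the text contains any character from BIDI_CONTROLS."""
    return any(ch in BIDI_CONTROLS for ch in text)


def strip_bidi_controls(text: str) -> str:
    """Return the text with every character from BIDI_CONTROLS removed."""
    return "".join(ch for ch in text if ch not in BIDI_CONTROLS)


# The name from the post: "Cat4-7" written backwards, with a
# right-to-left override (U+202E) placed in front.
display_name = "\u202e" + "7-4taC"

print(has_bidi_controls(display_name))           # the override is detected
print(repr(strip_bidi_controls(display_name)))   # the name without the override
```

Stripping is only one option. An application that needs to preserve legitimate right-to-left names might prefer to leave them intact and isolate them from the surrounding text instead, which this sketch does not attempt.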
In 2015, Mozilla fixed a UI rendering bug, which broke emoji colors due to the presence of a Unicode character.
However, given the vastness of the Unicode character set, which is continually expanding, it is hard to keep track of precisely what will cause an application to malfunction.
According to multiple users on StackOverflow, the Unicode code space can accommodate over 1.1 million characters, and as of today, only about 10% of this space has been allocated.
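The “over 1.1 million” figure can be checked with simple arithmetic: the Unicode code space consists of 17 planes of 65,536 code points each. The sketch below computes that total and then counts how many code points the local Python interpreter's Unicode database treats as assigned. The count_assigned helper is our own, and its result depends on the Unicode version bundled with Python and on what one chooses to count as “allocated”, so it will not necessarily match the percentage quoted above.

```python
import sys
import unicodedata

PLANES = 17
CODE_POINTS_PER_PLANE = 0x10000  # 65,536

# Total number of code points, U+0000 through U+10FFFF.
codespace_size = PLANES * CODE_POINTS_PER_PLANE

# Python exposes the highest code point as sys.maxunicode.
assert codespace_size == sys.maxunicode + 1


def count_assigned() -> int:
    """Count code points that are not unassigned (Cn), private use (Co),
    or surrogates (Cs) in this interpreter's Unicode database."""
    skipped = {"Cn", "Co", "Cs"}
    return sum(
        1
        for cp in range(codespace_size)
        if unicodedata.category(chr(cp)) not in skipped
    )


assigned = count_assigned()
print(f"Unicode database version: {unicodedata.unidata_version}")
print(f"code space size: {codespace_size:,}")
print(f"assigned here:   {assigned:,} ({assigned / codespace_size:.1%})")
```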
Adding localization and multi-language support to your app is an excellent and necessary idea to tailor your services to a global audience.
It is equally important to be aware that different scripts and character sets render differently and could cause unexpected outcomes or break things.