August 11, 2023 by Dov Lerner

The cybercrime underground is far less Russian and more segmented than you probably think

What can linguistic analysis teach us about the cybercrime underground?

While there is no way to precisely determine an underground user’s location, since underground sites and protocols are anonymous by design, we can use language as a rough stand-in for where a user may be located. In this research, we analyzed the languages of over 118.3 million posts that Cybersixgill collected from underground forums since the beginning of 2022.

The cybercriminal world, and the deep and dark web in particular, has a reputation for being predominantly Russian and, to a lesser extent, Chinese. However, this stereotype might be way off. We discovered that English is the overwhelmingly dominant language, accounting for 78% of posts. Russian is a distant second, at 6%, and Chinese is third, at 5%. (The “other” category, accounting for 9%, includes posts that could not be classified.)
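To illustrate how a breakdown like this can be computed, here is a minimal sketch. It assumes each post has already been tagged with a language code by some classifier. The function name `language_shares` and the sample labels are hypothetical and are not drawn from the Cybersixgill dataset.

```python
from collections import Counter


def language_shares(labels):
    """Return each language's share of posts as a percentage, largest first."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {lang: 100 * n / total for lang, n in counts.most_common()}


# Made-up labels for illustration only (not the real data).
sample_labels = ["en"] * 8 + ["ru"] + ["zh"]
shares = language_shares(sample_labels)

for lang, pct in shares.items():
    print(f"{lang}: {pct:.1f}%")
```

Posts that a classifier cannot label could be tagged with a placeholder such as `"other"` so that they still count toward the total.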

[Figure: Top languages of underground forums]

Keep in mind that most underground posts are not crime-related. Underground forums are a popular venue for users to discuss white-hat hacking, cryptocurrency, gaming, technology, and politics. Furthermore, the quantity of posts does not imply the quality of cybercrime: many of the world’s most notorious cybercriminals (such as ransomware groups) are indeed Russian, but overall, Russian speakers are fewer and have less internet penetration than their English-speaking counterparts.

We checked if the Russian share of underground posts has changed in the last two years, and specifically, if Russia’s invasion of Ukraine had any effect. We discovered that while the monthly share of posts may fluctuate by several percentage points, there is no overall trend observed since the war began in February 2022.
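A month-by-month share of this kind could be computed along the following lines. This is only a sketch under stated assumptions: each post is represented as a hypothetical `(month, language)` pair, and the sample values are invented for illustration.

```python
from collections import defaultdict


def monthly_share(posts, language):
    """Return {month: percentage of that month's posts written in `language`}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for month, lang in posts:
        totals[month] += 1
        if lang == language:
            hits[month] += 1
    # Sorting "YYYY-MM" strings yields chronological order.
    return {month: 100 * hits[month] / totals[month] for month in sorted(totals)}


# Invented sample posts (not the real data).
sample_posts = [
    ("2022-01", "ru"), ("2022-01", "en"),
    ("2022-02", "en"), ("2022-02", "en"), ("2022-02", "ru"), ("2022-02", "en"),
]
russian_by_month = monthly_share(sample_posts, "ru")
print(russian_by_month)
```

Plotting these monthly values over time is one way to check whether a share is trending or merely fluctuating.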

[Figure: Percentage of underground posts in Russian]

Linguistic segmentation of underground forums

Next, we investigated the linguistic breakdown of underground forums: Do forums feature a mix of multiple languages, or are they homogeneous communities with a predominant language?

To do so, we analyzed the 25 largest underground forums (in terms of volume of posts) since the beginning of 2022. In fourteen forums, at least 95% of posts were in the same language: eight in English, three in Russian, two in Chinese, and one in French. In another six forums, 90-95% of posts were in English.

There were only five forums where the primary language accounted for less than 85% of the total posts.
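The grouping described above can be sketched as follows. The forum names and post counts are invented, the helper names `primary_language` and `bucket` are hypothetical, and the 85-90% band is included only so that every possible share falls into some bucket.

```python
def primary_language(counts):
    """Return (language, share in percent) for a forum's most common language."""
    lang, n = max(counts.items(), key=lambda item: item[1])
    return lang, 100 * n / sum(counts.values())


def bucket(share):
    """Map a primary-language share (percent) to a band."""
    if share >= 95:
        return "at least 95%"
    if share >= 90:
        return "90-95%"
    if share >= 85:
        return "85-90%"
    return "under 85%"


# Invented per-forum post counts by language (not the real data).
forums = {
    "forum_a": {"en": 970, "ru": 30},
    "forum_b": {"en": 920, "es": 80},
    "forum_c": {"ru": 700, "uk": 200, "en": 100},
}

summary = {}
for name, counts in forums.items():
    lang, share = primary_language(counts)
    summary[name] = (lang, bucket(share))
    print(f"{name}: {lang} {share:.1f}% ({bucket(share)})")
```

In this toy data, `forum_c` is the only mixed forum, because its primary language falls under the 85% threshold.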

[Figure: The most-mixed forums]

Russian appears prominently only twice in this grouping: once as a primary language, with other Slavic languages accounting for most of the “other,” and once as the secondary language in a predominantly English-speaking forum. Chinese does not appear significantly in mixed forums at all.

These findings are quite fascinating because they go against conventional thinking. The cybercriminal underground is one of the most borderless realms possible. Its protection of anonymity enables criminals to communicate effortlessly across geographies (and modern computer-powered translation could theoretically bridge any linguistic barriers).

However, despite the financial opportunities presented by international criminal collaboration, we find that on the underground, actors prefer to interact with their own. Perhaps this is because it is still more convenient to converse and conduct commerce in one’s native language. Or maybe it reflects something deeper in the human psyche. Either way, it shows that neither the deep web nor the dark web is a single community. Rather, each is a set of fairly distinct communities, segmented by language like their real-world equivalents.
