DarkHotel is a cyber espionage campaign targeting well-known corporate executives and political leaders in Malaysia, Japan, India, and other countries. What is fascinating about this particular syndicate is their advanced skill set and their ability to leverage high-level penetration techniques to accomplish their goal (e.g., a kernel-mode keylogger, reverse engineering of certificates, and 0-day exploits).
In addition, after successful exfiltration of the targeted data, they are able to remove any trace of their existence from the network, making it much harder for security professionals to put the pieces of the puzzle back together.
This blog post is a deep dive into mining the domains associated with the DarkHotel attackers, and an attempt to extract patterns in their behavior. The techniques we employed included mining WHOIS records, GeoIP data, and ASN/organization data, and performing natural language text processing on the DarkHotel domains themselves.
DarkHotel attacks usually begin by compromising a hotel’s Wi-Fi network and targeting the victim by tricking them into downloading/installing a backdoor. For example, they may send a phishing email targeting an executive, saying their current version of Adobe Flash or anti-virus software needs to be upgraded.
Thinking about this from an attacker's perspective, we would need to somehow get the victim to click on a link. How would we go about doing this? Well, in general, people like to get stuff for free, so we may include the word “free” or “cheap” in the domain. Additionally, less tech-savvy business execs or politicians may not be aware of these means of infection, and are more likely to click on a link saying they need to update their software, improve security, or possibly read some sort of business or news site. Let’s take a look at some examples:
Typically, malicious domains fall into the pattern of using common “abuse” words, which is why we decided to use natural language text processing techniques for this experiment. One of the techniques we used to analyze these domains was to extract all the words found in the English dictionary and try to find any commonalities. We also leveraged a natural language processing technique from the Python NLTK library called stemming to help increase accuracy when extracting the most important words from a text.
Stemming is a normalization technique that reduces a word to its root by applying a series of transformations, such as stripping out common prefixes or suffixes. For example, the word “countries” becomes “countri” after stemming. We chose stemming so that each dictionary word would match more of its variants when evaluating a domain, which lowers the false negative rate.
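To make the idea concrete, here is a minimal sketch of suffix stripping. It is a toy written for illustration, not the NLTK stemmer we used; the rule list and the `toy_stem` name are assumptions chosen only to reproduce the “countries” example above.

```python
# Toy suffix-stripping stemmer. Illustrative only: this is NOT the NLTK
# stemmer, and the rule list is an assumption picked for this example.
SUFFIX_RULES = [
    ("ies", "i"),  # countries -> countri
    ("ing", ""),   # checking  -> check
    ("es", ""),    # updates   -> updat
    ("e", ""),     # update    -> updat
    ("s", ""),     # domains   -> domain
]

MIN_STEM_LENGTH = 3  # do not strip short words down to nothing


def toy_stem(word):
    """Apply the first matching suffix rule and return the stem."""
    word = word.lower()
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LENGTH:
            return word[: -len(suffix)] + replacement
    return word


for w in ("countries", "update", "updates", "checking"):
    print(w, "->", toy_stem(w))
```

A real stemmer applies many more rules in several passes, but the effect is the same: related word forms collapse onto a single root that can be searched for.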
For example, when we compared the DarkHotel domains against domain examples found in the Mandiant APT 1 report, there were domains such as applesoftupdate[.]com and webserviceupdate[.]com. Without stemming, if we had just taken the dictionary word “update”, we would have missed domains like firefoxupdata[.]com. Here are some of the results after extracting the top dictionary words out of the DarkHotel domains list:
auto: 75 occurrences
(Also worth mentioning: 10 occurrences of “autoupdate”)
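A count like this could be produced roughly as follows. This is a sketch under assumptions: the watch list and its stems are hand-written for illustration (not output from our tooling), and the domains are only the handful quoted in this post, so the numbers it prints are not our results.

```python
from collections import Counter

# Hypothetical watch list: dictionary word -> hand-written stem (assumed).
WATCH_STEMS = {
    "update": "updat",
    "auto": "auto",
    "secure": "secur",
    "online": "onlin",
}

# Defanged domains quoted in this post.
DOMAINS = [
    "applesoftupdate[.]com",
    "webserviceupdate[.]com",
    "firefoxupdata[.]com",
    "microsoft-xupdate[.]info",
    "secureonline[.]net",
]


def count_hits(domains, terms):
    """Count domains whose leftmost label contains each search term."""
    counts = Counter()
    for domain in domains:
        label = domain.lower().split("[.]")[0]
        for name, term in terms.items():
            if term in label:
                counts[name] += 1
    return counts


exact = count_hits(DOMAINS, {word: word for word in WATCH_STEMS})
stemmed = count_hits(DOMAINS, WATCH_STEMS)
print("exact word  :", dict(exact))
print("stemmed word:", dict(stemmed))
```

Searching for the whole word “update” misses firefoxupdata[.]com, while the stem “updat” catches it, which is the false negative that stemming is meant to avoid.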
What is interesting to note is that both APT 1 and DarkHotel domains share common “themes” and obfuscation techniques. Some of these themes are described in the Mandiant report under the category “naming themes” (p. 48). Both sets of domains contain words related to “news” and “technology”. For example, some news-related domains found among the DarkHotel domains are:
Here are some of the domains associated with news found in Mandiant’s report:
This makes sense, as business executives are likely to click on news links to read while browsing online in their room. DarkHotel domains also borrow the names of well-known technology and software companies to seem more legitimate to the victim. Here are some examples from both sets that try to associate themselves with established tech companies:
APT 1 domains:
Similarly, spoofing of security products/domains occurs in both sets: in the APT 1 report: symanteconline[.]net, mcafeepaying[.]com, and in DarkHotel domains: secureonline[.]net, checkingvirusscan[.]com. Basically, they are targeting the victim’s lack of tech knowledge—advertising to upgrade the security of their system would obviously be a priority for a high-level executive or politician.
Another feature of the DarkHotel domains that we analyzed was their ASN (Autonomous System Number). We found that many of these domains come from obscure ASNs/registrars that were previously associated with domains exhibiting malicious behavior. Generally, malicious domains like these are associated with lesser-known ASNs/organizations that place very few restrictions on abuse. Here are the top 10 ASNs the DarkHotel domains were associated with:
One example that suggests a potential abuse-detection mechanism, found while researching DarkHotel domain/ASN mappings, is the domain microsoft-xupdate[.]info. If this domain really were a Microsoft update tool, wouldn’t it make sense for it to come from an ASN or registrar associated with Microsoft, like ASN 8075, Microsoft Corporation? However, it’s associated with ASN 21740, eNom, Incorporated, whose rank is 6135, which raises some suspicion.
Similar logic can be applied to adobeplugs[.]net, which comes from ASN 3388, LeaseWeb, or adobeupdates[.]com, which comes from ASN 53665, Bodis. Why would they not be associated with Adobe Systems, whose ASN is 1313? These inconsistencies make the domains more suspicious.
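The brand/ASN comparison above can be sketched as a simple lookup. The ASN numbers below are the ones quoted in this post; the `asn_mismatches` helper and the keyword-matching approach are assumptions for illustration, not a finished detector.

```python
# Brand keyword -> ASN of the real organization, as quoted in this post.
BRAND_ASN = {"microsoft": 8075, "adobe": 1313}

# Defanged domain -> ASN it was observed on, as quoted in this post.
OBSERVED = {
    "microsoft-xupdate[.]info": 21740,
    "adobeplugs[.]net": 3388,
    "adobeupdates[.]com": 53665,
}


def asn_mismatches(observed, brand_asn):
    """Return (domain, brand, expected_asn, observed_asn) tuples for
    domains that contain a brand keyword but sit on a different ASN."""
    flagged = []
    for domain, asn in observed.items():
        for brand, expected in brand_asn.items():
            if brand in domain.lower() and asn != expected:
                flagged.append((domain, brand, expected, asn))
    return flagged


for domain, brand, expected, asn in asn_mismatches(OBSERVED, BRAND_ASN):
    print(f"{domain}: names '{brand}' (ASN {expected}) but observed on ASN {asn}")
```

A mismatch on its own is only a hint, since legitimate companies also host content with third parties, so a check like this would be one input among several rather than a verdict.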
We also used GeoIP/MaxMind to do country code lookups on the DarkHotel domains, and here are some of our top results:
United States: 113
It is interesting to see that many of them resolve to the US, since most of the reported attacks are from East Asia. Generally, though, these country stats are consistent with APT behavior: many APT domains also resolve to locations in the US. Here is the geographic breakdown:
We also mined another data set associated with these DarkHotel domains: their WHOIS records. Interestingly, we found that many of the domains have been updated very recently. A majority were updated in the last few months, and some within the last few weeks. This is generally not the case for legitimate domains (e.g., the Alexa Top 1000), which very rarely update their domain registration records.
This is another indicator of abuse, as attackers have to constantly evolve to stay ahead of detection mechanisms. Another curious fact: although the domains were updated recently, many of them were created quite some time ago. Here are some of our domain results with particularly old creation dates and very recent modification dates:
Domains: microchsse[.]strangled[.]net, www[.]strangled[.]net, automobiles[.]strangled[.]net
Creation time: 9/24/1999
Update time: 9/9/2014
Domains: microblo5[.]mooo[.]com, microchisk[.]mooo[.]com
Creation time: 3/24/2000
Update time: 3/8/2014
Creation time: 6/1/2001
Update time: 8/15/2014
Creation time: 3/12/2000
Update time: 2/11/2014
Creation time: 8/1/2000
Update time: 8/13/2014
Several of these domain creation dates go back more than 10 years, with some dating as far back as 1999. This suggests that the DarkHotel campaign may in fact have been going on for longer than we expected. It’s intriguing to see these old creation dates alongside the recently updated information, which implies these domains are being routinely maintained. This deserves further investigation, and could potentially be used as another detection mechanism.
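As a rough illustration of how the “old creation date, recent update” pattern might become a check, here is a sketch using the creation and update dates listed above. The thresholds, the analysis date, and the function names are all assumptions for the example, not values we have validated.

```python
from datetime import date

# (creation, update) pairs from the listing above, in M/D/YYYY form.
RECORDS = [
    ("9/24/1999", "9/9/2014"),
    ("3/24/2000", "3/8/2014"),
    ("6/1/2001", "8/15/2014"),
    ("3/12/2000", "2/11/2014"),
    ("8/1/2000", "8/13/2014"),
]

AS_OF = date(2014, 12, 1)  # assumed analysis date, for illustration only


def parse_us_date(text):
    """Parse an M/D/YYYY string into a date."""
    month, day, year = (int(part) for part in text.split("/"))
    return date(year, month, day)


def old_but_recently_updated(created, updated, as_of,
                             min_age_years=10, max_update_days=365):
    """True if the record is at least min_age_years old and was updated
    within max_update_days of as_of (both thresholds are assumptions)."""
    age_years = (as_of - created).days / 365.25
    days_since_update = (as_of - updated).days
    return age_years >= min_age_years and 0 <= days_since_update <= max_update_days


flags = [
    old_but_recently_updated(parse_us_date(c), parse_us_date(u), AS_OF)
    for c, u in RECORDS
]
print(flags)
```

Many long-lived legitimate domains are also renewed or edited every year, so the thresholds would need tuning against a benign baseline before this could serve as a feature.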
After analyzing these results, we have come to the conclusion that there are certain patterns in DarkHotel domain data, and that threat actors are replicating proven tactics. The next step would be to program these findings into a feature detection algorithm and apply deeper inspection of our results.
The text analysis of the domains and the ASN information seemed to be the most revealing, and the most promising for feature detection. While the geo-diversity analysis was interesting to see, it did not reveal enough information to convert into a feature. The recently updated WHOIS records, and the fact that many of these domains were created quite some time ago, are also fascinating, but deserve deeper analysis before being turned into a feature.