Two weeks ago I saw on Twitter that Thomas Chopitea and Maximilian Hils of The Honeynet Project were nice enough to create an online forensics challenge. I had a Sunday afternoon free and thought I’d give it a shot. I ended up completing most of the challenge. This post serves as a walkthrough of my solution.
Network Forensics
I’ve talked about my approach to network forensics before and thought, “what better time to practice what I preach?”. First I read over the challenge page including each question to gather as much contextual information as possible. I made note of the superheroes bonus question and kept it in the back of my head while working through the artifacts. The challenge provided a network trace file and a back story. The important details of the story were:
- John is your boss and his system was compromised at a BYOD conference
- Pete Galloway responded to the compromise and then angrily quit the company
- Pete found malware, Python bytecode, and “random” payloads during his triage
The first thing I typically do when analyzing network traffic is check the size of the trace file. If I can reasonably open it with Wireshark, I will. I like to use Wireshark to check two things about the trace file. The first is the number of connections broken down by protocol, which can be found by going to Statistics->Conversations. We can see about 150 TCP connections and 100 UDP connections.
The second thing is the protocols Wireshark thinks the trace contains. This can be accomplished by going to Statistics->Protocol Hierarchy. We can see most of the UDP is DNS, making most of those 100 UDP connections fairly easy to understand. It should be noted that Wireshark determines the protocols largely by port number. This is a rather naive method and Wireshark’s listing should be taken with a grain of salt. Wireshark is a great tool for visually working with a relatively small number of streams, but I much prefer Bro. Bro isn’t as simple to use as Wireshark, though.
To get a full picture of the trace file’s contents I read the trace file with Bro and told Bro to use a custom configuration. I started by copying broctl’s default configuration file from %BRO_DIR%/bro/share/bro/site/local.bro. I enabled and disabled certain features and added an event handler for carving transferred files from the trace (and naming them with the protocol they were transferred over) and I set my local subnets to 0.0.0.0/0 (doing so makes Bro create additional log lines for every IP address in the trace file). Once finished the contents of my local.bro script looked like this:
@load misc/loaded-scripts
@load tuning/defaults
@load misc/scan
@load misc/app-stats
@load misc/detect-traceroute
@load frameworks/software/vulnerable
@load frameworks/software/version-changes
@load-sigs frameworks/signatures/detect-windows-shells
@load protocols/ftp/software
@load protocols/smtp/software
@load protocols/ssh/software
@load protocols/http/software
@load protocols/dns/detect-external-names
@load protocols/ftp/detect
@load protocols/conn/known-hosts
@load protocols/conn/known-services
@load protocols/ssl/known-certs
@load protocols/ssl/validate-certs
@load protocols/ssl/log-hostcerts-only
@load protocols/ssh/geo-data
@load protocols/ssh/detect-bruteforcing
@load protocols/ssh/interesting-hostnames
@load protocols/http/detect-sqli
@load frameworks/files/hash-all-files
export {
    redef Site::local_nets += [0.0.0.0/0];
}
event file_new(f: fa_file)
    {
    local fname = fmt("%s_%s", f$source, f$id);
    Files::add_analyzer(f, Files::ANALYZER_EXTRACT, [$extract_filename=fname]);
    }
I then ran bro -Cr conference.pcapng local.bro and Bro generated a plethora of useful information I used as my inventory. The log files Bro generated for me were:
app_stats.log
dns.log
files.log
known_certs.log
known_services.log
packet_filter.log
ssl.log
x509.log
conn.log
http.log
known_hosts.log
loaded_scripts.log
notice.log
software.log
weird.log
Bro also extracted all the files transferred within the trace file and dropped them into a directory for me which I named extract_files/.
With my Bro logs as my inventory, I first checked the conn.log file to get an overview of the connections in the trace file. Looking at the “service” field in the conn log as well as the names of the log files Bro created, we can see the trace file contains mostly HTTP and DNS traffic. It’s likely just web browsing. Looking in the dns.log file, you can see all the domain names queried for in the trace file and looking in the http.log file, you can see all the hosts and URLs John browsed to. Looking at the extracted files in extract_files/ whose names begin with “HTTP_” you can see all the HTML, JavaScript, and images transferred to and from John’s system.
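As a rough illustration of that first pass over conn.log, the sketch below tallies the “service” column of a Bro log in its default tab-separated format. The helper name count_services and the sample rows are invented for illustration; only the “#fields” header line and the “-” placeholder for an unset field are Bro conventions.

```python
from collections import Counter

def count_services(lines):
    """Tally the 'service' column of a Bro TSV log (hypothetical helper)."""
    fields = []
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#fields"):
            # The header names each tab-separated column: "#fields<TAB>ts<TAB>uid..."
            fields = line.split("\t")[1:]
            continue
        if not line or line.startswith("#") or not fields:
            continue  # skip other metadata lines and anything before the header
        row = dict(zip(fields, line.split("\t")))
        counts[row.get("service", "-")] += 1
    return counts

# Invented sample rows, trimmed to three columns for brevity
sample = [
    "#fields\tts\tproto\tservice",
    "1426000000.0\tudp\tdns",
    "1426000001.0\ttcp\thttp",
    "1426000002.0\ttcp\thttp",
    "1426000003.0\ttcp\t-",
]
services = count_services(sample)
print(services.most_common())
```

Against a real log, something like count_services(open('conn.log')) should give the same kind of breakdown.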
Using the http and dns logs and the extracted files, we can answer the first question, “BYOD seems to be a very interesting topic. What did your boss do during the conference?”. It seems John got on Facebook, Reddit, 9GAG, and browsed to “www.thewayoftheninja.org” quite a bit during the conference. He was likely very productive.
Looking closer at the http.log file, we can see “http://www.thewayoftheninja.org/n.html” was used as a referer in one HTTP connection, meaning John was likely redirected or linked there from “http://www.thewayoftheninja.org/n.html”. The host field for the connection is “www.harveycartel.org” and the file path John went to was “/nv2/Nv2-PC.zip”. This is interesting because it is supposedly a zip file. It is also interesting because about a minute after the zip file was downloaded an HTTP connection to “ninja-game.org” was made using a user-agent of “Python-urllib/2.7”. Recall Pete mentioned some interesting Python bytecode.
Looking at the URLs of the other connections made to “ninja-game.org”, we can see Windows file paths. This is very strange. You can also see the user-agent used in connections to “ninja-game.org” is “Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.115 Safari/537.36”, a Mac user-agent, while all the previous HTTP connections used “Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.89 Safari/537.36”, a Windows user-agent. This led me to believe the user-agent was crafted, likely by malware, likely inside that Zip file.
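The user-agent comparison can be sketched in a few lines of Python. This assumes the http.log rows have already been parsed into dictionaries keyed by Bro’s “host” and “user_agent” field names. The function names and the sample rows are made up, modeled on what the log showed rather than copied from it.

```python
from collections import defaultdict

WINDOWS_UA = ("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/41.0.2272.89 Safari/537.36")
MAC_UA = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 "
          "(KHTML, like Gecko) Chrome/40.0.2214.115 Safari/537.36")

def agents_by_host(rows):
    """Map each HTTP host to the set of user-agents seen for it."""
    seen = defaultdict(set)
    for row in rows:
        seen[row["host"]].add(row["user_agent"])
    return seen

def odd_hosts(rows, expected_agent):
    """Hosts contacted with any user-agent other than the expected browser."""
    return sorted(host for host, agents in agents_by_host(rows).items()
                  if agents - {expected_agent})

# Illustrative rows only, not the actual log contents
rows = [
    {"host": "www.thewayoftheninja.org", "user_agent": WINDOWS_UA},
    {"host": "www.harveycartel.org", "user_agent": WINDOWS_UA},
    {"host": "ninja-game.org", "user_agent": "Python-urllib/2.7"},
    {"host": "ninja-game.org", "user_agent": MAC_UA},
]
print(odd_hosts(rows, WINDOWS_UA))
```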
The next thing I did was check the MIME types of all the files transferred in the trace file. I opened up the files.log file and noted an “application/x-dosexec” and an “application/x-shockwave-flash” file. These are both common file types often used in attacks so I homed in on them, especially the DOS executable file. But what happened to the Zip file? Bro and libmagic (the file command in Linux) both claim the file at “/nv2/Nv2-PC.zip” is an EXE even though the URL contained a Zip extension. I looked at the file with ExeInfo and the Zip file turned out to be a self-extracting archive.
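The extension-versus-content mismatch is easy to check by hand, because a self-extracting archive starts with an executable stub rather than a Zip header. The sketch below compares a file’s leading bytes with a handful of well-known signatures. It is a tiny stand-in for what libmagic does, and the function names are invented.

```python
def sniff_type(data):
    """Guess a file type from its leading bytes (a small subset of libmagic's checks)."""
    if data[:2] == b"MZ":
        return "dos/pe executable"
    if data[:4] == b"PK\x03\x04":
        return "zip archive"
    if data[:3] in (b"FWS", b"CWS", b"ZWS"):
        return "shockwave flash"
    return "unknown"

def extension_mismatch(name, data):
    """True when a file is named .zip but does not start with a Zip local file header."""
    return name.lower().endswith(".zip") and sniff_type(data) != "zip archive"

# Made-up leading bytes standing in for the real download
fake_sfx = b"MZ\x90\x00" + b"\x00" * 60
print(sniff_type(fake_sfx), extension_mismatch("Nv2-PC.zip", fake_sfx))
```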
At this point, we can answer the second question, “What method did the attacker use to infect your boss? Which systems (i.e. IP addresses) are involved?”. The attacker served back a malicious Zip file disguised as a game and John, working so diligently at his conference, ran it on his system. The file was served from 81.166.122.238, the same IP address the malware beacons to later in the trace file. We can also answer the third question, “Based on the PCAP, which files were exfiltrated? List the filenames.”. From the URLs in our http.log file:
C:\Users\admin\Desktop\sensitive+documents.doc
C:\Users\admin\Desktop\Tools\odbg201\help.pdf
C:\Users\admin\Documents\private\affair\holiday\EmiratesETicket1.pdf
C:\Users\admin\Documents\private\affair\holiday\EmiratesETicket2.pdf
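Pulling those paths out of the logged URIs can be sketched with Python’s urllib.parse. The “/submit_highscore” path and its “n” parameter are the ones seen in the malware’s POST requests. The helper name and the sample URI, including its percent-encoding, are assumptions made for illustration.

```python
from urllib.parse import urlsplit, parse_qs

def exfil_paths(uris):
    """Collect the 'n' query parameter from every /submit_highscore URI."""
    paths = []
    for uri in uris:
        parts = urlsplit(uri)
        if parts.path == "/submit_highscore":
            # parse_qs undoes percent-encoding and turns '+' into a space
            paths.extend(parse_qs(parts.query).get("n", []))
    return paths

# Hypothetical URIs; the real capture may encode the path differently
sample_uris = [
    "/highscores?user=admin",
    "/submit_highscore?n=C%3A%5CUsers%5Cadmin%5CDesktop%5Cexample+file.doc",
]
print(exfil_paths(sample_uris))
```

Note that parse_qs decodes “+” to a space, which would explain why a name like “sensitive+documents.doc” appears with a plus sign in the raw URL.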
Based on the time the HTTP connections began we can answer question four, “Can you sketch an overview of the general actions performed by the malware?”. It seems John downloaded what he thought was just a game, then likely opened the file and ran some malware. John’s system issued a GET request for “ninja-game.org/highscores?user=admin”, likely downloaded a configuration or instructions file, and then exfiltrated the above files from John’s system.
Host Forensics and Malware Analysis
This is where network forensics ends and malware analysis/host forensics is needed. Moving the self-extracting Zip to a previously configured malware analysis VM and opening it with 7-Zip, we can see the archive contains five files and a directory:
main.exe
main.pyc
n_v14.exe
python27.dll
start.cmd
lib/
Opening start.cmd with Notepad we can see how the malware is started. It seems n_v14.exe is the game John was hoping to play while main.exe is the malware. I began by examining main.pyc. Python compiles scripts before running them and stores the compiled version in pyc files. Similar to .NET assemblies, Python bytecode can be decompiled to the original source code quite easily. Using a library called Uncompyle, I decompiled the pyc file. It contained the following:
import sys
sys.exit('\r\ndebugger detected.\r\nsignature:233f3f3b7164642c2424652c276431723a7d021e')
which answers the seventh question, “What does main.pyc do? (Bonus: Can you provide a decompiled version?)”. This makes me believe the main.exe file incorporates some anti-debugging techniques. Looking at the strings in main.exe, we can see some references to Python including “Py_SetPythonHome”, “Could not load Python.”, and “import main”.
This makes me believe main.exe was originally written in Python and made into an EXE file with something similar to py2exe. It is likely main.pyc is a remnant of main.exe. This answers question six, “The malware seems to be written in Python. Is this “normal” Python? What’s different?” and also explains the Python user-agent we saw in the trace file.
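The strings check can be reproduced without any special tooling. Below is a minimal sketch of a strings-style extractor that pulls runs of printable ASCII out of a binary. The function name and the sample blob are invented; the three marker strings are the ones found in main.exe.

```python
import re

def ascii_strings(data, min_len=6):
    """Return printable-ASCII runs of at least min_len bytes, similar to strings(1)."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]

# Invented blob: the three markers separated by non-printable bytes
blob = (b"\x00\x01Py_SetPythonHome\x00\xff\x10Could not load Python."
        b"\x00ab\x00import main\x00")
for text in ascii_strings(blob):
    print(text)
```

On the real sample, ascii_strings(open('main.exe', 'rb').read()) would be the equivalent call, run inside the analysis VM.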
Opening main.exe in the freeware version of IDA and pressing Ctrl-E to locate the entry point, we see IDA found some TLS callbacks in main.exe. These are likely used to detect the presence of debuggers. This was my somewhat thin answer to question eight, “How is the final payload protected? How is it decrypted by the dropper? (Bonus: Can you provide a decompiled version?)” as I didn’t completely reverse engineer main.exe. Instead, I treated main.exe as a black box.
I executed main.exe with Fakenet running on my VM to attempt to recreate what happened on John’s system. My VM issued the same GET request I saw in the trace file to “ninja-game.org/highscores?user=admin”, except admin was replaced with the user account I was logged in as on my VM. Fakenet responded with its default HTML file. I was hoping to see POST requests exfiltrating data from my VM similar to the conference trace, but the malware issued no other network connections. I then replaced Fakenet’s default HTML file (which it serves to all requests by default) with the file the live server from the conference trace file responded to John’s system with and executed main.exe.
Again, I saw the GET request to “ninja-game.org/highscores?user=[USER_NAME]”, but this time I also saw POST requests to “ninja-game.org/submit_highscore?n=[FILE_PATH]” where the full path to some PDFs on my desktop replaced [FILE_PATH]. I then carved the data the malware POSTed to Fakenet and compared the original PDFs on my desktop to the exfiltrated data in the malware’s POST request.
The sizes of the two files were the same. I assumed the file was either encoded or encrypted. Comparing the contents of the two files in HxD, I noticed that all 0x20 bytes in the original document mapped to 0x6F. I checked other bytes from the original PDF and they all seemed to be substituted in the encoded version.
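The byte-by-byte comparison done in the hex editor can also be scripted. The sketch below walks an original file and its encoded copy in lockstep and checks that every plaintext byte always maps to the same encoded byte, which is what a simple substitution would produce. The function name and the toy mapping are invented; only the 0x20 to 0x6F pair comes from the real files.

```python
def substitution_table(original, encoded):
    """Build {plain_byte: encoded_byte}, or return None if the pair is inconsistent."""
    if len(original) != len(encoded):
        return None
    table = {}
    for plain, enc in zip(original, encoded):
        if table.setdefault(plain, enc) != enc:
            return None  # the same plaintext byte was encoded two different ways
    return table

# Toy data: apart from 0x20 -> 0x6F, this mapping is made up
toy = {0x20: 0x6F, ord("a"): 0x11, ord("b"): 0x22}
original = b"a b a"
encoded = bytes(toy[b] for b in original)
print(substitution_table(original, encoded) == toy)
```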
Decoding John’s Exfiltrated Files
I assumed the encoded file was simply an XOR’d version of the original PDF and tried decoding the encoded version with 0x4F (0x20 ^ 0x6F), but found that a single XOR key wasn’t used to encode all bytes of the original file. I was a little stumped at this point. I considered other simple methods of encoding including:
- a Caesar shift
- a modulus function that depended on the input byte
- a hard-coded substitution scheme
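The single-key XOR and Caesar shift hypotheses can be ruled in or out mechanically from a few known plaintext/encoded byte pairs: either scheme would yield exactly one key or one shift amount across all pairs. In this sketch the function names are invented and the second byte pair is made up; only the 0x20/0x6F pair is from the real files.

```python
def xor_keys(pairs):
    """Distinct XOR keys implied by (plain, encoded) byte pairs."""
    return {plain ^ enc for plain, enc in pairs}

def shift_amounts(pairs):
    """Distinct additive shifts (mod 256) implied by (plain, encoded) byte pairs."""
    return {(enc - plain) % 256 for plain, enc in pairs}

# (0x20, 0x6F) was observed; (0x41, 0x03) is an invented second observation
pairs = [(0x20, 0x6F), (0x41, 0x03)]
print("single XOR key:", len(xor_keys(pairs)) == 1)
print("single shift:", len(shift_amounts(pairs)) == 1)

# Sanity check against data that really is XOR'd with one key
xored = [(b, b ^ 0x4F) for b in range(256)]
print(xor_keys(xored) == {0x4F})
```

When both sets contain more than one value, neither simple scheme fits, which leaves per-byte substitution as the more likely explanation.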
The solution then came to me. As I said previously, I treated main.exe as a black box and didn’t worry about the actual encoding functionality within main.exe. Using some basic Python I crafted a fake PDF and placed it on my desktop for the malware to steal and POST. My Python script follows:
# Write every byte value 0x00 through 0xFF, in order, to a file with a PDF extension
with open('./crafted.pdf', 'wb') as f:
    f.write(bytearray(range(0xff + 1)))
This creates a file with bytes 0x00 through 0xFF and names it with a PDF extension. I then started Fakenet and again executed main.exe. The malware issued a GET request for “ninja-game.org/highscores?user=[USER_NAME]”, Fakenet served back the original server’s response, the malware found the crafted PDF on my desktop, encoded it and included it in a POST body which was intercepted by Fakenet. Carving the encoded file POST’d by the malware from a PCAP Fakenet wrote for me, I had a copy of every byte 0x00 through 0xFF and that byte’s corresponding encoded version. I then used more Python scripting to construct a dictionary mapping encoded bytes to decoded bytes and used that dictionary to decode the encoded PDF and DOC files exfiltrated from John’s system and carved from the conference.pcapng file. The decoding script follows:
# This builds the decoding dictionary from the encoded crafted PDF.
# The crafted file held byte N at offset N, so the encoded byte at offset N decodes to N.
d = {}
with open('./crafted_encoded.pdf', 'rb') as f:
    for count, encoded_byte in enumerate(bytearray(f.read())):
        d[encoded_byte] = count

# This decodes a single file carved from the malware's POST requests in the original challenge pcap
with open('./doc_encoded.doc', 'rb') as en:
    with open('./doc_decoded.doc', 'wb') as de:
        de.write(bytearray(d[b] for b in bytearray(en.read())))
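Before trusting a decoding dictionary like this one, it is worth confirming that the carved probe really is a one-to-one mapping, since a truncated carve or a collision would silently corrupt the output. The sketch below adds that check and round-trips a test message. The function names are invented and the toy probe is a stand-in mapping, not the malware’s actual encoding.

```python
def build_decoder(encoded_probe):
    """Map each encoded byte back to its plaintext value.

    encoded_probe is the encoded form of a file holding bytes 0x00..0xFF in order.
    """
    if len(encoded_probe) != 256:
        raise ValueError("probe must be exactly 256 bytes")
    decoder = {enc: plain for plain, enc in enumerate(encoded_probe)}
    if len(decoder) != 256:
        raise ValueError("two plaintext bytes share one encoded byte")
    return decoder

def decode(data, decoder):
    """Apply the decoder to every byte of data."""
    return bytes(decoder[b] for b in data)

# Stand-in for the carved probe: an invented one-to-one mapping, NOT the malware's
toy_probe = bytes((b * 7 + 3) % 256 for b in range(256))
toy_decoder = build_decoder(toy_probe)
message = b"%PDF-1.4 example"
toy_encoded = bytes(toy_probe[b] for b in message)
print(decode(toy_encoded, toy_decoder) == message)
```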
After decoding John’s PDFs and DOC files, I opened them. The PDFs contained plane tickets to Dubai in John’s and Pete’s wife’s names. This is likely why Pete was so mad and quit the company. This was the answer to question nine, “Why did Pete leave the company?”. The answer to question ten, “Your boss mentioned he’s going to the Honeynet Workshop in Stavanger, but you’re not allowed to join him. Why so?” was also in the PDFs. The reservations for Dubai are for the same date as the Honeynet Workshop.
Interestingly enough, John was likely already compromised before this incident, as the Word document on his desktop contained a malicious macro which dropped an EXE in his TEMP folder.
Considering the bonus question, the only superheroes I found were in images extracted from John’s browsing.
Finally, the somewhat subjective question five, “Do you think this is a targeted or an automated attack? Why?” can be answered. I believe the attack to be automated and not targeted. The malware seems to grab any PDF or DOC file on the Desktop or in the user’s documents folder in alphabetical order (I noticed this when it exfiltrated a temporary file I created, “a_crafted.pdf”, before “crafted.pdf”). The malware isn’t looking for anything specific besides files with specific extensions. The malware is not targeting John’s files specifically. However, depending on how the attacker delivered the malware and whether or not the malware was delivered to anyone else at the conference, the attacker *could have* been targeting John.
Final Thoughts
I had a lot of fun working this challenge and I learned quite a bit about EXE files generated from Python and thread-local storage techniques. I’m always looking for opportunities to use Bro, as well, and this was a good one. I want to thank Thomas and Maximilian for putting the challenge materials together as well as the Honeynet Project. If anyone else found the superheroes, I’d love to hear where.