
Malware analysis - an introduction


Malware analysis requires a structured approach to extract and understand hidden threats within various file formats and languages. By using the right tools, malware analysts can effectively deobfuscate, detect, and mitigate malicious payloads before they cause harm.

The following are some notes from the Italian-language Udemy course: Introduzione alla malware analysis: Un approccio pratico.

Environment for malware analysis

Malware refers to any software program or code specifically designed to exploit, damage, disrupt, or gain unauthorized access to computer systems, networks, or data. Malware operates covertly, often without user consent.

Here are some blog posts for further reading:

A secure and controlled environment is crucial for effective malware analysis, often utilizing virtual machines, sandboxing tools, and forensic utilities to prevent system compromise and facilitate safe examination.

Host requirements

  • Virtualization software such as VirtualBox, with an installed operating system suited to the malware under analysis,
  • Windows 10 media creation tool: https://www.microsoft.com/it-it/software-download/windows10,
  • When using a shared folder between the host and guest, always set it to read-only to prevent malware from writing files to the host system,
  • Allocate a realistic number of processor cores and amount of memory, as some malware performs environment checks to detect virtualized environments and may terminate execution if the resources appear insufficient or unrealistic.
warning

In some situations, malware can exploit vulnerabilities in virtualization software, potentially escaping the virtual environment and compromising the host system.

Guest requirements

Install the operating system required for the malware analysis environment.

Common analysis tools:

  • Editor of your choice, for example Notepad++,
  • Python, high-level programming language,
  • 7-Zip, open source software for compression/decompression of various file formats,
  • CFF Explorer, binary editor used to analyse and modify files in binary format,
  • System Informer, tool that helps you monitor system resources, debug software and detect malware,
  • Detect It Easy, program for determining file types,
  • YARA, used to identify and classify malware,
  • Sysinternals, set of utilities for troubleshooting and diagnosing Windows and Linux systems and applications,
  • Fiddler proxy, HTTP/HTTPS system proxy used to inspect network requests. To inspect traffic over TLS, enable HTTPS decryption under Tools → Options → HTTPS: check "Decrypt HTTPS traffic", install the certificate, and check "Ignore server certificate errors",
  • Wireshark, tool for inspecting network traffic across various protocols beyond just HTTP/HTTPS,
  • (optional) Windows Defender Remover, a tool used to remove Windows Defender.

It is recommended to create virtual machine snapshots to easily restore the system to a clean state after each analysis, examples:

  • vanilla - a clean, fresh OS installation
  • ready-to-play - OS + analysis tools

Windows - things to know

  • Windows Registry: system-defined database in which applications and system components store and retrieve configuration data. It can be opened by pressing WIN+R and typing regedit:
    • administrator rights are required to modify system-wide keys such as HKEY_LOCAL_MACHINE, while HKEY_CURRENT_USER can be modified by the logged-in user. Malware often targets HKEY_CURRENT_USER, HKEY_USERS, and HKEY_LOCAL_MACHINE keys,
    • malware leverages the registry for persistence, such as configuring automatic execution at start-up. For example, HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\RunOnce can be used to launch malware at the next user logon (more information here),
    • each user has a unique security identifier (SID) within the Windows Registry,
    • the Registry is a valuable resource in forensic analysis to assess system state and detect anomalies,
    • before executing malware, export the relevant Registry values so that they can be compared after execution to identify changes made by the malware,
  • when executed, malware typically follows an installation routine to avoid easy deletion from directories like Downloads. It often installs itself in the AppData folder (%APPDATA% - a Windows environment variable) to maintain persistence,
  • Windows shortcut files (.lnk) are often used as an attack vector to run malicious code.
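The export-and-compare step for the Registry can be sketched in Python. This is a minimal illustration, assuming each snapshot has already been loaded into a dictionary that maps a value path to its data (for example by parsing an exported .reg file); the function name `diff_registry_snapshots` and the sample paths are hypothetical.

```python
def diff_registry_snapshots(before: dict, after: dict) -> dict:
    """Compare two registry snapshots (value path -> data)."""
    added = {k: after[k] for k in after.keys() - before.keys()}
    removed = {k: before[k] for k in before.keys() - after.keys()}
    changed = {
        k: (before[k], after[k])
        for k in before.keys() & after.keys()
        if before[k] != after[k]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical snapshots taken before and after running a sample.
RUN_KEY = r"HKCU\Software\Microsoft\Windows\CurrentVersion\Run"
before = {RUN_KEY + r"\OneDrive": r"C:\Program Files\OneDrive\OneDrive.exe"}
after = dict(before)
after[RUN_KEY + r"\Updater"] = r"C:\Users\analyst\AppData\Roaming\updater.exe"

report = diff_registry_snapshots(before, after)
for path, data in report["added"].items():
    print("new value:", path, "->", data)
```

A new value under a Run or RunOnce key that points into AppData is the kind of change worth a closer look.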
tip

When analysing malware, disable Windows updates and Windows Defender, then add the user folder to the exclusion list to prevent automatic detection and removal of the malware during the analysis process. In File Explorer, show hidden items and file extensions.

Threat Hunting and Malware Triage

Threat Hunting is a proactive cybersecurity process involving the systematic search for threats. More here.

Below are some key resources commonly used by analysts:

  • MalwareBazaar is a platform for sharing and analysing malware samples. Malware files are typically compressed and password-protected to prevent accidental execution and reduce the risk of unintended infections,
  • VirusTotal is an antivirus aggregator that scans files and URLs using multiple security engines to detect threats. A subscription is recommended for advanced features like deeper analysis, historical data, and threat intelligence insights,
  • Hybrid Analysis: free malware analysis service for the community that detects and analyses unknown threats.

Phases of Malware Triage

  1. Identify file type - Determine the file format to understand how it operates and what tools are needed for analysis:
    • use Detect it easy: analyse the file and add extensions if applicable,
    • utilize CFF explorer: examine the file structure and metadata,
  2. Perform preliminary analysis - Use analysis tools to gather initial insights into the file’s behaviour and characteristics:
    • identify the hash: Compute the file hash and search in malware databases to determine if it is a known or new threat,
    • perform string analysis: Use strings.exe to extract readable text from the file,
    • search for filenames: if possible, identify the original filename and research it online for additional context,
    • use sandbox analysis: execute the file in a controlled environment to observe behaviour and extract further information,
  3. Identify the next steps (if necessary):
    • for example, identify configuration or Command and Control (C&C) Server.
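The hash and string steps of the preliminary analysis can be reproduced with the Python standard library. The sketch below is only an approximation of what tools such as strings.exe do; the function names and the minimum length of 6 characters are assumptions chosen for illustration.

```python
import hashlib
import re
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so that large samples do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def extract_strings(data: bytes, min_length: int = 6) -> list:
    """Return printable ASCII and UTF-16LE strings of at least min_length characters."""
    ascii_pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    wide_pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_length)
    found = [m.group().decode("ascii") for m in ascii_pattern.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in wide_pattern.finditer(data)]
    return found


# Hypothetical byte blob standing in for a sample.
blob = b"\x00\x01http://example.invalid/payload\x00\xff" + "cmd.exe /c".encode("utf-16-le")
for text in extract_strings(blob):
    print(text)
```

For a file on disk, `sha256_of_file(Path("sample.bin"))` returns the hex digest that can be searched on MalwareBazaar or VirusTotal (the file name is a placeholder).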

Analysing Malicious Files

Attackers frequently deliver malicious files through spam emails and other attack vectors. This section covers methods for analysing different file types commonly used in malware distribution.

.xls files

Microsoft Excel files (.xls, .xlsm) can contain macros, which attackers use to execute malicious code. Attackers can easily embed macros in malicious documents using the following methods:

  • Right-click menu → Assign Macro...
  • View → Macros → View Macros
  • Worksheet tab → Insert (to add macros)
  • Worksheet tab → View Code (opens Microsoft Visual Basic for Applications - VBA)

Attackers often protect macro access with passwords, making direct extraction difficult. Alternative methods are required to extract and analyse embedded macros.

Extracting Macros with oletools

oletools is a Python toolkit for analysing Microsoft OLE2 files (used in legacy Office formats like .doc, .xls, .ppt). Install it with:

pip install oletools

Key oletools Commands:

  • olemeta <filename> - extracts (standard) metadata, useful for threat hunting and identifying similar malicious samples,
  • oleid <filename> - detects specific characteristics usually found in malicious files, such as macros,
  • olevba <filename> - extracts macro source code for analysis.

Analysing Extracted Macros

Malware macros are often obfuscated to hide their true functionality. For example, during the analysis look for:

  • obfuscated strings hiding URLs or commands,
  • download-and-execute functions, such as: rundll32.exe URL,Function.
note

Since 2022, Microsoft Office blocks macros by default in files downloaded from the internet, reducing but not eliminating the threat.
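One frequent form of string obfuscation in macros is building text from `Chr(n)` calls joined with `&`. The following Python sketch folds such expressions back into readable literals. It is a rough illustration, not part of oletools: the function name `fold_vba_strings` is hypothetical, and edge cases such as `Chr(34)` (a double quote) are not handled.

```python
import re

# Chr(104), ChrW(104), Chr$(104) ...
CHR_CALL = re.compile(r"ChrW?\$?\(\s*(\d+)\s*\)", re.IGNORECASE)
# "abc" & "def"  or  "abc" + "def"
CONCAT = re.compile(r'"([^"]*)"\s*[&+]\s*"([^"]*)"')


def fold_vba_strings(line: str) -> str:
    """Replace Chr(n) calls with literals and merge concatenated literals."""
    line = CHR_CALL.sub(lambda m: '"' + chr(int(m.group(1))) + '"', line)
    while True:
        merged = CONCAT.sub(lambda m: '"' + m.group(1) + m.group(2) + '"', line)
        if merged == line:
            return line
        line = merged


# Hypothetical line as it might appear in olevba output.
obfuscated = 'target = Chr(104) & Chr(116) & Chr(116) & "p://"'
print(fold_vba_strings(obfuscated))
```

Running the function over every line extracted by olevba makes hidden URLs and command names easier to spot before reading the macro logic itself.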

.pdf files

PDFs are structured documents that can contain hidden malicious code. While they can be analyzed with a text editor, reading them directly is often challenging due to compression and encoding mechanisms. peepdf-3 is a powerful Python tool designed to simplify PDF analysis, allowing analysts to extract JavaScript code, embedded objects, and suspicious URLs efficiently.

Install it with:

pip install peepdf-3

Basic Commands

Open a PDF in interactive mode: peepdf -i <filename>. Look for suspicious objects and streams, as they may contain embedded JavaScript/JS objects or external URLs (e.g., phishing links or download sites).

Use interactive mode:

> help # list the possible commands
> object <object_number>
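Before opening the interactive shell, a quick count of PDF name keys that often accompany active content can help decide where to look first. The sketch below is a simplified stand-alone check, not a peepdf feature; the keyword list and the function name `count_pdf_keywords` are assumptions. It only sees uncompressed data and misses names written with hex escapes (for example `/J#61vaScript`), which is one reason a dedicated tool is still needed.

```python
import re

SUSPICIOUS_NAMES = (
    b"/JavaScript", b"/JS", b"/OpenAction", b"/AA",
    b"/Launch", b"/EmbeddedFile", b"/URI",
)


def count_pdf_keywords(data: bytes) -> dict:
    """Count occurrences of selected PDF name keys in raw file bytes."""
    counts = {}
    for name in SUSPICIOUS_NAMES:
        # A name ends at whitespace or a delimiter, so /JS must not match /JSON.
        pattern = re.compile(re.escape(name) + rb"(?![A-Za-z0-9])")
        counts[name.decode("ascii")] = len(pattern.findall(data))
    return counts


# Hypothetical minimal PDF fragment.
pdf_bytes = (
    b"%PDF-1.7\n1 0 obj\n<< /Type /Catalog /OpenAction 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /S /JavaScript /JS (app.alert(1)) >>\nendobj\n"
)
for key, hits in count_pdf_keywords(pdf_bytes).items():
    if hits:
        print(key, hits)
```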

Attackers may use URL obfuscation to make a malicious domain look legitimate:

  • scheme://[user:password@]host[:port]/path[?query][#fragment],
  • example: https://docs.google.comformasda....@maliousdomain.com/

A user may believe that the link leads to Google Docs, but everything before the @ is treated as user information, so the browser actually connects to the host that follows it: a malicious site that initiates the malware execution chain.
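The real destination of such a link can be checked with Python's `urllib.parse`, which splits a URL according to the scheme shown above. This is a small sketch; the URL uses the reserved `.example` domain and is purely hypothetical.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit


def userinfo_and_host(url: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (user part before '@', host the client really connects to)."""
    parts = urlsplit(url)
    return parts.username, parts.hostname


decoy, host = userinfo_and_host("https://docs.google.com@malicious.example/login")
print("looks like:", decoy)
print("connects to:", host)
```

If the user part is not empty and resembles a well-known domain, the link deserves to be treated as suspicious.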

.iso, .img, .lnk

  • .iso / .img files are used as containers for additional malicious files (.dll). Users may not notice the .dll file if the Show hidden files, folders, and drives option is disabled.
  • .lnk shortcut files are used to execute malware: check the file properties → Target to identify the command being executed. For example, C:\Windows\system32\rundll32.exe malicious.dll, malicious_fun

Attackers can hijack a legitimate DLL (DLL Hijacking), injecting malicious functions while retaining original functionality. DLL files expose functions through the export table, making it possible to identify added malicious functions. You can use CFF explorer to start analysing a DLL file.

.msi (Microsoft Installer) Files

.msi files can be misused to install malware. Attackers often embed .dll payloads inside .msi packages.

Install Orca.exe via the Windows SDK Installer (select "MSI Tools" during installation). Locate Orca-x86_en-us.msi and install it. To analyse an .msi file with Orca.exe, simply drag and drop the file into the application.

Inspect tables such as:

  • File Table → Lists all embedded files (e.g., .dll, .exe),
  • CustomAction Table → may reveal execution commands (e.g., launching rundll32.exe malicious.dll, malicious_fun).

You can also extract files from an .msi package using 7-Zip. .msi packages may also contain .cab files, which can be used as hidden containers for malicious payloads.
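Because attackers nest containers inside one another and extensions can be misleading, it helps to identify extracted files by their leading bytes. The sketch below checks a handful of well-known signatures for the formats discussed in this section. The table is deliberately small, the labels and function name are assumptions, and a tool like Detect It Easy covers far more cases.

```python
# (offset, signature bytes, description)
MAGIC_SIGNATURES = [
    (0, bytes.fromhex("D0CF11E0A1B11AE1"), "OLE2 compound file (.xls, .doc, .msi)"),
    (0, b"%PDF-", "PDF document"),
    (0, b"MZ", "PE executable (.exe, .dll)"),
    (0, b"MSCF", "Cabinet archive (.cab)"),
    (0, b"PK\x03\x04", "ZIP container (.xlsm, .docx, .zip)"),
    (0, bytes.fromhex("4C00000001140200"), "Windows shortcut (.lnk)"),
    (0x8001, b"CD001", "ISO 9660 image (.iso)"),
]


def identify_by_magic(data: bytes) -> str:
    """Return a description for the first matching signature, or 'unknown'."""
    for offset, magic, label in MAGIC_SIGNATURES:
        if data[offset:offset + len(magic)] == magic:
            return label
    return "unknown"


print(identify_by_magic(b"MSCF\x00\x00\x00\x00"))
print(identify_by_magic(b"MZ\x90\x00\x03"))
```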

Script-based Malware

Multi-Stage Unpacking - Malware often executes in stages, requiring step-by-step unpacking to reveal the final payload.

PowerShell

PowerShell (.ps1 scripts) is a powerful scripting language commonly used in the early stages of malware attacks for automation, system reconnaissance, and payload execution.

By default, Windows client systems use a restrictive PowerShell execution policy that blocks scripts. To allow locally created scripts to run in the analysis environment, use: Set-ExecutionPolicy RemoteSigned.

Possible tools for analysis:

  • Visual Studio Code + PowerShell Extension: ideal for debugging and script analysis,
  • PowerShell ISE: a built-in tool for script execution and debugging.
tip

(Optional) Depending on the malware under analysis: disabling network access in the virtual machine can help prevent accidental internet requests, unauthorized data exfiltration, or command-and-control communication.

Malicious PowerShell scripts often call Windows API functions, for example:

DllImport("kernel32.dll")
VirtualAlloc(...)

Scripts may start other threads or processes. You can use System Informer to inspect active processes, and to read and dump process memory for further analysis. For example, you can load memory dumps into a binary analysis tool like CFF Explorer, which includes a "Quick Disassembler" for low-level inspection.

PowerShell-based malware is often heavily obfuscated, using techniques such as variable and function renaming, and reversing or encoding strings. To improve readability, deobfuscation is recommended. Since doing this manually can be tedious and time-consuming, consider using deobfuscation tools or writing a simple script to automate the process.
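As an example of such a helper script, the Python sketch below undoes two of the transformations mentioned above: Base64 payloads passed to `powershell.exe -EncodedCommand` (Base64 over UTF-16LE text) and reversed strings. The function names are hypothetical, and real samples usually stack several layers, so the output of one step often has to be fed into the next.

```python
import base64


def decode_encoded_command(b64_text: str) -> str:
    """Decode the argument of powershell.exe -EncodedCommand."""
    return base64.b64decode(b64_text).decode("utf-16-le")


def unreverse(text: str) -> str:
    """Undo simple string reversal."""
    return text[::-1]


# "whoami" encoded the way PowerShell expects it.
print(decode_encoded_command("dwBoAG8AYQBtAGkA"))

# Hypothetical reversed command name as it might appear in a script.
print(unreverse("noisserpxE-ekovnI"))
```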

JavaScript, JScript and VBScript

JavaScript usually runs within a sandboxed browser environment and does not have direct system privileges, unless browser vulnerabilities are exploited. On Windows, Windows Script Host provides an environment in which users can execute scripts in various languages that use various object models to perform tasks.

JavaScript malware can be debugged using Visual Studio together with Windows Script Host. In the Visual Studio settings you can set the debugger command: cscript.exe //X $(ItemPath)

Useful websites:

VBScript is a deprecated programming language for scripting on Microsoft Windows using Component Object Model, based on classic Visual Basic and Active Scripting. (Wiki, ActiveXObject).

.hta files, executed via the mshta.exe utility, are often used as script containers for malware. Learn more: Red Canary - mshta Attack Technique.

Portable Executable (PE) Format

Portable Executable (PE) is the file format for executables used in Windows operating systems; it is based on the Common Object File Format (COFF). A PE file is a data structure that holds the information the OS loader needs to load the executable into memory and run it. PE file extensions include .exe, .dll, .scr, and .sys. More details here in the Microsoft documentation.

When analysing PE binaries, consider the differences between managed and unmanaged languages, little endian vs big endian when interpreting hex data, and memory mapping, as file offsets and in-memory addresses often differ. System Informer can be used to analyse memory.

Some articles for further reading:

Below are the main sections commonly found in the PE format:

  • DOS Header
    • e_magic: signature MZ (indicates a valid DOS executable),
    • e_lfanew: offset where the NT Header starts,
  • NT Header
    • File Header, contains general information about the PE file, including the target architecture,
    • Optional Header (despite the name, it’s required for executables):
      • Magic: Identifies whether the file is 32-bit (0x10B) or 64-bit (0x20B),
      • AddressOfEntryPoint: Entry point for execution (where execution starts),
      • ImageBase: Preferred memory address where the file should be loaded,
      • DllCharacteristics: If ASLR is enabled, the DLL can be relocated in memory, impacting malware analysis,
      • Data Directories: pointers to key structures such as the Import Table, Export Table, and Resource Table,
  • Section Headers
    • .text: contains executable code,
    • .rdata,.data: store read-only and writable data, such as strings and variables,
  • Import Directory: lists DLLs and functions the executable depends on. Example:
    • kernel32.dll: common Windows API functions,
    • IsDebuggerPresent: often used by malware to detect if it’s running in a debugging environment.
tip

In CFF Explorer, go to the Optional Header section, then click on the DllCharacteristics row: the "DLL can move" checkbox enables or disables the ASLR option.
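The header fields listed above sit at fixed offsets, so they can be read with a few `struct` calls. The sketch below is a simplified reader for illustration only (no section or import parsing, minimal validation); the function name and the returned dictionary keys are assumptions.

```python
import struct

# Flag shown as "DLL can move" in CFF Explorer.
IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE = 0x0040


def parse_pe_headers(data: bytes) -> dict:
    """Read a few DOS / NT header fields from the raw bytes of a PE file."""
    if data[:2] != b"MZ":                                   # e_magic
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)      # offset of the NT Header
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")

    file_header = e_lfanew + 4
    (machine,) = struct.unpack_from("<H", data, file_header)
    optional_header = file_header + 20                      # File Header is 20 bytes
    (magic,) = struct.unpack_from("<H", data, optional_header)
    (entry_point,) = struct.unpack_from("<I", data, optional_header + 16)
    if magic == 0x20B:                                      # 64-bit
        (image_base,) = struct.unpack_from("<Q", data, optional_header + 24)
    elif magic == 0x10B:                                    # 32-bit
        (image_base,) = struct.unpack_from("<I", data, optional_header + 28)
    else:
        raise ValueError(f"unexpected Optional Header magic: {magic:#x}")
    (dll_characteristics,) = struct.unpack_from("<H", data, optional_header + 70)

    return {
        "machine": machine,
        "is_64bit": magic == 0x20B,
        "address_of_entry_point": entry_point,
        "image_base": image_base,
        "aslr_enabled": bool(dll_characteristics & IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE),
    }
```

Typical use would be `parse_pe_headers(open("sample.dll", "rb").read())`, where the file name is a placeholder. All multi-byte fields are little endian, which is why every format string starts with `<`.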

.NET

A native machine-code disassembler cannot meaningfully process an executable compiled from a managed language, because its code is stored as intermediate language (IL) rather than native machine code. Instead, it must be disassembled from the intermediate language using an appropriate disassembler. Managed files are easier to decompile because they contain additional metadata, such as symbol names, class structures, and method definitions.

  • dnSpyEx is a debugger and .NET assembly editor. Go to the entry point and start the analysis from there,
  • de4dot is a .NET deobfuscator and unpacker written in C#. Compiled binaries are available here: de4dot-built-binaries,
  • ildasm.exe is an intermediate language disassembler, docs. Usage examples:
    • ildasm file.exe /out:file.il
    • ilasm file.il /out:file_m.exe

While inspecting .NET code, look for some of the following instructions:

# Can be used to load second-stage payloads in memory 
Assembly.Load( ... )
# function from kernel32.dll
# allocate memory with execution rights
VirtualAlloc(...)

System Informer has a .NET Assemblies tab, which can be used to inspect loaded .NET assemblies within a process. This feature makes it possible to detect suspicious or injected .NET assemblies, which may indicate malware activity.

shed (.NET Runtime Inspector), dnSpyEx, or System Informer can be used to analyse .NET processes and extract loaded binaries.

Unmanaged code

Requirements: knowledge of Assembly language, how the stack and heap work, and the Windows calling conventions.

  • x64dbg is an open-source x64/x32 debugger for Windows,
    • To emulate what the malware does, load rundll32.exe and pass the DLL under analysis as a command-line argument (Option → Settings → User DLL Load),
  • PE-sieve is a tool that helps to detect malware running on the system, as well as to collect the potentially malicious material for further analysis. It recognizes and dumps a variety of implants within the scanned process: replaced/injected PEs, shellcodes, hooks, and other in-memory patches,
  • Ghidra is a software reverse engineering (SRE) framework,
  • Other alternatives: IDA Pro, Binary Ninja, radare2, iced-rs.

YARA rules

YARA is a powerful tool designed primarily for malware researchers to identify and classify malware samples. It allows users to create rules that describe malware families (or any other artifacts) using textual or binary patterns. Each rule consists of a set of strings and a boolean expression that defines its detection logic.

Suggestions for crafting an effective YARA rule:

  • identify meaningful strings or format strings within the malware,
  • look for imported libraries and function names that indicate malicious behaviour,
  • consider obfuscation techniques, malware may encode or manipulate critical strings to evade detection.

Beware of false positives and false negatives when creating YARA rules. While false positives can lead to unnecessary alerts, false negatives are more dangerous, as they allow malware to go undetected. Test YARA rules extensively. Examples of YARA rules can be found in the documentation.
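To make the "strings plus boolean condition" model and the false-negative risk concrete, here is a small Python imitation of how a rule is evaluated. It is not the YARA engine and does not use YARA syntax; the rule content, names, and sample bytes are all hypothetical.

```python
def matches_rule(data: bytes, strings: dict, condition) -> bool:
    """Evaluate a rule: find each named pattern, then apply the condition."""
    hits = {name: pattern in data for name, pattern in strings.items()}
    return condition(data, hits)


# Rough equivalent of a rule with two strings and the condition
# "file starts with MZ and all strings are present".
STRINGS = {"$a": b"rundll32.exe", "$b": b"VirtualAlloc"}


def condition(data: bytes, hits: dict) -> bool:
    return data[:2] == b"MZ" and all(hits.values())


plain = b"MZ\x90\x00...rundll32.exe...VirtualAlloc..."
# Same content with everything after the header XOR-encoded by the malware.
encoded = b"MZ" + bytes(b ^ 0x5A for b in plain[2:])

print(matches_rule(plain, STRINGS, condition))
print(matches_rule(encoded, STRINGS, condition))
```

The second sample is expected to slip through: once the strings are encoded, a rule based on plain-text patterns produces a false negative, which is why rules should be tested against obfuscated variants as well.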

Running YARA from the command-line

yara64 <rule file> <binary to analyse> | <PID>

UNPACME is an automated malware unpacking service and it can be used for YARA development, testing, and hunting.

Malware Analysis Report

Malware analysis reports provide critical insights into a threat, serving as the key output of the analysis process. A report should help assess impact and understand the threat.

A well-structured report combines strategic and technical details, covering the malware’s nature, operators, targets, and in-depth analysis of its functions, payloads, and behaviours.

Examples of malware reports:

Vocabulary, Tips and Resources

  • malware sandbox is a virtual environment used to isolate and analyse the behaviour of potentially malicious software. It executes a file and traces all the operations that are performed,
  • Indicators of Compromise (IOCs) are evidence left behind by an attacker or malware that can be used to identify a security incident. Common examples include file hashes, IP addresses, domain names, or registry changes. They are often included in analysis reports to support detection and response efforts,
  • Command and Control (C&C) Server is a server controlled by an attacker that is used to deliver malware, issue commands, exfiltrate data, or coordinate further attacks on targeted systems,
  • Packer: utility used to compress and obfuscate files, making them more difficult to analyse. Malware often leverages packers to evade detection by antivirus software,
  • AsyncRAT is a Remote Access Tool (RAT) designed to remotely monitor and control other computers through a secure encrypted connection,
  • Qakbot (also known as QBot or QuackBot) is a well-known banking trojan, more here,
  • Autostart Extension Points (ASEP) are commonly used by malware as persistence mechanisms and define a starting point for the malware,
  • Sysinternals:
    • Autoruns64.exe, useful to list auto-start services and many other auto-start entries,
    • strings.exe usage example: strings.exe -n 20 file.exe, lists all strings of at least 20 characters in the executable,
  • ret42/RE-Thing - list of reverse eng. tools,
  • ired.team notes: miscellaneous-reversing-forensics,
  • reverse-engineering-cheat-sheets.
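Since IOCs such as hashes, IP addresses and URLs usually end up in the analysis report, a small script can collect them from analysis notes or tool output. The sketch below uses deliberately simple regular expressions and the common "defanging" convention (`hxxp`, `[.]`) so that indicators cannot be clicked by accident. The function names, the patterns, and the sample text are assumptions for illustration.

```python
import re

SHA256_RE = re.compile(r"\b[a-fA-F0-9]{64}\b")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
URL_RE = re.compile(r"https?://[^\s\"'<>()]+")


def extract_iocs(text: str) -> dict:
    """Collect unique SHA-256 hashes, IPv4 addresses and URLs from free text."""
    return {
        "sha256": sorted(set(SHA256_RE.findall(text))),
        "ipv4": sorted(set(IPV4_RE.findall(text))),
        "url": sorted(set(URL_RE.findall(text))),
    }


def defang(indicator: str) -> str:
    """Make an indicator safe to paste into a report."""
    if indicator.startswith("http"):
        indicator = "hxxp" + indicator[4:]
    return indicator.replace(".", "[.]")


# Hypothetical analysis notes (reserved documentation address and domain).
notes = (
    "Dropper e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 "
    "contacts http://malicious.example/gate.php and falls back to 203.0.113.10."
)
iocs = extract_iocs(notes)
for kind, values in iocs.items():
    for value in values:
        print(kind, defang(value))
```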

Software Obfuscation Techniques

Attackers use obfuscation techniques to evade detection and analysis. Common transformations include:

  • String Encoding – Hiding commands, URLs, and payloads.
  • Control Flow Flattening – Making code execution paths difficult to follow.
  • Packing and Encryption – Wrapping malicious code inside additional layers to avoid detection.

For a detailed breakdown of obfuscation methods, refer to:
Tigress - Software Obfuscation Transformations
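Packed or encrypted data tends to look statistically random, so byte entropy is a common first heuristic for spotting it. The sketch below computes Shannon entropy in bits per byte; values close to 8 suggest compression or encryption, while plain code and text score noticeably lower. The function name is hypothetical, and the measure is only a hint, never proof of packing.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(data).values()
    )


text_like = b"This program cannot be run in DOS mode. " * 20
random_like = bytes(range(256)) * 4  # every byte value equally frequent

print(round(shannon_entropy(text_like), 2))
print(round(shannon_entropy(random_like), 2))
```

Applied per PE section rather than to the whole file, the same function helps locate where a packed payload is stored.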

ATT&CK Matrix

MITRE ATT&CK is a globally-accessible knowledge base of adversary tactics and techniques based on real-world observations.