This research proposes a comprehensive multi-architecture IoT malware dataset that supports malware-analysis research by integrating both static and dynamic analysis.
To accomplish this, 10,000 malware samples were collected from the IoTPOT dataset and executed in controlled virtual environments, capturing network traffic (PCAP), system-call traces (STRACE), and system activity (SAR) across four architectures: ARM, MIPS, MIPSEL, and x86.
The malware families include Mirai, Bashlite (Gafgyt), DarkNexus, Rudedevil, Agent, Generic, and Tsunami. Benign samples were generated using large language models (LLMs).
Main contributions:
The main CIC-YNU-IoTMal dataset directory contains four subdirectories, one for each architecture (ARM, MIPS, MIPSEL, x86), plus supplementary material containing the code for generating the malware samples. Each architecture subdirectory contains one file per behaviour type, summarized below:
| Architecture | Behaviour | Total samples | Number of features |
|---|---|---|---|
| ARM | PCAP | 737651 | 40 |
| ARM | SAR | 645518 | 461 |
| ARM | STRACE | 645518 | 461 |
| MIPS | PCAP | 870017 | 40 |
| MIPS | SAR | 430540 | 392 |
| MIPS | STRACE | 430540 | 392 |
| MIPSEL | PCAP | 1104016 | 40 |
| MIPSEL | SAR | 516679 | 392 |
| MIPSEL | STRACE | 516679 | 392 |
| x86 | PCAP | 455641 | 40 |
| x86 | SAR | 529212 | 409 |
| x86 | STRACE | 529212 | 409 |
Note: The sample counts presented here may differ from those in the paper because the dataset contains rows labelled “Unknown”, which can be dropped before applying ML algorithms.
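A minimal sketch for reproducing per-file sample counts with and without the “Unknown” rows. It assumes, hypothetically, that each architecture subdirectory holds one CSV per behaviour named `PCAP.csv`, `SAR.csv` and `STRACE.csv`, and that the label column is called `label`; both the file names and the column name are placeholders to adjust to the actual release.

```python
import csv
from pathlib import Path

# Hypothetical layout: <root>/<ARCH>/<BEHAVIOUR>.csv (adjust to the real file names).
ARCHITECTURES = ("ARM", "MIPS", "MIPSEL", "x86")
BEHAVIOURS = ("PCAP", "SAR", "STRACE")
LABEL_COLUMN = "label"  # assumed name of the label column


def count_samples(root, drop_unknown=True):
    """Return {(arch, behaviour): row_count}, optionally skipping 'Unknown' labels."""
    counts = {}
    for arch in ARCHITECTURES:
        for behaviour in BEHAVIOURS:
            path = Path(root) / arch / f"{behaviour}.csv"
            if not path.exists():
                continue  # skip files missing from a partial download
            with path.open(newline="") as fh:
                reader = csv.DictReader(fh)
                counts[(arch, behaviour)] = sum(
                    1
                    for row in reader
                    if not (drop_unknown and row.get(LABEL_COLUMN) == "Unknown")
                )
    return counts
```

Calling `count_samples("CIC-YNU-IoTMal", drop_unknown=False)` would give the raw row counts, while the default excludes the “Unknown” rows, which is the usual preprocessing step before training.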
Figure 1: Multi-tier sandbox architecture for IoT malware dynamic analysis

The following table summarizes the distribution of the SAR features for the MIPS data; cv is the coefficient of variation (std/mean).
| Feature | Mean | Std | Min | Max | CV |
|---|---|---|---|---|---|
| interval | 1.224941 | 2.605152 | 0.00 | 117.00 | 2.126758 |
| cpu-load[0].usr | 18.819596 | 23.855737 | 0.00 | 100.00 | 1.267601 |
| cpu-load[0].nice | 0.000000 | 0.000000 | 0.00 | 0.00 | NaN |
| cpu-load[0].sys | 20.180509 | 24.555765 | 0.00 | 99.11 | 1.216806 |
| cpu-load[0].iowait | 0.454946 | 2.494017 | 0.00 | 95.92 | 5.482001 |
| cpu-load[0].steal | 0.000000 | 0.000000 | 0.00 | 0.00 | NaN |
| cpu-load[0].irq | 0.000000 | 0.000000 | 0.00 | 0.00 | NaN |
| cpu-load[0].soft | 0.543592 | 3.595822 | 0.00 | 97.14 | 6.614926 |
| cpu-load[0].guest | 0.000000 | 0.000000 | 0.00 | 0.00 | NaN |
| cpu-load[0].gnice | 0.000000 | 0.000000 | 0.00 | 0.00 | NaN |
| cpu-load[0].idle | 60.001124 | 44.701928 | 0.00 | 100.00 | 0.745018 |
| cpu-load[1].cpu | 0.000000 | 0.000000 | 0.00 | 0.00 | NaN |
| cpu-load[1].usr | 18.819596 | 23.855737 | 0.00 | 100.00 | 1.267601 |
| cpu-load[1].nice | 0.000000 | 0.000000 | 0.00 | 0.00 | NaN |
| cpu-load[1].sys | 20.180509 | 24.555765 | 0.00 | 99.11 | 1.216806 |
| cpu-load[1].iowait | 0.454946 | 2.494017 | 0.00 | 95.92 | 5.482001 |
| cpu-load[1].steal | 0.000000 | 0.000000 | 0.00 | 0.00 | NaN |
| cpu-load[1].irq | 0.000000 | 0.000000 | 0.00 | 0.00 | NaN |
| cpu-load[1].soft | 0.543592 | 3.595822 | 0.00 | 97.14 | 6.614926 |
| cpu-load[1].guest | 0.000000 | 0.000000 | 0.00 | 0.00 | NaN |
| cpu-load[1].gnice | 0.000000 | 0.000000 | 0.00 | 0.00 | NaN |
| cpu-load[1].idle | 60.001124 | 44.701928 | 0.00 | 100.00 | 0.745018 |
| process-and-context-switch.proc | 9.989325 | 17.479854 | 0.00 | 169.00 | 1.749853 |
| process-and-context-switch.cswch | 566.493796 | 1133.868486 | 0.28 | 11562.00 | 2.001555 |
| interrupts[0].all | 81.133433 | 59.262541 | 0.60 | 1886.00 | 0.730433 |
| interrupts[0].CPU0 | 81.133433 | 59.262541 | 0.60 | 1886.00 | 0.730433 |
| interrupts[1].intr | 0.000000 | 0.000000 | 0.00 | 0.00 | NaN |
| interrupts[1].all | 0.000000 | 0.000000 | 0.00 | 0.00 | NaN |
| interrupts[1].CPU0 | 0.000000 | 0.000000 | 0.00 | 0.00 | NaN |
| interrupts[2].intr | 2.000000 | 0.000000 | 2.00 | 2.00 | 0.000000 |
| interrupts[2].all | 0.000000 | 0.000000 | 0.00 | 0.00 | NaN |
| interrupts[2].CPU0 | 0.000000 | 0.000000 | 0.00 | 0.00 | NaN |
| interrupts[3].intr | 3.000000 | 0.000000 | 3.00 | 3.00 | 0.000000 |
| interrupts[3].all | 0.001813 | 0.126386 | 14.00 | 14.00 | 69.713754 |
| interrupts[3].CPU0 | 0.001813 | 0.126386 | 14.00 | 14.00 | 69.713754 |
| interrupts[4].intr | 4.000000 | 0.000000 | 4.00 | 4.00 | 0.000000 |
| interrupts[4].all | 0.875768 | 9.614894 | 0.00 | 1768.00 | 10.978813 |
| interrupts[4].CPU0 | 0.875768 | 9.614894 | 0.00 | 1768.00 | 10.978813 |
| interrupts[5].intr | 8.000000 | 0.000000 | 8.00 | 8.00 | 0.000000 |
| interrupts[5].all | 0.000000 | 0.000000 | 0.00 | 0.00 | NaN |
| interrupts[5].CPU0 | 0.000000 | 0.000000 | 0.00 | 0.00 | NaN |
| interrupts[6].intr | 10.000000 | 0.000000 | 10.00 | 10.00 | 0.000000 |
| interrupts[6].all | 16.209603 | 29.648234 | 0.00 | 905.00 | 1.829054 |
| interrupts[6].CPU0 | 16.209603 | 29.648234 | 0.00 | 905.00 | 1.829054 |
| interrupts[7].intr | 14.000000 | 0.000000 | 14.00 | 14.00 | 0.000000 |
| interrupts[7].all | 0.982883 | 3.487567 | 0.00 | 940.59 | 3.548303 |
| interrupts[7].CPU0 | 0.982883 | 3.487567 | 0.00 | 940.59 | 3.548303 |
| interrupts[8].intr | 15.000000 | 0.000000 | 15.00 | 15.00 | 0.000000 |
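The cv column equals std divided by mean (for example, 2.605152 / 1.224941 ≈ 2.1268 for `interval`), which is why constant-zero features such as `cpu-load[0].nice` show NaN. A minimal standard-library sketch of how one row of this table could be computed for a single feature column; it assumes the sample standard deviation (n − 1 denominator, the pandas default), since the source does not state which variant was used.

```python
import math
import statistics


def describe(values):
    """Return mean, std, min, max and coefficient of variation (std / mean)."""
    mean = statistics.fmean(values)
    # Sample standard deviation (n - 1); assumed, not confirmed by the dataset authors.
    std = statistics.stdev(values) if len(values) > 1 else 0.0
    # cv is undefined when the mean is zero (e.g. all-zero cpu-load fields), so report NaN.
    cv = std / mean if mean != 0 else math.nan
    return {"mean": mean, "std": std, "min": min(values), "max": max(values), "cv": cv}


if __name__ == "__main__":
    # A constant feature such as interrupts[2].intr yields std 0.0 and cv 0.0.
    print(describe([2.0, 2.0, 2.0]))
```

Applying `describe` to each numeric column of the MIPS SAR file would produce a table with the same columns as the one above.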
The authors would like to thank the Canadian Institute for Cybersecurity (CIC) for its financial and educational support.
Citation
S. Dadkhah, O. D. Okey, S. A. Maret, Y. Lo, A. Firouzia, R. Kuki, T. Sasaki, K. Yoshioka, T. Ban, S. Ozawa, A. A. Ghorbani, “CIC-YNU-IoTMal: A Comprehensive Multilayer Dataset for Static and Dynamic Analysis of IoT Malware Behavior,” submitted to Expert Systems with Applications, 2026.
-
The dataset includes a network taxonomy group (eval_test_network) in which we intentionally injected high jitter (up to 500 ms) into valid PQC sessions.
The purpose is to train models to distinguish a "Slow Network" (Normal) from "DoS/Resource Exhaustion" (Anomaly).
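The effect of jitter injection on packet timing can be illustrated as follows. This is an assumed sketch, not the authors' tooling: each packet timestamp (in seconds) is delayed by an independent uniform random amount of up to 500 ms, and a hypothetical helper `interarrival_jitter` measures the resulting spread of inter-arrival gaps. The uniform distribution and both function names are illustrative assumptions.

```python
import random
import statistics


def inject_jitter(timestamps, max_jitter=0.5, seed=None):
    """Delay each timestamp by U(0, max_jitter) seconds and return them in arrival order.

    max_jitter=0.5 mirrors the 500 ms upper bound; the uniform distribution is assumed.
    """
    rng = random.Random(seed)
    delayed = [t + rng.uniform(0.0, max_jitter) for t in timestamps]
    return sorted(delayed)  # independent delays can reorder packets, as on a real link


def interarrival_jitter(timestamps):
    """Population standard deviation of the gaps between consecutive timestamps."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return statistics.pstdev(gaps) if gaps else 0.0


if __name__ == "__main__":
    regular = [float(i) for i in range(50)]  # one packet per second, perfectly regular
    print(interarrival_jitter(regular), interarrival_jitter(inject_jitter(regular, seed=1)))
```

A session perturbed this way keeps its packet contents and volume, so a model has to rely on resource and behavioural features, rather than timing alone, to separate it from a resource-exhaustion attack.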
The labels in ml_features.csv are designed for a two-stage pipeline.
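How the labels are split across the two stages is not specified here, so the following is a hypothetical sketch of a typical two-stage pipeline: stage 1 decides Normal vs. Anomaly, and only rows flagged as anomalous are passed to a stage-2 classifier that assigns a finer category. The classifier callables, the feature names `cpu` and `jitter_ms`, and the threshold are placeholders, not columns confirmed in ml_features.csv.

```python
from typing import Callable, List, Mapping, Sequence

Row = Mapping[str, float]


def two_stage_predict(
    rows: Sequence[Row],
    stage1: Callable[[Row], bool],
    stage2: Callable[[Row], str],
) -> List[str]:
    """Run stage 2 only on rows that stage 1 flags as anomalous."""
    predictions = []
    for row in rows:
        if stage1(row):
            predictions.append(stage2(row))  # e.g. "DoS/Resource Exhaustion"
        else:
            predictions.append("Normal")  # e.g. a slow but benign session
    return predictions


if __name__ == "__main__":
    # Toy rules standing in for trained models (hypothetical features and threshold).
    rows = [{"cpu": 20.0, "jitter_ms": 480.0}, {"cpu": 97.0, "jitter_ms": 30.0}]
    print(two_stage_predict(rows, lambda r: r["cpu"] > 90, lambda r: "DoS/Resource Exhaustion"))
    # prints ['Normal', 'DoS/Resource Exhaustion']
```

Splitting the decision this way keeps the high-jitter "Slow Network" sessions in the Normal branch of stage 1 instead of forcing the fine-grained classifier to learn them as an extra class.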