Enriched Dataset | Datasets | Canadian Institute for Cybersecurity | UNB


Canadian Institute for Cybersecurity

Enriching IoT datasets

This project enriches two well-known IoT datasets (Bot-IoT and TON-IoT) along two general aspects: horizontal and vertical. Horizontal enrichment proposes new, informative features for the datasets. Vertical enrichment merges datasets.
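The difference between the two aspects can be illustrated with a minimal sketch. It uses toy records rather than real Bot-IoT or TON-IoT rows, and the `mean_pkt_size` feature and both function names are hypothetical, chosen only for illustration; they are not the features proposed in the paper.

```python
# Toy flow records (illustrative values, not taken from the real datasets).
bot_rows = [
    {"pkts": 10, "bytes": 1000, "label": "DDoS"},
    {"pkts": 4, "bytes": 200, "label": "DoS"},
]
ton_rows = [
    {"pkts": 5, "bytes": 750, "label": "scanning"},
]


def add_mean_packet_size(rows):
    """Horizontal enrichment: add one derived column to every row.

    'mean_pkt_size' is a hypothetical feature used only as an example.
    """
    return [{**r, "mean_pkt_size": r["bytes"] / r["pkts"]} for r in rows]


def merge_datasets(*named_datasets):
    """Vertical enrichment: stack rows from several datasets.

    Each argument is a (name, rows) pair; the name is kept in a 'source'
    column so the origin of every row stays traceable after merging.
    """
    merged = []
    for name, rows in named_datasets:
        merged.extend({**r, "source": name} for r in rows)
    return merged


# H enrichment widens each dataset; H+V then stacks the widened datasets.
h_bot = add_mean_packet_size(bot_rows)
h_ton = add_mean_packet_size(ton_rows)
hv = merge_datasets(("bot_iot", h_bot), ("ton_iot", h_ton))
```

Horizontal enrichment changes the number of columns, vertical enrichment changes the number of rows, and the combined H+V dataset is obtained by applying both.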


The authors would like to thank the Canadian Institute for Cybersecurity (CIC) for its financial and educational support. This project was supported in part by collaborative research funding from the National Research Council of Canada’s Artificial Intelligence for Logistics Program.


The main directory contains two zip files, namely Datasets and Source_codes.

The Datasets zip file contains two folders, namely merged_datasets and Original_datasets. The merged_datasets folder holds all H enriched datasets for the different Bot_IoT and Ton_IoT attacks. The Original_datasets folder holds the original datasets that are used for the V enriched part.


The dataset names map to the following names in our paper:

H+V Enriched DS = Bot_iot_DDoS_new + Bot_iot_DoS_new + Bot_iot+scanning + Ton_iot_DDoS_new + Ton_iot_DoS + Ton_iot_scanning_new

V Enriched DS = all the datasets in the Original_datasets folder.

H Enriched Bot IoT = Bot_iot_DDoS_new + Bot_iot_DoS_new + Bot_iot+scanning.

H Enriched Ton IoT = Ton_iot_DDoS_new + Ton_iot_DoS + Ton_iot_scanning_new.
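The mapping above can be captured in a small lookup table so that a paper-level dataset is assembled from its component files. This is a sketch under assumptions: the `.csv` suffix, the `load_paper_dataset` helper, and the `source_file` column are hypothetical, and the file stems are copied exactly as listed above.

```python
import csv
from pathlib import Path

# Paper name -> component file stems, copied verbatim from the mapping above.
PAPER_TO_FILES = {
    "H Enriched Bot IoT": [
        "Bot_iot_DDoS_new", "Bot_iot_DoS_new", "Bot_iot+scanning",
    ],
    "H Enriched Ton IoT": [
        "Ton_iot_DDoS_new", "Ton_iot_DoS", "Ton_iot_scanning_new",
    ],
}
# H+V is the union of both H enriched groups.
PAPER_TO_FILES["H+V Enriched DS"] = (
    PAPER_TO_FILES["H Enriched Bot IoT"] + PAPER_TO_FILES["H Enriched Ton IoT"]
)


def load_paper_dataset(paper_name, folder, suffix=".csv"):
    """Read every component file of one paper-level dataset into one list.

    Assumption: the files are CSVs with a header row. Adjust 'suffix' and
    the reader if the distributed files use another format.
    """
    rows = []
    for stem in PAPER_TO_FILES[paper_name]:
        path = Path(folder) / (stem + suffix)
        with open(path, newline="") as fh:
            for row in csv.DictReader(fh):
                row["source_file"] = stem  # remember where the row came from
                rows.append(row)
    return rows
```

For example, `load_paper_dataset("H+V Enriched DS", "merged_datasets")` would read all six component files, provided they are present in that folder under the names listed.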

Source codes

The Source_codes folder contains our implementation for extracting the original and proposed features from PCAP files.

There are seven .py files inside this folder. To run the code, open the Generating_dataset.py file and set the paths of the PCAP files there. The remaining Python files implement the individual parts. For example, as the names suggest, Communication_features.py implements the communication features we propose, and Supporting_functions.py defines helper functions that simplify the extraction process. The remaining files follow the same naming pattern.


The Feature_extraction.py file contains the main process for analyzing the PCAP files.

To implement more features, define them in their own Python file and call them from the Feature_extraction.py file.
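As a rough sketch of what such an extension might look like, the example below defines one new feature function and a small hook that merges its output into a feature row. Everything here is hypothetical: the module name `Timing_features.py`, the function names, the `iat_*` features, and the hook are assumptions for illustration, and the actual interface of Feature_extraction.py may differ.

```python
from statistics import mean


# Hypothetical contents of a new module, e.g. Timing_features.py.
def timing_features(timestamps):
    """Return inter-arrival time statistics for one flow.

    'timestamps' is a list of packet arrival times in seconds, sorted in
    ascending order. A flow with fewer than two packets has no gaps, so
    both statistics default to 0.0.
    """
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    if not gaps:
        return {"iat_mean": 0.0, "iat_max": 0.0}
    return {"iat_mean": mean(gaps), "iat_max": max(gaps)}


# Hypothetical hook on the extraction side: each registered function
# contributes its columns to the row built for one flow.
FEATURE_FUNCTIONS = [timing_features]


def extract_features(timestamps):
    """Build one feature row by calling every registered feature function."""
    row = {}
    for feature_fn in FEATURE_FUNCTIONS:
        row.update(feature_fn(timestamps))
    return row
```

Keeping each feature family in its own function, with a list of registered functions on the extraction side, means adding a feature only requires writing the function and appending it to the list.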


The project is not currently in development, but contributions are welcome. Please contact one of the authors of the paper.


To cite this work, please cite the following paper:

Masoud Erfani, Farzaneh Shoeleh, Sajjad Dadkhah, Barjinder Kaur, Pulei Xiong, Shahrear Iqbal, Suprio Ray, and Ali A. Ghorbani, “A feature exploration approach for IoT attack type classification”, 2021 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress, pp. 582-588, 2021.

Download the dataset