IoT Dataset 2023 | Datasets | Research | Canadian Institute for Cybersecurity | UNB


Canadian Institute for Cybersecurity

CIC IoT dataset 2023

A real-time dataset and benchmark for large-scale attacks in IoT environment

The main goal of this research is to propose a novel and extensive IoT attack dataset to foster the development of security analytics applications in real IoT operations. To accomplish this, 33 attacks are executed in an IoT topology composed of 105 devices.

These attacks are classified into seven categories, namely DDoS, DoS, Recon, Web-based, Brute Force, Spoofing, and Mirai. All attacks are executed by malicious IoT devices targeting other IoT devices.

The main contributions of this research are:

  • A new, realistic IoT attack dataset built on an extensive topology composed of several real IoT devices, with IoT devices acting as both attackers and victims;
  • We perform, document, and collect data from 33 attacks, divided into 7 classes, against IoT devices and demonstrate how they can be reproduced;
  • We evaluate the performance of machine learning and deep learning algorithms on the CICIoT2023 dataset to classify and detect IoT network traffic as malicious or benign.
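Detecting traffic as malicious or benign is a binary classification task. As a rough illustration of how such a detector could be scored, the sketch below computes accuracy, precision, recall, and F1 with "Malicious" as the positive class. The label strings, the function name `binary_metrics`, and the choice of metrics are assumptions made for this example, not necessarily what the paper reports.

```python
def binary_metrics(y_true, y_pred, positive="Malicious"):
    """Accuracy, precision, recall and F1 for a malicious-vs-benign detector.

    Hypothetical helper: labels are plain strings and `positive` marks the
    attack class; every other label is treated as benign.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # attack correctly flagged
            else:
                fp += 1  # benign flow raised a false alarm
        else:
            if truth == positive:
                fn += 1  # attack missed
            else:
                tn += 1  # benign flow correctly passed
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

For real experiments, scikit-learn's `sklearn.metrics` module (for example `precision_recall_fscore_support`) provides the same quantities along with multi-class averaging.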

CIC IoT lab

Producing IoT security data that can support real applications is challenging for several reasons. One of the main problems is building an extensive network of real IoT devices that resembles the topologies of real IoT applications.

Many works rely on simulated devices or only a few real IoT devices because of the costs, the network equipment required (e.g., switches, routers, and network taps), and the personnel needed to maintain such an infrastructure.

The Canadian Institute for Cybersecurity (CIC) has a distinguished presence in the cybersecurity ecosystem and a history of high-impact contributions to industry and academia. Examples include datasets used to develop new cybersecurity applications and several industry partnerships to improve cybersecurity practice and develop new solutions.

This success enabled CIC to establish an IoT lab with a dedicated network for developing IoT security solutions. By sharing the data collected from this extensive topology, we intend to foster the advancement of IoT security research and support initiatives across different aspects of IoT security.

Figure: Topology chart and topology diagram of the CIC IoT lab.

Data descriptions

DDoS

  • ACK fragmentation
  • UDP flood
  • SlowLoris
  • ICMP flood
  • RSTFIN flood
  • PSHACK flood
  • HTTP flood
  • UDP fragmentation
  • TCP flood
  • SYN flood
  • SynonymousIP flood

Brute Force

  • Dictionary brute force

Spoofing

  • ARP spoofing
  • DNS spoofing

DoS

  • TCP flood
  • HTTP flood
  • SYN flood
  • UDP flood

Recon

  • Ping sweep
  • OS scan
  • Vulnerability scan
  • Port scan
  • Host discovery

Web-based

  • SQL injection
  • Command injection
  • Backdoor malware
  • Uploading attack
  • XSS
  • Browser hijacking

Mirai

  • GREIP flood
  • GREETH flood
  • UDPPlain
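The attack list supports labels at three granularities: binary (malicious or benign), the seven categories, and individual attacks. The sketch below encodes the list as a Python dictionary and derives all three labels for a flow. The category of each group is inferred from the seven categories named earlier, and the label strings (`"Benign"`, `"Malicious"`, `"DoS / HTTP flood"`) and the helper `label_views` are hypothetical. The label strings actually used in the CSV files are not shown on this page.

```python
# Attack taxonomy as listed on this page: category -> attack names.
ATTACKS = {
    "DDoS": [
        "ACK fragmentation", "UDP flood", "SlowLoris", "ICMP flood",
        "RSTFIN flood", "PSHACK flood", "HTTP flood", "UDP fragmentation",
        "TCP flood", "SYN flood", "SynonymousIP flood",
    ],
    "Brute Force": ["Dictionary brute force"],
    "Spoofing": ["ARP spoofing", "DNS spoofing"],
    "DoS": ["TCP flood", "HTTP flood", "SYN flood", "UDP flood"],
    "Recon": ["Ping sweep", "OS scan", "Vulnerability scan", "Port scan", "Host discovery"],
    "Web-based": [
        "SQL injection", "Command injection", "Backdoor malware",
        "Uploading attack", "XSS", "Browser hijacking",
    ],
    "Mirai": ["GREIP flood", "GREETH flood", "UDPPlain"],
}

BENIGN = "Benign"  # hypothetical label for normal traffic


def label_views(category, attack=None):
    """Return (binary, category-level, attack-level) labels for one flow.

    Benign traffic is passed as category="Benign" without an attack name.
    The attack-level label keeps the category as a prefix because some
    names (e.g. "HTTP flood") appear under both DDoS and DoS.
    """
    if category == BENIGN:
        return (BENIGN, BENIGN, BENIGN)
    if attack not in ATTACKS.get(category, []):
        raise ValueError(f"unknown attack {attack!r} for category {category!r}")
    return ("Malicious", category, f"{category} / {attack}")
```

For example, `label_views("DoS", "HTTP flood")` returns `("Malicious", "DoS", "DoS / HTTP flood")`. Keeping the category in the attack-level label prevents the DDoS and DoS variants of the same flood from merging into one class.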

Descriptive statistics of the extracted features:

Feature | Mean | Std | Min | 25% | 50% | 75% | Max
flow_duration | 5.76544939 | 285.034171 | 0 | 0 | 0 | 0.10513809 | 394357.207
Header_Length | 76705.9637 | 461331.747 | 0 | 54 | 54 | 280.555 | 9907147.75
Protocol type | 9.06568989 | 8.94553292 | 0 | 6 | 6 | 14.33 | 47
Duration | 66.3507169 | 14.0191881 | 0 | 64 | 64 | 64 | 255
Rate | 9064.05724 | 99562.4906 | 0 | 2.09185589 | 15.7542308 | 117.384754 | 8388608
Srate | 9064.05724 | 99562.4906 | 0 | 2.09185589 | 15.7542308 | 117.384754 | 8388608
Drate | 5.46E-06 | 0.00725077 | 0 | 0 | 0 | 0 | 29.7152249
fin_flag_number | 0.08657207 | 0.28120696 | 0 | 0 | 0 | 0 | 1
syn_flag_number | 0.20733528 | 0.40539779 | 0 | 0 | 0 | 0 | 1
rst_flag_number | 0.09050473 | 0.28690351 | 0 | 0 | 0 | 0 | 1
psh_flag_number | 0.08775006 | 0.28293106 | 0 | 0 | 0 | 0 | 1
ack_flag_number | 0.12343168 | 0.32893207 | 0 | 0 | 0 | 0 | 1
ece_flag_number | 1.48E-06 | 0.00121571 | 0 | 0 | 0 | 0 | 1
cwr_flag_number | 7.28E-07 | 0.00085338 | 0 | 0 | 0 | 0 | 1
ack_count | 0.09054283 | 0.28643144 | 0 | 0 | 0 | 0 | 7.7
syn_count | 0.33035785 | 0.6635354 | 0 | 0 | 0 | 0.06 | 12.87
fin_count | 0.09907672 | 0.32711642 | 0 | 0 | 0 | 0 | 248.32
urg_count | 6.23982356 | 71.8524536 | 0 | 0 | 0 | 0 | 4401.7
rst_count | 38.4681213 | 325.384658 | 0 | 0 | 0 | 0.01 | 9613
HTTP | 0.04823423 | 0.21426079 | 0 | 0 | 0 | 0 | 1
HTTPS | 0.05509922 | 0.22817383 | 0 | 0 | 0 | 0 | 1
DNS | 0.00013068 | 0.01143079 | 0 | 0 | 0 | 0 | 1
Telnet | 2.14E-08 | 0.00014635 | 0 | 0 | 0 | 0 | 1
SMTP | 6.43E-08 | 0.00025349 | 0 | 0 | 0 | 0 | 1
SSH | 4.09E-05 | 0.00639772 | 0 | 0 | 0 | 0 | 1
IRC | 1.50E-07 | 0.00038722 | 0 | 0 | 0 | 0 | 1
TCP | 0.57383427 | 0.49451846 | 0 | 0 | 1 | 1 | 1
UDP | 0.21191758 | 0.40866676 | 0 | 0 | 0 | 0 | 1
DHCP | 1.71E-06 | 0.00130903 | 0 | 0 | 0 | 0 | 1
ARP | 6.62E-05 | 0.00813521 | 0 | 0 | 0 | 0 | 1
ICMP | 0.16372157 | 0.37002273 | 0 | 0 | 0 | 0 | 1
IPv | 0.99988731 | 0.01061485 | 0 | 1 | 1 | 1 | 1
LLC | 0.99988731 | 0.01061485 | 0 | 1 | 1 | 1 | 1
Tot sum | 1308.32257 | 2613.30273 | 42 | 525 | 567 | 567.54 | 127335.8
Min | 91.6073456 | 139.695326 | 42 | 50 | 54 | 54 | 13583
Max | 181.963418 | 524.030902 | 42 | 50 | 54 | 55.26 | 49014
AVG | 124.668815 | 240.991485 | 42 | 50 | 54 | 54.0497296 | 13583
Std | 33.3248065 | 160.335722 | 0 | 0 | 0 | 0.37190955 | 12385.2391
Tot size | 124.691567 | 241.549341 | 42 | 50 | 54 | 54.06 | 13583
IAT | 83182525.9 | 17047351.7 | 0 | 83071566 | 83124522.4 | 83343908 | 167639436
Number | 9.49848933 | 0.81915318 | 1 | 9.5 | 9.5 | 9.5 | 15
Magnitue | 13.12182 | 8.62857895 | 9.16515139 | 10 | 10.3923048 | 10.3967148 | 164.821115
Radius | 47.0949848 | 226.769647 | 0 | 0 | 0 | 0.50592128 | 17551.2708
Covariance | 30724.3565 | 323710.68 | 0 | 0 | 0 | 1.34421569 | 154902159
Variance | 0.0964376 | 0.233001 | 0 | 0 | 0 | 0.08 | 1
Weight | 141.51237 | 21.0683073 | 1 | 141.55 | 141.55 | 141.55 | 244.6
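The table gives, for each feature, the mean, standard deviation, minimum, quartiles, and maximum. The standard-library sketch below recomputes such a summary from one of the CSV files. It assumes the CSV header names match the feature names above. Two further assumptions are that the standard deviation is the sample one (n - 1) and that quartiles use linear interpolation, which are the defaults of pandas' `DataFrame.describe()`. This page does not say how the published table was produced. The helper names are hypothetical.

```python
import csv
import statistics


def describe_column(values):
    """Mean, sample std, min, quartiles and max of a list of floats (needs >= 2 values)."""
    # method="inclusive" interpolates linearly between order statistics,
    # the same rule as numpy/pandas' default "linear" percentiles.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.fmean(values),
        "std": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": max(values),
    }


def describe_rows(rows, columns):
    """Summarize selected columns of CSV rows given as dicts of strings."""
    values = {name: [] for name in columns}
    for row in rows:
        for name in columns:
            values[name].append(float(row[name]))
    return {name: describe_column(vals) for name, vals in values.items()}


def describe_csv(path, columns):
    """Summarize selected columns of one CSV file (loads the columns into memory)."""
    with open(path, newline="") as f:
        return describe_rows(csv.DictReader(f), columns)
```

A call such as `describe_csv("example.csv", ["flow_duration", "Rate", "IAT"])` (file name hypothetical) is practical for a single file. For the whole dataset, a dataframe library or a Spark job such as the PySpark processing mentioned in the supplementary material scales better.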

Dataset directories

The main dataset directory (CICIoT2023) contains four subdirectories:

  1. PCAP: Contains the original traffic captured during the attacks as .pcap files;
  2. CSV: Contains features extracted from the original files to be used in the Machine Learning (ML) evaluation (.csv files);
  3. Example: A Jupyter notebook that shows how the dataset can be used to train and evaluate ML models for attack detection and classification;
  4. Supplementary material: Source code and descriptions of the tools used to collect and wrangle the attack data. We used Mergecap to merge multiple .pcap files, PySpark to process the data, tcpdump to split the .pcap files into multiple smaller files, and DPKT to extract features.
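Feature extraction starts by reading packets out of the .pcap files. The authors used DPKT for this step. The sketch below does not use DPKT and is not the authors' extractor. Using only the standard library, it reads a classic libpcap file and counts how many TCP packets carry each flag, loosely in the spirit of the `*_flag_number` features in the table. It handles only classic (non-pcapng) microsecond captures with an Ethernet link type and untagged IPv4. VLAN tags, IPv6, and pcapng are out of scope, and the function names are hypothetical.

```python
import struct

PCAP_GLOBAL_HEADER = 24  # bytes in the classic pcap file header
PCAP_RECORD_HEADER = 16  # bytes in each per-packet record header
LINKTYPE_ETHERNET = 1

# TCP flags byte (offset 13 of the TCP header): CWR ECE URG ACK PSH RST SYN FIN.
FLAG_BITS = {"fin": 0x01, "syn": 0x02, "rst": 0x04, "psh": 0x08,
             "ack": 0x10, "urg": 0x20, "ece": 0x40, "cwr": 0x80}


def read_pcap(data: bytes):
    """Yield (timestamp, frame) pairs from a classic microsecond pcap capture."""
    magic = data[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # written on a little-endian host
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # written on a big-endian host
    else:
        raise ValueError("not a classic microsecond pcap file (pcapng is not handled)")
    (linktype,) = struct.unpack_from(endian + "I", data, 20)
    if linktype != LINKTYPE_ETHERNET:
        raise ValueError("only Ethernet captures are handled")
    offset = PCAP_GLOBAL_HEADER
    while offset + PCAP_RECORD_HEADER <= len(data):
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack_from(endian + "IIII", data, offset)
        offset += PCAP_RECORD_HEADER
        yield ts_sec + ts_usec / 1e6, data[offset:offset + incl_len]
        offset += incl_len


def tcp_flags(frame: bytes):
    """Return the TCP flags byte of an untagged Ethernet/IPv4/TCP frame, else None."""
    if len(frame) < 14 or frame[12:14] != b"\x08\x00":  # EtherType 0x0800 = IPv4
        return None
    ip = frame[14:]
    if len(ip) < 20 or ip[0] >> 4 != 4 or ip[9] != 6:  # IP version 4, protocol 6 = TCP
        return None
    ihl = (ip[0] & 0x0F) * 4  # IPv4 header length in bytes
    tcp = ip[ihl:]
    if len(tcp) < 14:
        return None
    return tcp[13]


def flag_counts(data: bytes):
    """Count TCP packets in a capture that have each flag set."""
    counts = dict.fromkeys(FLAG_BITS, 0)
    for _ts, frame in read_pcap(data):
        flags = tcp_flags(frame)
        if flags is None:
            continue  # non-TCP or unsupported frame
        for name, bit in FLAG_BITS.items():
            if flags & bit:
                counts[name] += 1
    return counts
```

To run it on a file, read the bytes first, for example `flag_counts(open("capture.pcap", "rb").read())` (path hypothetical). For large captures, a streaming parser such as DPKT's reader avoids loading the whole file into memory.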

Acknowledgments

The authors would like to thank the Canadian Institute for Cybersecurity (CIC) for its financial and educational support.

Citation

E. C. P. Neto, S. Dadkhah, R. Ferreira, A. Zohourian, R. Lu, A. A. Ghorbani. "CICIoT2023: A real-time dataset and benchmark for large-scale attacks in IoT environment," Sensors (2023) – (submitted to Journal of Sensors).

Download the dataset