CIC IoT dataset 2022

This project aims to generate a state-of-the-art dataset for profiling, behavioural analysis, and vulnerability testing of IoT devices that use a range of protocols, such as IEEE 802.11 (Wi-Fi), ZigBee and Z-Wave. The main objectives of the CIC IoT dataset project are:

Configure various IoT devices and analyze the behaviour exhibited.
Conduct manual and semi-automated experiments of various categories.
Analyze the network traffic of the devices when they are idle for three minutes and during the first two minutes after they are powered on.
Generate different scenarios and analyze the devices' behaviour in each situation.
Capture the network traffic of devices under current and important attacks in an IoT environment.

The CIC IoT dataset project and its related activities can be summarized in the following steps:

Network configuration

Our lab network was built around a 64-bit Windows machine with two network interface cards: one connected to the network gateway and the other connected to an unmanaged network switch. Wireshark, the open-source network protocol analyzer, listens on both interfaces simultaneously, capturing the traffic and saving it as packet capture (pcap) files. IoT devices that require an Ethernet connection are connected to this switch. A smart automation hub, the Vera Plus, is also connected to the unmanaged switch; it creates our wireless IoT environment, serving IoT devices compatible with Wi-Fi, ZigBee, Z-Wave and Bluetooth.
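As a minimal sketch of working with these captures, assuming the files use the classic libpcap format (Wireshark and dumpcap default to pcapng, which this code deliberately rejects), the following standard-library Python reads a capture and reports the number of packets, the number of captured bytes and the first and last packet timestamps. The function name `summarize_pcap` is hypothetical and not part of the dataset's own tooling.

```python
import struct
from pathlib import Path

# Classic libpcap magic numbers as they appear on disk, mapped to the struct
# byte-order prefix and the divisor for the fractional timestamp field.
# pcapng files use a different block structure and are rejected.
_PCAP_MAGICS = {
    b"\xd4\xc3\xb2\xa1": ("<", 1_000_000),      # little-endian, microseconds
    b"\xa1\xb2\xc3\xd4": (">", 1_000_000),      # big-endian, microseconds
    b"\x4d\x3c\xb2\xa1": ("<", 1_000_000_000),  # little-endian, nanoseconds
    b"\xa1\xb2\x3c\x4d": (">", 1_000_000_000),  # big-endian, nanoseconds
}


def summarize_pcap(path):
    """Return (packets, captured_bytes, first_ts, last_ts) for a classic pcap file.

    Timestamps are seconds since the Unix epoch (UTC), or None for an empty capture.
    """
    with Path(path).open("rb") as f:
        header = f.read(24)  # global header: magic, version, tz, sigfigs, snaplen, linktype
        if len(header) < 24 or header[:4] not in _PCAP_MAGICS:
            raise ValueError(f"{path}: not a classic pcap file (pcapng is not handled)")
        order, divisor = _PCAP_MAGICS[header[:4]]
        packets = captured = 0
        first_ts = last_ts = None
        while True:
            record = f.read(16)  # per-packet record header
            if len(record) < 16:
                break  # end of file
            ts_sec, ts_frac, incl_len, _orig_len = struct.unpack(order + "IIII", record)
            if len(f.read(incl_len)) < incl_len:
                break  # truncated last packet, e.g. capture stopped mid-write
            ts = ts_sec + ts_frac / divisor
            if first_ts is None:
                first_ts = ts
            last_ts = ts
            packets += 1
            captured += incl_len
    return packets, captured, first_ts, last_ts
```

The file is streamed record by record rather than loaded whole, since a single overnight capture can be large.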

Dataset

To collect the data, we used Wireshark and dumpcap to capture the network traffic of the IoT devices passing through the gateway in six different types of experiments. Wireshark was used for the manual experiments and dumpcap for the semi-automated ones. The experiments are organized as follows:
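A hedged sketch of how a semi-automated dumpcap capture could be scripted, assuming dumpcap is on the PATH and that the two lab interfaces are referred to by the index numbers dumpcap prints with `-D` (the values `"1"` and `"2"` below are placeholders). `-i` may be repeated once per interface, `-w` names the output file and `-a duration:N` stops the capture after N seconds; the helper names `dumpcap_command` and `run_capture` are hypothetical.

```python
import subprocess


def dumpcap_command(interfaces, outfile, duration_s=None):
    """Build a dumpcap argument list that records every interface into one file."""
    if not interfaces:
        raise ValueError("at least one capture interface is required")
    cmd = ["dumpcap"]
    for iface in interfaces:
        cmd += ["-i", str(iface)]  # -i can be given once per interface
    cmd += ["-w", str(outfile)]  # output file (pcapng unless told otherwise)
    if duration_s is not None:
        cmd += ["-a", f"duration:{int(duration_s)}"]  # autostop after N seconds
    return cmd


def run_capture(interfaces, outfile, duration_s):
    """Run dumpcap and raise CalledProcessError if it exits with an error."""
    subprocess.run(dumpcap_command(interfaces, outfile, duration_s), check=True)
```

For example, a two-minute power-on capture on both interfaces would be `dumpcap_command(["1", "2"], "power_device.pcapng", 120)`, which yields `['dumpcap', '-i', '1', '-i', '2', '-w', 'power_device.pcapng', '-a', 'duration:120']`. Building the list separately from running it keeps the command easy to log and test.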

Power: In this experiment, we powered on each device in our lab individually and captured its network traffic in isolation.

Idle: In this experiment, we captured all network traffic from late in the evening to early in the morning, a period we call idle time. During this period, the lab was completely empty and no human interaction took place.

Interactions: In this experiment, we identified every available function of each IoT device and captured the corresponding network activity and transmitted packets for each function or activity.

Scenarios: In these experiments, we ran six different scenarios, each using a combination of devices to simulate the network activity inside a smart home. The goal was to observe how the devices behave while interacting with each other simultaneously.

Active: In addition to the idle period, all network communications were also captured throughout the day. During this period, fellow researchers were free to enter the lab at any time; they might interact with the devices and generate network traffic either passively or actively.

Attacks: In this experiment, we performed two different attacks, Flood and RTSP brute force, on some of our devices and captured the resulting network traffic.
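The idle and active experiments differ mainly by time of day. The sketch below shows how packet timestamps (pcap stores them as Unix epoch seconds in UTC) could be split into an overnight idle window and a daytime active window, including the wrap past midnight. The 20:00 and 07:00 boundaries are hypothetical placeholders, since the description only says "late in the evening to early in the morning", and the function names are illustrative.

```python
from datetime import datetime, time

# Hypothetical window boundaries; the dataset description does not give exact hours.
IDLE_START = time(20, 0)
IDLE_END = time(7, 0)


def in_idle_window(moment, start=IDLE_START, end=IDLE_END):
    """True if the wall-clock time of `moment` lies in [start, end)."""
    t = moment.time()  # drops the date (and any tzinfo), keeps hh:mm:ss
    if start <= end:
        return start <= t < end
    return t >= start or t < end  # the window wraps past midnight


def split_idle_active(epoch_seconds, tz=None):
    """Split Unix timestamps (e.g. pcap packet times) into idle and active lists.

    tz=None converts to the machine's local time zone; pass the lab's tzinfo
    if the analysis runs on a machine in a different zone.
    """
    idle, active = [], []
    for ts in epoch_seconds:
        moment = datetime.fromtimestamp(ts, tz)
        (idle if in_idle_window(moment) else active).append(ts)
    return idle, active
```

Using a half-open window means a packet stamped exactly at the end boundary counts as active, so no timestamp is assigned to both sets.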

Case study – device identification

After generating the dataset, we performed a case study on transferability: training a model on data from our lab and transferring the trained model to another lab for testing. We conducted 20 different experiments based on the number of devices sampled from the United States lab.

Forty-eight features were extracted from both the training dataset (from our lab) and the testing dataset (from the other lab). Three device-type classes were used in this experiment: Audio, Camera and Home Automation. The training dataset required labels, whereas the test dataset did not, since its labels were what the model had to predict.

After training, the model is transferred to the other lab and tested on each device to predict that device's class. For example, if an Amazon Echo Dot is tested on the trained model, the classifier should predict that it belongs to the Audio device type. This works by counting the classifier's predictions, made from the extracted features, for each device type; the device type with the highest count is assigned as the class of the device in question.
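The counting step is a majority vote over per-sample predictions. A minimal sketch, assuming the trained classifier has already produced one device-type label per feature vector of the device under test (any classifier could supply these labels; `predict_device_type` is a hypothetical name, and the case study's exact procedure may differ in details such as tie handling):

```python
from collections import Counter


def predict_device_type(sample_predictions):
    """Majority vote: return the most frequent label and the per-label counts.

    `sample_predictions` holds one predicted device type per feature vector
    of the device under test. On a tie, the label seen first wins.
    """
    if not sample_predictions:
        raise ValueError("no predictions for this device")
    counts = Counter(sample_predictions)
    label, _ = counts.most_common(1)[0]
    return label, dict(counts)


# Hypothetical per-sample predictions for an Amazon Echo Dot.
label, counts = predict_device_type(["Audio", "Audio", "Camera", "Audio"])
print(label, counts)  # Audio {'Audio': 3, 'Camera': 1}
```

Returning the counts alongside the winning label makes it easy to see how decisive the vote was for each device.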

Dataset directory

The main dataset directory (CIC IoT Dataset) contains six subdirectories, one for each experiment:

Power: In this directory, you will find the power experiment packet captures for each device, categorized by different device classes.

Idle: In this directory, you will find idle experiment packet captures for 30 days, named and sorted by date.

Interactions: In this directory, you will find the interaction experiment packet captures for each device, categorized by different device classes. Each interaction includes three packet captures.

Scenarios: In this directory, you will find six sub-directories, each of which is related to one scenario. Each scenario includes three packet captures.

Active: In this directory, you will find active experiment packet captures for 30 days, named and sorted by date.

Attacks: In this directory, you will find two sub-directories, Flood and RTSP BruteForce, each for a specific attack performed on a few devices. The latter was performed using two different tools, Hydra and Nmap. Each attack includes three packet captures per device.
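To work with this layout programmatically, a small index of the capture files can help. This sketch assumes the six experiment folders are named exactly Power, Idle, Interactions, Scenarios, Active and Attacks directly under the dataset root, and that captures end in `.pcap` or `.pcapng`; both are assumptions to check against the downloaded archive. The function names are hypothetical.

```python
from collections import defaultdict
from pathlib import Path

# Assumed folder names, taken from the directory description above.
EXPERIMENTS = ("Power", "Idle", "Interactions", "Scenarios", "Active", "Attacks")
CAPTURE_SUFFIXES = {".pcap", ".pcapng"}


def index_captures(root):
    """Map each experiment folder found under `root` to its sorted capture files."""
    root = Path(root)
    index = {}
    for experiment in EXPERIMENTS:
        folder = root / experiment
        if not folder.is_dir():
            continue  # experiment not downloaded or named differently
        index[experiment] = sorted(
            p.relative_to(root)
            for p in folder.rglob("*")
            if p.is_file() and p.suffix.lower() in CAPTURE_SUFFIXES
        )
    return index


def count_by_subfolder(paths):
    """Count captures per first-level subfolder (device class, scenario or attack)."""
    counts = defaultdict(int)
    for p in paths:
        # p is relative to the dataset root: experiment/subfolder/.../file
        key = p.parts[1] if len(p.parts) > 2 else "(top level)"
        counts[key] += 1
    return dict(counts)
```

For the Attacks folder, `count_by_subfolder(index_captures(root)["Attacks"])` would report how many captures sit under Flood and under RTSP BruteForce, which is a quick way to confirm the three-captures-per-device structure after downloading.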

Contributing

The project is not currently in development, but any contribution is welcome. Please contact one of the authors of the paper.

Using the dataset

YouTube video: Label Flipping Mitigation in Deep-Learning-Based IoT Profiling by Dr. Euclides Carlos Pinto Neto.

Webinar on the CIC IoT datasets: "From Profiling to Protection: Leveraging Datasets for Enhanced IoT Security" by Dr. Sajjad Dadkhah, Assistant Professor and Cybersecurity R&D Team Lead, with Q&A by Sumit Kundu.

YouTube video: IoTProMo: Securing IoT Networks using Device Profiling and Monitoring by Alireza Zohourian with Q&A by Sumit Kundu.

Acknowledgments

The authors would like to thank the Canadian Institute for Cybersecurity for its financial and educational support.

Citation

Sajjad Dadkhah, Hassan Mahdikhani, Priscilla Kyei Danso, Alireza Zohourian, Kevin Anh Truong, Ali A. Ghorbani, “Towards the development of a realistic multidimensional IoT profiling dataset”, Submitted to: The 19th Annual International Conference on Privacy, Security & Trust (PST2022) August 22-24, 2022, Fredericton, Canada.

Download the dataset
