Canadian Institute for Cybersecurity

CIC Modbus dataset 2023 (CICModbusDataset2023)

The CIC Modbus Dataset contains network (pcap) captures and attack logs from a simulated substation network. The dataset is categorized into two groups: an attack dataset and a benign dataset.

The attack dataset includes network traffic captures that simulate various types of Modbus protocol attacks in a substation environment. The attacks are reconnaissance, query flooding, loading payloads, delay response, modify length parameters, false data injection, stacking Modbus frames, brute force write and baseline replay. These attacks are based on some techniques in the MITRE ICS ATT&CK framework.

The benign dataset consists of normal network traffic captures representing legitimate Modbus communication within the substation network.

The purpose of this dataset is to facilitate research, analysis, and development of intrusion detection systems, anomaly detection algorithms and other security mechanisms for substation networks using the Modbus protocol.

Architecture

The CIC Modbus Dataset was generated from Wireshark captures obtained from a simulated testbed. The testbed is a simulated Docker environment: Docker containers were created to represent IEDs and SCADA HMIs, and Python scripts were written to run the logic of the IEDs and SCADA HMIs.

The logic of an IED is to change its voltage values randomly at periodic intervals, or when a request to do so is received from the SCADA HMI. The logic of the SCADA HMI is to perform a tap change based on the values received from the IED, and to close or open based on overvoltage or undervoltage.
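
The control loop described above can be sketched roughly as follows. This is only an illustration of the idea, not the dataset's actual scripts: the nominal voltage, the two deviation bands, the random drift range, and the function names `ied_next_voltage` and `hmi_decide` are all assumptions introduced here.

```python
import random

# All numeric values below are illustrative assumptions, not dataset parameters.
NOMINAL_V = 120.0   # hypothetical nominal voltage
TAP_BAND = 0.05     # hypothetical +/-5% deviation that triggers a tap change
TRIP_BAND = 0.10    # hypothetical +/-10% deviation that opens the breaker


def ied_next_voltage(current, requested=None, rng=random):
    """IED side: apply a value requested by the HMI, else drift randomly."""
    if requested is not None:
        return requested
    return current + rng.uniform(-2.0, 2.0)


def hmi_decide(voltage):
    """HMI side: return (tap_action, breaker_action) for one voltage reading."""
    deviation = (voltage - NOMINAL_V) / NOMINAL_V
    if abs(deviation) > TRIP_BAND:
        return ("none", "open")        # severe over- or undervoltage
    if deviation > TAP_BAND:
        return ("tap_down", "close")   # mild overvoltage
    if deviation < -TAP_BAND:
        return ("tap_up", "close")     # mild undervoltage
    return ("none", "close")
```

Under these assumed bands, a reading of 128.0 would lead to a downward tap change, while 140.0 would open the breaker.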

The containers were built to contain either the detection code (Java jar files) and scripts, or only the scripts. IEDs or SCADA HMIs that contain only the scripts are the insecure devices. The secure IEDs or SCADA HMIs contain both the jar files and scripts. Each secure device contains an agent that sends detection scores to a central agent.

Data collection

The CIC Modbus Dataset was collected using the following methods:

  • Network interface card (NIC) capture: The network traffic of each Intelligent Electronic Device (IED) within the substation network was captured using tcpdump. This allowed for the collection of specific traffic related to individual devices.
  • Docker bridge capture: The network traffic of the entire substation network was captured by monitoring the Docker bridge. This provided a comprehensive view of the network, including communication between different devices.
  • Attack scenarios: All attack datasets are located in the attacks folder. Within it, subfolders contain datasets covering attacks conducted in three different scenarios:
    • external folder: attacks from devices external to the network. The attack logs can be found within the external-attacker folder.
    • compromised-ied folder: attacks from a compromised IED (in this scenario, the attacking node is IED1B). The attack logs are available in the attack logs folder.
    • compromised-scada folder: attacks from a compromised HMI (in this scenario, the attacking node is the normal SCADA HMI). The attack logs are available in the attack logs folder.

Data format

The CIC Modbus Dataset is provided in the following formats:

  • Network captures: The network captures are stored in PCAP (Packet Capture) format. The captures are split into 100 MB files named in sequential order, and each file holds a portion of the overall network traffic.
  • Logs: The logs generated by the attack tools and the trust model are stored in CSV (Comma-Separated Values) format. The logs are grouped by dates, and each record within the log files is timestamped, providing a chronological view of the captured events.

Data dictionary

The CIC Modbus Dataset includes several fields or attributes across the different files. Here is a breakdown of the fields, their data types, possible values or categories and explanations.

PCAP files (network captures)

  • Source IP address: The source IP address of the network packet. (String)
  • Destination IP address: The destination IP address of the network packet. (String)
  • Other IP-related fields: Depending on the specific PCAP file, additional IP-related fields may be present, such as protocol, port numbers, etc.
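
Beyond the IP fields, each Modbus/TCP payload starts with the standard MBAP header (transaction ID, protocol ID, length, unit ID) followed by the function code. The sketch below parses those bytes and checks whether the length field matches the payload, which may be useful when looking at attacks such as modified length parameters or stacked frames. The names `MbapHeader`, `parse_mbap` and `length_is_consistent` are hypothetical helpers, and the sketch assumes the TCP payload has already been extracted from the packet.

```python
import struct
from typing import NamedTuple


class MbapHeader(NamedTuple):
    transaction_id: int
    protocol_id: int
    length: int
    unit_id: int
    function_code: int


def parse_mbap(payload: bytes) -> MbapHeader:
    """Parse the first 8 bytes of a Modbus/TCP payload (MBAP + function code)."""
    if len(payload) < 8:
        raise ValueError("payload too short for a Modbus/TCP frame")
    # Network byte order: three 16-bit fields, then two single bytes.
    tid, pid, length, uid, func = struct.unpack("!HHHBB", payload[:8])
    return MbapHeader(tid, pid, length, uid, func)


def length_is_consistent(payload: bytes) -> bool:
    """True when the MBAP length field equals the bytes that follow it."""
    header = parse_mbap(payload)
    # The length field counts the unit ID and everything after it,
    # i.e. the whole payload minus the first 6 header bytes.
    return header.length == len(payload) - 6
```

For a payload that holds a single well-formed frame the check passes; a payload carrying two frames back to back would fail it, because the first length field no longer covers the remaining bytes.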

The IPs of the devices are shown below:

  • Secure IEDs
    • IED1A – 185.175.0.4
    • IED4C – 185.175.0.8
  • Normal IEDs
    • IED1B – 185.175.0.5
  • Secure SCADA HMI – 185.175.0.2
  • Normal SCADA HMI – 185.175.0.3
  • Central Agent – 185.175.0.6
  • Attacker – 185.175.0.7
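
For labeling, the address list above can be kept as a small lookup table. The following is a minimal sketch; the role labels ("secure", "normal", "agent", "attacker") and the helper name `describe` are conventions chosen here for illustration, not part of the dataset.

```python
# IP addresses as listed above; role labels are illustrative assumptions.
DEVICES = {
    "185.175.0.2": ("Secure SCADA HMI", "secure"),
    "185.175.0.3": ("Normal SCADA HMI", "normal"),
    "185.175.0.4": ("IED1A", "secure"),
    "185.175.0.5": ("IED1B", "normal"),
    "185.175.0.6": ("Central Agent", "agent"),
    "185.175.0.7": ("Attacker", "attacker"),
    "185.175.0.8": ("IED4C", "secure"),
}


def describe(ip: str) -> str:
    """Return 'name (role)' for a known address, or a placeholder otherwise."""
    name, role = DEVICES.get(ip, ("unknown", "unknown"))
    return f"{name} ({role})"
```

For example, `describe("185.175.0.5")` returns `"IED1B (normal)"`.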

The timestamps in the PCAP files are recorded in ADT (Atlantic Daylight Time), which at the time of capture was UTC minus 3 hours. To align these with the log files:

  • Adjust the time column in the PCAP files to UTC.
  • Alternatively, use a method or tool that can account for the time zone difference when analyzing the data.

Note: If you do not observe the expected attack packets (identified in the logs) after these adjustments, consider the possibility of bugs or the capture being stopped prematurely during the attack.
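
One way to make the adjustment is to attach a fixed UTC-3 offset to each exported PCAP time and convert it to UTC. The sketch below assumes the time column has been exported as text in the form `YYYY-MM-DD HH:MM:SS.ffffff`; that format, the helper name `adt_to_utc` and the sample date are assumptions for illustration only.

```python
from datetime import datetime, timedelta, timezone

# Fixed offset for Atlantic Daylight Time (UTC minus 3 hours).
ADT = timezone(timedelta(hours=-3), name="ADT")


def adt_to_utc(text: str, fmt: str = "%Y-%m-%d %H:%M:%S.%f") -> datetime:
    """Interpret a naive timestamp string as ADT and return it in UTC."""
    local = datetime.strptime(text, fmt).replace(tzinfo=ADT)
    return local.astimezone(timezone.utc)


# Arbitrary example value, not taken from the dataset.
converted = adt_to_utc("2023-07-01 22:30:00.250000")
print(converted.isoformat())  # 2023-07-02T01:30:00.250000+00:00
```

A fixed offset is used rather than a named time zone because the text above states the offset that applied at capture time.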

Logs (CSV files)

  • csv
    • Timestamp: The timestamp of the attack event. (Date or time)
    • TargetIP: The IP address of the targeted device. (String)
    • Attack: The type of attack. (String)
    • TransactionID: The ID of the transaction associated with the attack. (String)

The timestamps in all log files are recorded in UTC and include milliseconds.
Important: To view the full precision (milliseconds), do not open these logs in Microsoft Excel because Excel tends to round timestamps to seconds. Instead, open them with Notepad or LibreOffice Calc. Avoid a workflow where the file is first opened and re-saved in Excel before being viewed in Notepad, as this can alter the timestamp format.
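
Reading the logs programmatically avoids the spreadsheet rounding problem altogether. The sketch below uses Python's standard `csv` module and keeps the millisecond part of each timestamp. It assumes the files have a header row with the four field names listed above and a `YYYY-MM-DD HH:MM:SS.fff` timestamp layout; both are assumptions, and the sample rows are invented purely to make the example runnable.

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical sample content; real log rows and formats may differ.
SAMPLE_LOG = """Timestamp,TargetIP,Attack,TransactionID
2023-07-02 01:30:00.250,185.175.0.4,query flooding,17
2023-07-02 01:30:00.900,185.175.0.4,query flooding,18
"""


def read_attack_log(handle, fmt="%Y-%m-%d %H:%M:%S.%f"):
    """Parse an attack log into dicts with timezone-aware UTC timestamps."""
    rows = []
    for row in csv.DictReader(handle):
        ts = datetime.strptime(row["Timestamp"], fmt).replace(tzinfo=timezone.utc)
        rows.append({
            "timestamp": ts,
            "target_ip": row["TargetIP"],
            "attack": row["Attack"],
            "transaction_id": row["TransactionID"],  # kept as a string
        })
    return rows


rows = read_attack_log(io.StringIO(SAMPLE_LOG))
```

With a real file, the `io.StringIO` wrapper would be replaced by `open(path, newline="")`, and `fmt` adjusted to whatever layout the logs actually use.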

Dataset usage

The CIC Modbus Dataset provides valuable resources for various research and practical applications, including:

  • Research on trust in securing substations: Researchers can utilize the pcap files to analyze trust-related aspects in securing substations. This includes evaluating trust models, assessing the effectiveness of security mechanisms and investigating trust-based intrusion detection systems.
  • Machine learning techniques: The pcap files can serve as a valuable training and evaluation resource for machine learning models. Researchers can develop and apply ML techniques, such as anomaly detection, classification or clustering, to enhance the security of substation networks.

To facilitate accurate labeling and analysis, it is recommended to extract IP-specific versions of the pcap files for research purposes. This allows for precise identification and classification of network traffic associated with specific IP addresses.
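
As a rough illustration of IP-specific extraction, the sketch below walks a classic pcap file using only the standard library and keeps the records that involve a given address. It assumes the classic pcap layout with microsecond timestamps (not pcapng), Ethernet framing and IPv4; the function names are hypothetical, and a dedicated packet-analysis tool may be a better fit for full-size captures.

```python
import socket
import struct

PCAP_MAGIC_USEC = 0xA1B2C3D4  # classic pcap, microsecond timestamps


def iter_ipv4_packets(data: bytes):
    """Yield (timestamp, src_ip, dst_ip) for IPv4-over-Ethernet records."""
    # The magic number tells us the byte order the file was written in.
    endian = "<" if struct.unpack("<I", data[:4])[0] == PCAP_MAGIC_USEC else ">"
    if struct.unpack(endian + "I", data[:4])[0] != PCAP_MAGIC_USEC:
        raise ValueError("not a classic microsecond pcap file")
    offset = 24  # size of the pcap global header
    while offset + 16 <= len(data):
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(
            endian + "IIII", data[offset:offset + 16])
        frame = data[offset + 16:offset + 16 + incl_len]
        offset += 16 + incl_len
        # Ethernet header is 14 bytes; EtherType 0x0800 marks IPv4.
        if len(frame) < 34 or frame[12:14] != b"\x08\x00":
            continue
        src = socket.inet_ntoa(frame[26:30])  # IPv4 source address
        dst = socket.inet_ntoa(frame[30:34])  # IPv4 destination address
        yield ts_sec + ts_usec / 1_000_000, src, dst


def packets_for_ip(data: bytes, ip: str):
    """Return the records whose source or destination equals `ip`."""
    return [p for p in iter_ipv4_packets(data) if ip in (p[1], p[2])]
```

Calling `packets_for_ip(open(path, "rb").read(), "185.175.0.7")` would, under these assumptions, list the records that involve the attacker address; the timestamp is returned exactly as stored in the record header.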

Webinar example of dataset use: "Securing Substations with Trust, Risk Posture, and Multi-Agent Systems: A Comprehensive Approach" by Dr. Kwasi Boakye-Boateng, Postdoctoral Fellow, Canadian Institute for Cybersecurity, and Q&A with Sumit Kundu.

Acknowledgments

The creators of the CIC Modbus Dataset would like to acknowledge the following organizations for their contributions and support:

Contact information

For any inquiries, feedback, or collaboration opportunities related to the CIC Modbus Dataset, please contact:

License

You may redistribute, republish and mirror the CIC Modbus Dataset 2023 in any form. However, any use or redistribution of the data must include a citation to the CIC Modbus Dataset 2023 and the following paper.

Kwasi Boakye-Boateng, Ali A. Ghorbani, and Arash Habibi Lashkari, "Securing Substations with Trust, Risk Posture and Multi-Agent Systems: A Comprehensive Approach," 20th International Conference on Privacy, Security and Trust (PST), Copenhagen, Denmark, August 2023.
