CICEV2023 DDoS attack datasets

DDoS attack dataset against EV authentication in charging infrastructure

Most recent studies on DoS or DDoS detection models have targeted general networks, and no DoS or DDoS dataset for electric vehicle (EV) charging infrastructure exists. In addition, existing datasets contain information on the reception count of packets during a specific period. In contrast, our dataset provides more diverse machine learning features, including packet access counts and system status information on charging facilities. The dataset can contribute to EV charging system analyses and provide training and testing features for a DoS or DDoS attack detection classifier. To create this dataset, we developed a simulator that models multiple EVs, charging stations (CSs) and a GS in a charging infrastructure network, and we implemented four attack scenarios.

CICEV2023 dataset description

The dataset consists of four attack scenarios based on (i) Correct EV ID, (ii) Wrong EV ID, (iii) Wrong EV Timestamp and (iv) Wrong CS Timestamp. To generate the dataset, we implemented an EV authentication protocol on the simulator. The protocol uses a session key, a Hash-based Message Authentication Code (HMAC), a timestamp, and AND, OR and XOR operations.

The first step was data collection. A simulator was built in Python in an Ubuntu environment to collect the data. We used charging times from ACN-Network covering Sept. 5, 2019 to Sept. 6, 2020. We provide features that reflect the different attack scenarios. We identified the following features of the CS and GS on the simulator: (i) data indicating the Linux kernel overhead based on CPU cycles, branch instructions and general instructions, (ii) data representing system performance status based on the number of consumed CPU cycles, branch instructions and general instructions, and (iii) the time differences between legitimate authentication trials or DDoS attacks. The data in (i) and (ii) were collected in real time using Perf.

This dataset consists of profiling results such as the performance overhead and system resource consumption in the profiling target. Another type of feature represents time differences for authentication times of EVs in each CS.

Normal EV charging scenario

Figure 1: Proposed simulator structure

To create the normal scenario, we use the monthly charging counts for one year (Sept. 5, 2019 to Sept. 6, 2020) in the ACN network. Based on this information, the CSs and a normal EV charging scenario were constructed. Each CS runs as a separate process, and Perf profiles it by referring to the PID of that process. EVs are created as threads. Fig. 1 shows the structure of the simulator used in this study.
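As a rough illustration of profiling a CS process by PID, the sketch below builds a `perf stat` command line for the three counters named in this description (cycles, instructions, branches). The exact Perf invocations used to build the dataset are not given here, so the event list, the CSV separator, the sampling window and the helper names `perf_stat_command` and `profile_process` are assumptions.

```python
import subprocess

# Counters named in the feature description; the exact event list used
# by the authors is an assumption.
PERF_EVENTS = "cycles,instructions,branches"


def perf_stat_command(pid: int, seconds: int) -> list[str]:
    """Build a `perf stat` command that attaches to one process by PID.

    `sleep <seconds>` bounds the measurement window; `-x ,` asks perf
    for comma-separated output.
    """
    return [
        "perf", "stat",
        "-e", PERF_EVENTS,
        "-x", ",",
        "-p", str(pid),
        "--", "sleep", str(seconds),
    ]


def profile_process(pid: int, seconds: int) -> str:
    """Run perf against a live process (requires perf and permissions)."""
    result = subprocess.run(
        perf_stat_command(pid, seconds),
        capture_output=True, text=True, check=True,
    )
    # perf stat writes its counter report to stderr by default.
    return result.stderr


if __name__ == "__main__":
    # Hypothetical PID of one CS process.
    print(" ".join(perf_stat_command(4242, 5)))
```

Running `profile_process` needs a Linux host with Perf installed and sufficient privileges; `perf_stat_command` alone can be inspected anywhere.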

Time scaling

The normal EV charging scenario was constructed from a single year of ACN Network data. Collecting data over such a long period is impractical in simulation, so the duration of the normal scenario is rescaled to the time spent in the attack scenario. In this way, a one-year normal charging schedule can be replayed alongside an attack scenario that lasts tens of minutes. In the normal charging scenario, the interval between charging sessions of an EV at a specific CS may be several hours or days; these intervals are scaled as follows:

Calculate the normal scenario-specific period (1 year) in seconds (∀n_t).

Simulate the attack scenario and convert its duration to seconds (∀a_t).

Obtain the scaling factor ε_t = ∀a_t / ∀n_t. Each interval between EV authentications in the normal scenario (n_t, in seconds) is multiplied by ε_t to obtain the sleep time: S_t = n_t × ε_t.
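The scaling steps above can be sketched in a few lines of Python. The one-year window comes from the dates stated in this description; the 30-minute attack duration and the 6-hour example interval are hypothetical values chosen only for illustration, as are the function names.

```python
from datetime import datetime


def scale_factor(attack_seconds: float, normal_seconds: float) -> float:
    """epsilon_t = (attack scenario duration) / (normal scenario period)."""
    if normal_seconds <= 0:
        raise ValueError("normal_seconds must be positive")
    return attack_seconds / normal_seconds


def scaled_sleep(normal_interval_seconds: float, epsilon: float) -> float:
    """S_t = n_t * epsilon_t: sleep time between two scaled authentications."""
    return normal_interval_seconds * epsilon


# Normal scenario period: Sept. 5, 2019 to Sept. 6, 2020, in seconds.
normal_seconds = (datetime(2020, 9, 6) - datetime(2019, 9, 5)).total_seconds()

# Hypothetical attack scenario duration of 30 minutes.
attack_seconds = 30 * 60

epsilon = scale_factor(attack_seconds, normal_seconds)

if __name__ == "__main__":
    # A 6-hour gap between two authentications in the normal scenario
    # shrinks to a sub-second sleep in the simulation.
    print(scaled_sleep(6 * 3600, epsilon))
```

By construction, scaling the whole normal period gives back the attack duration, which is a quick sanity check on the factor.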

DDoS attack scenarios

The DDoS attack scenarios, based on false authentication and timestamp manipulation, fall into four types:

The correct ID of the EV: The attacker obtains a legitimate EV ID and attempts authentication with it, but does not have the correct key.
The wrong ID of the EV: The attacker attempts authentication with neither a correct ID nor a legitimate key.
The wrong timestamp of the EV: The attacker causes authentication failure at the CS by changing the timestamp value exchanged between the EV and the CS to an old value.
The wrong timestamp of the CS: The attacker changes the timestamp value exchanged between the CS and the GS to an old value, causing authentication failure at the GS.
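To make the difference between these scenarios concrete, here is a deliberately simplified verifier using an HMAC and a timestamp freshness check. It is not the authors' protocol (which also involves AND, OR and XOR operations and is not specified in this description); the message layout, the 30-second freshness window, the result strings and all names are assumptions.

```python
import hashlib
import hmac

MAX_SKEW_SECONDS = 30  # hypothetical freshness window


def make_tag(key: bytes, ev_id: str, timestamp: int) -> bytes:
    """HMAC-SHA256 over an assumed "id|timestamp" message layout."""
    message = f"{ev_id}|{timestamp}".encode()
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(registry: dict, ev_id: str, timestamp: int, tag: bytes, now: int) -> str:
    """Return which check a request passes or fails."""
    key = registry.get(ev_id)
    if key is None:
        return "unknown_id"        # wrong-ID scenario
    if abs(now - timestamp) > MAX_SKEW_SECONDS:
        return "stale_timestamp"   # wrong-timestamp scenarios
    if not hmac.compare_digest(tag, make_tag(key, ev_id, timestamp)):
        return "bad_mac"           # correct ID, wrong key
    return "ok"


if __name__ == "__main__":
    registry = {"EV-001": b"session-key-1"}  # hypothetical registered EV
    now = 1_000_000
    good = make_tag(b"session-key-1", "EV-001", now)
    forged = make_tag(b"guessed-key", "EV-001", now)
    old = make_tag(b"session-key-1", "EV-001", now - 3600)
    print(verify(registry, "EV-001", now, good, now))
    print(verify(registry, "EV-001", now, forged, now))
    print(verify(registry, "EV-999", now, forged, now))
    print(verify(registry, "EV-001", now - 3600, old, now))
```

In this sketch the same `verify` logic stands in for both hops: the EV-to-CS check corresponds to the wrong EV timestamp scenario and the CS-to-GS check to the wrong CS timestamp scenario.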

Attack magnitudes

When the attacks occur, multiple attacking EVs and the normal EVs of the normal scenario compete with each other to be authenticated by the GS through a CS. Normal EVs attempt authentication with the correct session key and timestamp, while attacking EVs execute the four attack scenarios above. Within each scenario, the strength of the attack varies. The DDoS attacks come in four modes:

Full Attack & Non-Gaussian Analysis Attack
Full Attack & Gaussian Analysis Attack
Random Attack & Non-Gaussian Analysis Attack
Random Attack & Gaussian Analysis Attack

The full attack mode launches many identical attack EVs against all CSs simultaneously. In this dataset, 2,000 attack EVs are executed for each CS. The random attack mode arbitrarily chooses the victim CSs. The Gaussian analysis attack mode assumes a smart DDoS attack: it uses Gaussian analysis to create a distribution of attack requests similar to the normal EV authentication distribution, which makes it difficult for a detection model to differentiate between attack and normal authentication. Although its impact is weaker than that of the full attack, it can cause service delays without being easily detected through statistical analysis.

If the level of DoS attacks is adjusted by analyzing the Gaussian distribution of the number of authentications in the normal scenario, an attack detection model's false positive and false negative rates can increase. In general, the entropy of the data distributions of the normal and attack scenarios can be maximized by making the number of attack requests follow the natural distribution of authentication requests, modeled as a Gaussian distribution (ϕ). The difference between attack and normal authentication attempts then becomes ambiguous, so the difficulty of classification inevitably increases.

P_i = ϕ(μ, σ), where μ is the average number of EV authentications and σ is the standard deviation of the number of EV recharges.
P_i represents the individual probability for the number of EV authentications (ω) in each CS.
β_i = (α × P_i) / Σ_i^n P_i represents the number of DDoS authentication attacks relative to the number of normal authentications in the i-th CS, where α is a coefficient. Applying a round function to β_i yields a value in the range 0 ≤ β_i < n.
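The allocation above can be sketched as follows. This is one reading of the formula, not the authors' implementation: it assumes P_i is the normal density evaluated at the authentication count ω_i of the i-th CS, that μ and σ are both estimated from those same counts, and that α acts as the total attack budget. The sample counts and the function name are hypothetical.

```python
from statistics import NormalDist, mean, pstdev


def gaussian_attack_counts(normal_counts: list[int], alpha: float) -> list[int]:
    """beta_i = round(alpha * P_i / sum(P)) for each charging station.

    normal_counts[i] is the number of normal EV authentications (omega)
    observed at the i-th CS. Assumption: P_i is the Gaussian density at
    omega_i, with mu and sigma estimated from normal_counts.
    """
    mu = mean(normal_counts)
    sigma = pstdev(normal_counts)  # must be > 0, i.e. counts not all equal
    dist = NormalDist(mu, sigma)
    probs = [dist.pdf(w) for w in normal_counts]
    total = sum(probs)
    return [round(alpha * p / total) for p in probs]


if __name__ == "__main__":
    counts = [12, 30, 45, 28, 9]   # hypothetical per-CS authentication counts
    print(gaussian_attack_counts(counts, alpha=1000))
```

Under this reading, stations whose normal load is close to the mean receive the most attack requests, so the attack volume mirrors the shape of the legitimate traffic.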

Feature description

These features were measured with Perf when normal EV authentications or DDoS attacks occurred. The status of CS and GS is measured in real-time through Perf.

Feature Name	Description
Time delta	It is the interval between the immediately preceding and subsequent authentications.
Instruction overhead	It provides overhead information about libraries and code symbols used in the Linux kernel by counting the number of instructions used by the profiling targets (CS or GS).
CPU cycle overhead	Perf calculates cycles used by profiling targets and provides overhead information about libraries and code symbols used in the Linux kernel.
Branch overhead	It provides overhead information about libraries and code symbols used in the Linux kernel by counting the number of branch instructions used by the profiling target.
Cycles	The number of CPU cycles consumed by the profiling target.
Instructions	The number of instructions executed by the profiling target.
Branches	The number of branch instructions executed by the profiling target.

"Time delta" represents the time difference between one EV authentication and the next. This feature can facilitate cosine similarity analysis to discern DoS or DDoS attacks. "Instruction overhead" is a feature obtained by measuring the overhead of each symbol based on the number of instructions executed in the profiling target (CS or GS).

Likewise, "CPU cycle overhead" and "Branch overhead" are features regarding the overhead of symbols. The former concerns the number of cycles consumed in each symbol and the latter concerns the number of branch instructions executed in each symbol. "Cycles", "Instructions" and "Branches" represent the total number of cycles, instructions and branch instructions consumed systemwide in the CS or GS.
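As an illustration of how the "Time delta" feature could feed a cosine similarity analysis, the sketch below bins inter-authentication intervals into histograms and compares two histograms. The binning step, the bin edges and the sample timestamps are assumptions made for the example; the description does not specify how the comparison is carried out.

```python
import math


def time_deltas(timestamps: list[float]) -> list[float]:
    """Differences between consecutive authentication timestamps."""
    return [b - a for a, b in zip(timestamps, timestamps[1:])]


def histogram(deltas: list[float], bin_edges: list[float]) -> list[int]:
    """Count deltas falling in each half-open bin [edge_i, edge_{i+1})."""
    counts = [0] * (len(bin_edges) - 1)
    for d in deltas:
        for i in range(len(counts)):
            if bin_edges[i] <= d < bin_edges[i + 1]:
                counts[i] += 1
                break
    return counts


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """cos(u, v) = (u . v) / (|u| |v|); returns 0.0 for a zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    edges = [0.0, 0.1, 0.5, 1.0, 5.0]                # hypothetical bins (s)
    normal = time_deltas([0.0, 0.8, 1.7, 4.1, 4.9])  # hypothetical samples
    attack = time_deltas([0.0, 0.02, 0.05, 0.07, 0.11])
    print(cosine_similarity(histogram(normal, edges), histogram(attack, edges)))
```

A similarity near 1 would mean the two interval distributions have the same shape, which is the situation the Gaussian analysis attack mode tries to create.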

Directory and file naming structures of the dataset

The explanations on our dataset directory and file naming structures are as follows:

“Processed_Data”: this directory contains preprocessed data for our research on the attack detection model regarding DDoS attacks on the EV-CS-GS environment.
“Raw_Data”: this directory is the root directory of the dataset.
“Correct_ID”: the correct ID attack scenario data belongs in this directory.
“Wrong_CS_TS”: the wrong timestamp data on the charging station belongs in this directory.
“Wrong_EV_TS”: the wrong timestamp data on the EVs belongs in this directory.
“Wrong_ID”: the wrong ID data of the EVs belongs in this directory.
“Random_CS_Off”: the data without the random attack targeting strategy belongs in this directory.
“Random_CS_On”: the data with the random attack targeting strategy belongs in this directory.
“Gaussian_Off”: the data without the Gaussian attack strategy belongs in this directory.
“Guassian_On”: the data with the Gaussian attack strategy belongs in this directory.
“Attack”: the attack data belongs in this directory.
“Normal”: the normal data belongs in this directory.
“cs/gs_record”: the Perf record data of CS or GS belongs in this directory.
“cs/gs_stat”: the Perf STAT data of CS or GS belongs in this directory.
“cs/gs_top”: the Perf TOP data of CS or GS belongs in this directory.
“acn_data.csv”: this file contains real EV charging schedules from the ACN-Network.
“attack_config.csv”: this file contains the information on the attack scenario, normal authentication trials and attack trials.
“attack/normal_mode.txt”: this file contains the information on simulation environment settings.
“attack/normal_time_diff.txt”: this file contains the CS ID list and data points of the intervals on DDoS attacks or normal EV authentication trials.
“authentication_results.csv”: this file contains the results of the normal EV authentications and DDoS attacks.
“cs_id_pid.csv”: this file contains the CS IDs matched with specific process IDs in the Linux kernel.
“cs_installation.csv”: this file contains the CS list installed legitimately.
“date.csv”: this file contains the start date and end date of the simulation.
“ev_authentication.csv”: this file is similar to “authentication_results.csv”.
“ev_count.txt”: this file shows how many EV authentication and DDoS attack trials are made through each CS. The fields are, in order, CS ID, attack count and normal authentication count.
“ev_installation”: this file shows normal authentication or attack sequences, session ID, CS ID, session key information and the result of successful CS installation.
“gaussian_attack_count.txt”: this file is used for the paper of this work.
“mean_std.txt”: this file is used for the paper of this work.
“sim_sec_attack/normal.txt”: this file shows the time scaled in normal EV authentications or DDoS attacks in the simulation and simulation period.
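When loading the dataset, the directory names listed above can serve as labels. The sketch below derives the scenario, targeting mode, Gaussian mode and attack/normal flag from a file path without assuming any particular nesting order, since the exact hierarchy is not spelled out here. The function name and the example path are hypothetical, and both "Gaussian_On" and the "Guassian_On" spelling shown in the listing are accepted.

```python
from pathlib import PurePosixPath

SCENARIOS = ("Correct_ID", "Wrong_ID", "Wrong_EV_TS", "Wrong_CS_TS")


def _flag(parts, on_names, off_name):
    """True/False if an On/Off directory is present in the path, else None."""
    if any(name in parts for name in on_names):
        return True
    if off_name in parts:
        return False
    return None


def labels_from_path(path: str) -> dict:
    """Derive labels from the directory names that appear in a file path."""
    parts = PurePosixPath(path).parts
    if "Attack" in parts:
        is_attack = True
    elif "Normal" in parts:
        is_attack = False
    else:
        is_attack = None
    return {
        "scenario": next((s for s in SCENARIOS if s in parts), None),
        "random_cs": _flag(parts, ("Random_CS_On",), "Random_CS_Off"),
        "gaussian": _flag(parts, ("Gaussian_On", "Guassian_On"), "Gaussian_Off"),
        "is_attack": is_attack,
    }


if __name__ == "__main__":
    # Hypothetical path; the real nesting order may differ.
    example = "Raw_Data/Wrong_ID/Random_CS_On/Gaussian_Off/Attack/ev_count.txt"
    print(labels_from_path(example))
```

Matching on directory names rather than positions keeps the helper usable even if the actual layout nests these directories in a different order.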

Acknowledgment

The authors gratefully acknowledge the support of the Canadian Institute for Cybersecurity (CIC) and the funding support of the Canada Research Chair and the Atlantic Canada Opportunities Agency (ACOA).

Citation

Kim, Y., Hakak, S., & Ghorbani, A. (2023, August). DDoS Attack Dataset (CICEV2023) against EV Authentication in Charging Infrastructure. In 2023 20th Annual International Conference on Privacy, Security and Trust (PST) (pp. 1-9). IEEE Computer Society.

Download the dataset

Revised version