Canadian Institute for Cybersecurity

Investigation of the Android Malware (CICInvesAndMal2019) 

We make the second part of the CICAndMal2017 dataset publicly available as CICInvesAndMal2019. It includes permissions and intents as static features, and API calls and all generated log files as dynamic features, captured in three steps (during installation, before restarting the phone, and after restarting the phone). In this part, we improve malware category and family classification performance by around 30% by combining the previous dynamic features (80 network-flow features extracted with CICFlowMeter-V3) with 2-gram sequential relations of API calls. We also examine these features in the presented two-layer malware analysis framework. In addition, we provide other captured features such as battery states, log states, packages, and process logs.
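As a minimal sketch of what "2-gram sequential relations of API calls" can look like in practice, the snippet below counts consecutive pairs in an ordered API-call trace. The function name `api_call_bigrams` and the example trace are hypothetical and chosen only for illustration; the dataset's actual log format and the authors' exact feature extraction may differ.

```python
from collections import Counter
from typing import Iterable, Tuple


def api_call_bigrams(calls: Iterable[str]) -> "Counter[Tuple[str, str]]":
    """Count consecutive (previous, next) pairs in an ordered API-call trace."""
    calls = list(calls)
    # zip pairs each call with its successor; traces shorter than 2 yield no pairs.
    return Counter(zip(calls, calls[1:]))


# Hypothetical trace; real traces would be parsed from the dataset's dynamic logs.
trace = [
    "getDeviceId",
    "openConnection",
    "getDeviceId",
    "openConnection",
    "sendTextMessage",
]
bigrams = api_call_bigrams(trace)
```

Each distinct pair can then serve as one feature column, with its count (or presence) as the value, alongside the network-flow features.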

In this second part of the dataset, we followed the same installation and capture process as in the first part:

We installed 5,000 of the collected samples (426 malware and 5,065 benign) on real devices. Our malware samples in this dataset are classified into four categories:

  • Adware
  • Ransomware
  • Scareware
  • SMS Malware

Our samples come from 42 unique malware families. The families in each category and the number of captured samples per family are as follows:

Adware

  • Dowgin family, 10 captured samples
  • Ewind family, 10 captured samples
  • Feiwo family, 15 captured samples
  • Gooligan family, 14 captured samples
  • Kemoge family, 11 captured samples
  • Koodous family, 10 captured samples
  • Mobidash family, 10 captured samples
  • Selfmite family, 4 captured samples
  • Shuanet family, 10 captured samples
  • Youmi family, 10 captured samples

Ransomware

  • Charger family, 10 captured samples
  • Jisut family, 10 captured samples
  • Koler family, 10 captured samples
  • LockerPin family, 10 captured samples
  • Simplocker family, 10 captured samples
  • Pletor family, 10 captured samples
  • PornDroid family, 10 captured samples
  • RansomBO family, 10 captured samples
  • Svpeng family, 11 captured samples
  • WannaLocker family, 10 captured samples

Scareware

  • AndroidDefender family, 17 captured samples
  • AndroidSpy.277 family, 6 captured samples
  • AV for Android family, 10 captured samples
  • AVpass family, 10 captured samples
  • FakeApp family, 10 captured samples
  • FakeApp.AL family, 11 captured samples
  • FakeAV family, 10 captured samples
  • FakeJobOffer family, 9 captured samples
  • FakeTaoBao family, 9 captured samples
  • Penetho family, 10 captured samples
  • VirusShield family, 10 captured samples

SMS Malware

  • BeanBot family, 9 captured samples
  • Biige family, 11 captured samples
  • FakeInst family, 10 captured samples
  • FakeMart family, 10 captured samples
  • FakeNotify family, 10 captured samples
  • Jifake family, 10 captured samples
  • Mazarbot family, 9 captured samples
  • Nandrobox family, 11 captured samples
  • Plankton family, 10 captured samples
  • SMSsniffer family, 9 captured samples
  • Zsone family, 10 captured samples
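The per-family counts listed above can be tallied to cross-check the totals stated earlier (42 families, 426 malware samples). The following sketch simply encodes the lists as a dictionary; the variable names are illustrative and not part of the dataset.

```python
# Captured samples per family, copied from the lists above.
SAMPLES = {
    "Adware": {
        "Dowgin": 10, "Ewind": 10, "Feiwo": 15, "Gooligan": 14, "Kemoge": 11,
        "Koodous": 10, "Mobidash": 10, "Selfmite": 4, "Shuanet": 10, "Youmi": 10,
    },
    "Ransomware": {
        "Charger": 10, "Jisut": 10, "Koler": 10, "LockerPin": 10, "Simplocker": 10,
        "Pletor": 10, "PornDroid": 10, "RansomBO": 10, "Svpeng": 11, "WannaLocker": 10,
    },
    "Scareware": {
        "AndroidDefender": 17, "AndroidSpy.277": 6, "AV for Android": 10,
        "AVpass": 10, "FakeApp": 10, "FakeApp.AL": 11, "FakeAV": 10,
        "FakeJobOffer": 9, "FakeTaoBao": 9, "Penetho": 10, "VirusShield": 10,
    },
    "SMS Malware": {
        "BeanBot": 9, "Biige": 11, "FakeInst": 10, "FakeMart": 10, "FakeNotify": 10,
        "Jifake": 10, "Mazarbot": 9, "Nandrobox": 11, "Plankton": 10,
        "SMSsniffer": 9, "Zsone": 10,
    },
}

# Samples per category, number of families, and overall sample count.
category_totals = {category: sum(fams.values()) for category, fams in SAMPLES.items()}
family_count = sum(len(fams) for fams in SAMPLES.values())
total_samples = sum(category_totals.values())

print(category_totals)  # {'Adware': 104, 'Ransomware': 101, 'Scareware': 112, 'SMS Malware': 109}
print(family_count, total_samples)  # 42 426
```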

To obtain a comprehensive view of our malware samples, we created a specific scenario for each malware category. We also defined three data-capture states to overcome the stealthiness of advanced malware:

  1. Installation: The first capture state, which occurs immediately after installing the malware (1-3 min). (In the dataset, the folder name is "AfterInstall".)
  2. Before restart: The second capture state, which occurs 15 min before rebooting the phone. (In the dataset, the folder name is "Before".)
  3. After restart: The last capture state, which occurs 15 min after rebooting the phone. (In the dataset, the folder name is "After".)
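When processing the dataset, the three folder names above can be mapped back to their capture states. The sketch below assumes that one of these folder names appears as a path component of each captured file; that directory layout, the example paths, and the helper name `capture_state` are assumptions made for illustration and should be checked against the downloaded archive.

```python
from pathlib import PurePosixPath
from typing import Optional

# Folder names as given in the dataset description, mapped to capture states.
STATE_BY_FOLDER = {
    "AfterInstall": "installation",
    "Before": "before restart",
    "After": "after restart",
}


def capture_state(path: str) -> Optional[str]:
    """Return the capture state named by the first matching folder in the path.

    Components are compared exactly, so "After" never matches "AfterInstall".
    Returns None when no component is a known state folder.
    """
    for part in PurePosixPath(path).parts:
        if part in STATE_BY_FOLDER:
            return STATE_BY_FOLDER[part]
    return None


# Hypothetical paths; the real layout of the archive may differ.
print(capture_state("Adware/Dowgin/AfterInstall/sample.log"))  # installation
print(capture_state("Adware/Dowgin/After/sample.log"))         # after restart
```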

See our publicly available Android Sandbox for more details on the capture process.

License

The CICInvesAndMal2019 dataset is publicly available for researchers. If you are using our dataset, you should cite our related research paper which outlines the details of the dataset and its underlying principles:

  • Laya Taheri, Andi Fitriah Abdulkadir, Arash Habibi Lashkari; Extensible Android Malware Detection and Family Classification Using Network-Flows and API-Calls, The IEEE (53rd) International Carnahan Conference on Security Technology, India, 2019
