Canadian Institute for Cybersecurity

Android Malware Dataset (CICAndMal2017)

We present a new Android malware dataset, named CICAndMal2017. In this approach, we ran both malware and benign applications on real smartphones, so that advanced malware samples able to detect an emulator environment could not modify their runtime behavior. We collected 10,854 samples (4,354 malware and 6,500 benign) from several sources. The benign apps, over six thousand in total, come from the Google Play market and were published in 2015, 2016, and 2017.

We installed more than 5,000 of the collected samples (426 malware and 5,065 benign) on real devices. The malware samples in the CICAndMal2017 dataset are classified into four categories:

  • Adware
  • Ransomware
  • Scareware
  • SMS Malware

Our samples come from 42 unique malware families. The families in each category and the number of captured samples per family are as follows:

Adware

  • Dowgin family, 10 captured samples
  • Ewind family, 10 captured samples
  • Feiwo family, 15 captured samples
  • Gooligan family, 14 captured samples
  • Kemoge family, 11 captured samples
  • Koodous family, 10 captured samples
  • Mobidash family, 10 captured samples
  • Selfmite family, 4 captured samples
  • Shuanet family, 10 captured samples
  • Youmi family, 10 captured samples

Ransomware

  • Charger family, 10 captured samples
  • Jisut family, 10 captured samples
  • Koler family, 10 captured samples
  • LockerPin family, 10 captured samples
  • Simplocker family, 10 captured samples
  • Pletor family, 10 captured samples
  • PornDroid family, 10 captured samples
  • RansomBO family, 10 captured samples
  • Svpeng family, 11 captured samples
  • WannaLocker family, 10 captured samples

Scareware

  • AndroidDefender family, 17 captured samples
  • AndroidSpy.277 family, 6 captured samples
  • AV for Android family, 10 captured samples
  • AVpass family, 10 captured samples
  • FakeApp family, 10 captured samples
  • FakeApp.AL family, 11 captured samples
  • FakeAV family, 10 captured samples
  • FakeJobOffer family, 9 captured samples
  • FakeTaoBao family, 9 captured samples
  • Penetho family, 10 captured samples
  • VirusShield family, 10 captured samples

SMS Malware

  • BeanBot family, 9 captured samples
  • Biige family, 11 captured samples
  • FakeInst family, 10 captured samples
  • FakeMart family, 10 captured samples
  • FakeNotify family, 10 captured samples
  • Jifake family, 10 captured samples
  • Mazarbot family, 9 captured samples
  • Nandrobox family, 11 captured samples
  • Plankton family, 10 captured samples
  • SMSsniffer family, 9 captured samples
  • Zsone family, 10 captured samples
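As a quick consistency check, the per-family counts listed above can be tallied per category and overall. The following is a minimal sketch: the dictionary is transcribed by hand from the lists, and the names `CAPTURED` and `summarize` are illustrative only and are not part of the dataset or its tooling.

```python
# Per-family captured-sample counts, transcribed from the lists above.
CAPTURED = {
    "Adware": {
        "Dowgin": 10, "Ewind": 10, "Feiwo": 15, "Gooligan": 14, "Kemoge": 11,
        "Koodous": 10, "Mobidash": 10, "Selfmite": 4, "Shuanet": 10, "Youmi": 10,
    },
    "Ransomware": {
        "Charger": 10, "Jisut": 10, "Koler": 10, "LockerPin": 10,
        "Simplocker": 10, "Pletor": 10, "PornDroid": 10, "RansomBO": 10,
        "Svpeng": 11, "WannaLocker": 10,
    },
    "Scareware": {
        "AndroidDefender": 17, "AndroidSpy.277": 6, "AV for Android": 10,
        "AVpass": 10, "FakeApp": 10, "FakeApp.AL": 11, "FakeAV": 10,
        "FakeJobOffer": 9, "FakeTaoBao": 9, "Penetho": 10, "VirusShield": 10,
    },
    "SMS Malware": {
        "BeanBot": 9, "Biige": 11, "FakeInst": 10, "FakeMart": 10,
        "FakeNotify": 10, "Jifake": 10, "Mazarbot": 9, "Nandrobox": 11,
        "Plankton": 10, "SMSsniffer": 9, "Zsone": 10,
    },
}


def summarize(captured):
    """Return (families per category, samples per category) as two dicts."""
    families = {cat: len(fams) for cat, fams in captured.items()}
    samples = {cat: sum(fams.values()) for cat, fams in captured.items()}
    return families, samples


families, samples = summarize(CAPTURED)
total_families = sum(families.values())
total_samples = sum(samples.values())

for category in CAPTURED:
    print(f"{category}: {families[category]} families, {samples[category]} samples")
print(f"Total: {total_families} families, {total_samples} samples")
```

The totals agree with the figures stated on this page: 42 families and 426 installed malware samples.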

To obtain a comprehensive view of our malware samples, we created a specific scenario for each malware category. We also defined three data-capture states in order to overcome the stealthiness of advanced malware:

  1. Installation: the first data-capture state, which occurs immediately after the malware is installed (1-3 min).
  2. Before restart: the second data-capture state, which covers the 15 min before the phones are rebooted.
  3. After restart: the last data-capture state, which covers the 15 min after the phones are rebooted.
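The three states can be pictured as time windows on a single capture timeline. The sketch below is only an illustration of that idea and not our capture tooling: it assumes one timeline with a known install time and reboot time (both in seconds), takes 3 min as the upper bound of the installation window, and uses the hypothetical names `CaptureState` and `capture_state`.

```python
from enum import Enum


class CaptureState(Enum):
    INSTALLATION = "installation"
    BEFORE_RESTART = "before restart"
    AFTER_RESTART = "after restart"


# Assumption: the installation state lasts at most 3 min (upper end of "1-3 min").
INSTALL_WINDOW_S = 3 * 60
# 15 min on either side of the reboot, as in the list above.
RESTART_WINDOW_S = 15 * 60


def capture_state(t, install_time, reboot_time):
    """Map timestamp t (seconds) to a capture state, or None if t is outside all windows.

    If the windows overlap (reboot very soon after install), the installation
    window takes precedence; that tie-break is a choice made for this sketch.
    """
    if install_time <= t < install_time + INSTALL_WINDOW_S:
        return CaptureState.INSTALLATION
    if reboot_time - RESTART_WINDOW_S <= t < reboot_time:
        return CaptureState.BEFORE_RESTART
    if reboot_time <= t < reboot_time + RESTART_WINDOW_S:
        return CaptureState.AFTER_RESTART
    return None


# Example: install at t=0 s, reboot one hour later.
print(capture_state(60, install_time=0, reboot_time=3600))
print(capture_state(3000, install_time=0, reboot_time=3600))
print(capture_state(3700, install_time=0, reboot_time=3600))
```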

For feature extraction and selection, we captured network traffic (.pcap files) during all three states (installation, before restart, and after restart) and extracted more than 80 features from it using CICFlowMeter-V3. See our publicly available Android Sandbox.
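To give a sense of what a flow-level feature is, the sketch below groups packets into bidirectional flows and computes a handful of simple statistics per flow. It is a simplified illustration in plain Python, not the CICFlowMeter-V3 implementation: the packet dictionaries, the example addresses, and the feature names (`flow_duration`, `packet_count`, `total_bytes`, `mean_packet_length`) are assumptions made for this example and may not match the column names in the released files.

```python
from collections import defaultdict
from statistics import mean


def flow_key(pkt):
    """Direction-independent key, so both directions of a conversation share one flow."""
    a = (pkt["src"], pkt["sport"])
    b = (pkt["dst"], pkt["dport"])
    return (min(a, b), max(a, b), pkt["proto"])


def flow_features(packets):
    """Group packets into bidirectional flows and compute basic per-flow statistics."""
    flows = defaultdict(list)
    for pkt in packets:
        flows[flow_key(pkt)].append(pkt)

    features = {}
    for key, pkts in flows.items():
        times = [p["ts"] for p in pkts]
        sizes = [p["length"] for p in pkts]
        features[key] = {
            "flow_duration": max(times) - min(times),  # seconds
            "packet_count": len(pkts),
            "total_bytes": sum(sizes),
            "mean_packet_length": mean(sizes),
        }
    return features


# Hypothetical packets (documentation-range addresses), already parsed from a capture.
packets = [
    {"ts": 0.0, "src": "192.0.2.10", "sport": 40000,
     "dst": "198.51.100.7", "dport": 443, "proto": "TCP", "length": 60},
    {"ts": 0.5, "src": "198.51.100.7", "sport": 443,
     "dst": "192.0.2.10", "dport": 40000, "proto": "TCP", "length": 1500},
    {"ts": 2.0, "src": "192.0.2.10", "sport": 40000,
     "dst": "198.51.100.7", "dport": 443, "proto": "TCP", "length": 120},
    {"ts": 1.0, "src": "192.0.2.10", "sport": 53000,
     "dst": "203.0.113.53", "dport": 53, "proto": "UDP", "length": 80},
]

features = flow_features(packets)
for key, stats in features.items():
    print(key, stats)
```

A real extractor also handles flow timeouts, TCP flags, and per-direction statistics, all of which this sketch leaves out.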

The full research paper outlines the details of the dataset and its underlying principles:

Arash Habibi Lashkari, Andi Fitriah A. Kadir, Laya Taheri, and Ali A. Ghorbani, “Toward Developing a Systematic Approach to Generate Benchmark Android Malware Datasets and Classification”, in Proceedings of the 52nd IEEE International Carnahan Conference on Security Technology (ICCST), Montreal, Quebec, Canada, 2018.

For more information, contact cic@unb.ca.