The Themis Benchmark

Themis is a collection of real-world, reproducible crash bugs (collected from open-source Android apps) and a unified, extensible infrastructure for benchmarking automated GUI testing for Android and beyond.

1. Contents of Themis

Themis's bug dataset

Themis now contains 52 reproducible crash bugs. All of these bugs were labeled by the app developers as "critical bugs" (i.e., important bugs), which affected major app functionality and a large percentage of app users.

For each bug, we provide:

  • The original bug report

  • An executable APK (JaCoCo-instrumented for coverage collection)

  • A bug-reproducing script (a Python-based script for bug inspection) and a video

  • The stack trace of the bug

  • Metadata for supporting evaluation (e.g., app login scripts and configuration files used by Themis for code coverage computation)

  • The app source code corresponding to each bug

List of crash bugs

| Issue Id | App | Bug report, bug data | Code Version | App Category | GitHub Stars | Reproducible (Android SDK)? | Network? | Login? | System Setting? |
|---|---|---|---|---|---|---|---|---|---|
| 1 | AmazeFileManager | #1837, data | 3.4.2 | File Manager | 3.0K | 6.0/7.1 | no | no | no |
| 2 | AmazeFileManager | #1796, data | 3.3.2 | File Manager | 3.0K | 4.4/6.0/7.1 | no | no | no |
| 3 | AmazeFileManager | #1558, data | 3.3.2 | File Manager | 3.0K | 6.0/7.1 | no | no | no |
| 4 | AmazeFileManager | #1232, data | 3.2.1 | File Manager | 3.0K | 6.0/7.1 | no | no | no |
| 5 | ActivityDiary | #285, data | 1.4.0 | Personal Diary | 55 | 6.0/7.1 | no | no | no |
| 6 | ActivityDiary | #118, data | 1.1.8 | Personal Diary | 55 | 6.0/7.1 | no | no | no |
| 7 | and-bible | #375, data | 3.1.309 | Bible Reader | 231 | 4.4/6.0/7.1 | yes | no | no |
| 8 | and-bible | #697, data | 3.2.369 | Bible Reader | 231 | 6.0/7.1 | yes | no | no |
| 9 | and-bible | #261, data | 3.0.286 | Bible Reader | 231 | 6.0/7.1 | yes | no | no |
| 10 | and-bible | #703, data | 3.3.377 | Bible Reader | 231 | 6.0/7.1 | yes | no | no |
| 11 | and-bible | #480, data | 3.2.327 | Bible Reader | 231 | 4.4/6.0/7.1 | yes | no | no |
| 12 | Anki-Android | #4707, data | 2.9alpha4 | Card Learning | 2.8K | 7.1 | no | no | no |
| 13 | Anki-Android | #5638, data | 2.9.1 | Card Learning | 2.8K | 6.0/7.1 | no | no | no |
| 14 | Anki-Android | #4200, data | 2.6 | Card Learning | 2.8K | 4.4/6.0/7.1 | no | no | no |
| 15 | Anki-Android | #4451, data | 2.7 | Card Learning | 2.8K | 4.4/6.0/7.1 | no | no | no |
| 16 | Anki-Android | #6145, data | 2.10 | Card Learning | 2.8K | 4.4/6.0/7.1 | no | no | yes |
| 17 | Anki-Android | #5756, data | 2.9.4 | Card Learning | 2.8K | 4.4/6.0/7.1 | no | no | no |
| 18 | Anki-Android | #4977, data | 2.9 | Card Learning | 2.8K | 4.4/6.0/7.1 | no | no | no |
| 19 | APhotoManager | #116, data | 0.6.4 | Photo Manager | 148 | 4.4/6.0/7.1 | no | no | no |
| 20 | collect | #3222, data | 1.23.0 | Form Data Collector | 528 | 4.4/6.0/7.1 | yes | no | no |
| 21 | commons | #3244, data | 2.11.0 | Wikimedia | 601 | 6.0/7.1 | yes | yes | no |
| 22 | commons | #2123, data | 2.9.0 | Wikimedia | 601 | 6.0/7.1 | yes | yes | no |
| 23 | commons | #1391(Ref:#1329), data | 2.6.7 | Wikimedia | 601 | 6.0/7.1 | yes | yes | no |
| 24 | commons | #1385(Ref:#1329), data | 2.6.7 | Wikimedia | 601 | 6.0/7.1 | yes | yes | no |
| 25 | commons | #1581, data | 2.7.1 | Wikimedia | 601 | 6.0/7.1 | yes | yes | yes |
| 26 | FirefoxLite | #4881, data | 2.1.12 | Browser | 231 | 6.0/7.1 | yes | no | no |
| 27 | FirefoxLite | #5085, data | 2.1.20 | Browser | 231 | 6.0/7.1 | yes | no | no |
| 28 | FirefoxLite | #4942, data | 2.1.16 | Browser | 231 | 6.0/7.1 | yes | no | no |
| 29 | Frost-for-Facebook | #1323, data | 2.2.1 | Facebook Client | 377 | 6.0/7.1 | yes | yes | yes |
| 30 | geohashdroid | #73, data | 0.9.4 | Geohash | 13 | 4.4/6.0/7.1 | no | no | no |
| 31 | MaterialFBook | #224, data | 4.0.2 | Social | 122 | 6.0/7.1 | yes | yes | no |
| 32 | nextcloud | #5173, data | 3.10.0 | Productivity | 1.9K | 6.0/7.1 | yes | yes | no |
| 33 | nextcloud | #4026, data | 3.6.1 | Productivity | 1.9K | 6.0/7.1 | yes | yes | no |
| 34 | nextcloud | #4792, data | 3.9.2 | Productivity | 1.9K | 4.4/6.0/7.1 | yes | yes | no |
| 35 | nextcloud | #1918, data | 2.0.0 | Productivity | 1.9K | 6.0/7.1 | yes | yes | no |
| 36 | Omni-Notes | #745, data | 6.1.0 | Notebook | 2.1K | 4.4/6.0/7.1 | no | no | no |
| 37 | open-event-attendee | #2198, data | 0.5 | Social Events | 1.5K | 6.0/7.1 | yes | no | no |
| 38 | openlauncher | #67(Ref:#250), data | 0.3.1 | App Launcher | 256 | 6.0/7.1 | no | no | yes |
| 39 | osmeditor4android | #729, data | 11.0.0 | Map | 171 | 4.4/6.0/7.1 | no | no | no |
| 40 | osmeditor4android | #637, data | 0.9.10 | Map | 171 | 6.0/7.1 | no | no | no |
| 41 | Phonograph | #112, data | 0.15.0 | Music Player | 2.4K | 4.4/6.0/7.1 | no | no | no |
| 42 | Scarlet-Notes | #114, data | 6.9.5 | Notebook | 272 | 4.4/6.0/7.1 | no | no | no |
| 43 | sunflower | #239, data | 0.1.6 | Utility Tool | 12K | 4.4/6.0/7.1 | no | no | no |
| 44 | wordpress | #8659, data | 11.3 | Social | 2.3K | 6.0/7.1 | yes | yes | no |
| 45 | wordpress | #7182, data | 9.2 | Social | 2.3K | 4.4/6.0/7.1 | yes | yes | no |
| 46 | wordpress | #6530, data | 8.1 | Social | 2.3K | 4.4/6.0/7.1 | yes | yes | no |
| 47 | wordpress | #11992, data | 14.9 | Social | 2.3K | 6.0/7.1 | yes | yes | no |
| 48 | wordpress | #11135, data | 13.6 | Social | 2.3K | 6.0/7.1 | yes | yes | no |
| 49 | wordpress | #10876, data | 13.7 | Social | 2.3K | 6.0/7.1 | yes | yes | no |
| 50 | wordpress | #10547, data | 13.3 | Social | 2.3K | 6.0/7.1 | yes | yes | no |
| 51 | wordpress | #10363, data | 13.1 | Social | 2.3K | 6.0/7.1 | yes | yes | no |
| 52 | wordpress | #10302, data | 12.9 | Social | 2.3K | 6.0/7.1 | yes | yes | no |

Themis's Infrastructure

Themis contains a unified, extensible infrastructure for benchmarking automated GUI testing for Android. Any testing tool can be easily integrated into this infrastructure and deployed on a given machine with a single command.

List of Supported Tools

| Tool Name | Venue | Open-source | Main Technique | Need App Code? | Need App Instrumentation? | Supported SDKs | Implementation Basis |
|---|---|---|---|---|---|---|---|
| Monkey | - | yes | Random Testing | no | no | Any | - |
| Ape | ICSE'19 | yes | Model-based | no | no | 6.0/7.1 | Monkey-based |
| Humanoid | ASE'19 | yes | Deep learning-based | no | no | Any | DroidBot-based |
| ComboDroid | ICSE'20 | yes | Model-based | no | yes | 6.0/7.1 | Monkey-based |
| TimeMachine | ICSE'20 | yes | State-based | no | yes | 4.4/7.1 | Monkey-based |
| Q-testing | ISSTA'20 | no | Reinforcement learning-based | no | no | 4.4/7.1/9.0 | - |
| Stoat | FSE'17 | yes | Model-based | no | no | Any | A3E-based |
| Sapienz | ISSTA'16 | no | Search-based | no | no | 4.4 | Monkey-based |

The command line for deployment:

usage: themis.py [-h] [--avd AVD_NAME] [--apk APK] [-n NUMBER_OF_DEVICES] [--apk-list APK_LIST] -o O [--time TIME] [--repeat REPEAT] [--max-emu MAX_EMU] [--no-headless] [--login LOGIN_SCRIPT]
                 [--wait IDLE_TIME] [--monkey] [--ape] [--timemachine] [--combo] [--combo-login] [--humanoid] [--stoat] [--sapienz] [--qtesting] [--weighted] [--offset OFFSET]

optional arguments:
  -h, --help            show this help message and exit
  --avd AVD_NAME        the device name
  --apk APK
  -n NUMBER_OF_DEVICES  number of emulators created for testing, by default: 1
  --apk-list APK_LIST   list of apks under test
  -o O                  output dir
  --time TIME           the fuzzing time in hours (e.g., 6h), minutes (e.g., 6m), or seconds (e.g., 6s), default: 6h
  --repeat REPEAT       the repeated number of runs, default: 1
  --max-emu MAX_EMU     the maximum allowed number of emulators
  --no-headless         show gui
  --login LOGIN_SCRIPT  the script for app login
  --wait IDLE_TIME      the idle time to wait before starting the fuzzing
  --monkey
  --ape
  --timemachine
  --combo
  --combo-login
  --humanoid
  --stoat
  --sapienz
  --qtesting
  --offset OFFSET       device offset number w.r.t emulator-5554
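
When many tool/bug combinations have to be launched, it can help to assemble the command line programmatically. The following is a minimal sketch that uses only flags listed in the usage text above; the helper name `build_themis_command` and its default values are illustrative assumptions and are not part of Themis.

```python
import shlex
from typing import List, Optional

# Tool flags copied from the usage text of themis.py.
TOOL_FLAGS = {"monkey", "ape", "timemachine", "combo", "combo-login",
              "humanoid", "stoat", "sapienz", "qtesting"}


def build_themis_command(apk: str, out_dir: str, tool: str,
                         avd: str = "Android7.1", duration: str = "6h",
                         repeat: int = 1, devices: int = 1,
                         login_script: Optional[str] = None,
                         headless: bool = True) -> List[str]:
    """Hypothetical helper: build the argument list for one themis.py call."""
    if tool not in TOOL_FLAGS:
        raise ValueError(f"unknown tool flag: --{tool}")
    cmd = ["python3", "themis.py", "--avd", avd, "--apk", apk,
           "-n", str(devices), "--repeat", str(repeat),
           "--time", duration, "-o", out_dir]
    if login_script:
        cmd += ["--login", login_script]
    if not headless:
        cmd.append("--no-headless")
    cmd.append(f"--{tool}")
    return cmd


# Print a shell-safe command string; pass the list itself to subprocess.run
# (from the scripts directory) to actually start a run.
print(shlex.join(build_themis_command(
    "../ActivityDiary/ActivityDiary-1.1.8-debug-#118.apk",
    "../monkey-results/", "monkey", duration="10m", headless=False)))
```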

Implementation details

The directory structure of Themis is as follows:

Themis
   |
   |--- esecfse2021-paper1009.pdf   the accepted paper of Themis
   |
   |--- scripts:                    scripts for running testing tools and analyzing testing results.
       |
       |--- themis.py:              the main script for deploying themis.
       |
       |--- check_crash.py:         the script to check whether a tool finds the bugs.
       |
       |--- compute_coverage.py:    the script to compute the code coverage achieved by a tool.
       |
       |--- compare_bug_triggering_time.py: the script to compare bug-triggering times pairwise between different tools.
       |
       |--- run_monkey.sh           the internal shell scripts to invoke Monkey, Ape, Humanoid, ComboDroid, TimeMachine and Q-testing
       |--- run_ape.sh
       |--- run_humanoid.sh
       |--- run_qtesting.sh
       |--- run_timemachine.sh
       |--- run_combodroid.sh
       |
   |--- tools:                      the supported automated testing tools.
       |
       |--- Humanoid                the tool Humanoid 
       |
       |--- TimeMachine             the tool TimeMachine
       | 
       |--- Q-testing               the tool Q-testing
       |
       |--- Ape                     the tool Ape
       |
       |--- ComboDroid              the tool ComboDroid
       |
       |--- Monkey                  the tool Monkey
       |
   |--- app_1:             The bugs collected from app_1.
   |
   |--- app_2:             The bugs collected from app_2.
   |
   |--- ...
   |
   |--- app_N              The bugs collected from app_N.

2. Instructions for Artifact Evaluation

For artifact evaluation, we recommend running Themis in a virtual machine (VM), in which everything required is already installed and prepared. You can download the VM package Themis_VM.zip from this link on Google Drive (fully anonymized and publicly accessible).

Prerequisite

  • You need to enable the virtualization technology in your computer's BIOS (see this link for how to enable the virtualization technology). Most computers by default already have this virtualization option turned on.
  • Your computer needs at least 8 CPU cores (4 cores may also work), 16 GB of memory, and at least 40 GB of storage.
  • We built our artifact by using VirtualBox v6.1.20. Please install VirtualBox based on your OS type. After installing VirtualBox, you may need to reboot the computer.

Setup Virtual Machine

  1. Extract the downloaded file Themis_VM.zip and get the VM image file Themis.ova.
  2. Open VirtualBox, click "File", click "Import Appliance", then import the file named Themis.ova (this step may take about five to ten minutes to complete).
  3. After the import is completed, you should see "vm" as one of the listed VMs in your VirtualBox.
  4. Click "Settings", click "System", click "Processor", allocate 4-8 CPU cores (8 cores are preferred), and check "Enable Nested VT-x/AMD-V". Click "Memory" and set the memory size to at least 8GB (16GB is preferred). If your system permits, allocate more memory and CPU cores to ensure a smooth evaluation.
  5. Run the virtual machine. The username and the password are both Themis.
  6. If you cannot run the VM with the "Nested VT-x/AMD-V" option enabled in VirtualBox, check whether the Hyper-V option is enabled and, if so, disable it (see this link for more information).

Getting Started (for Initial Review)

Take this quick test to get familiar with Themis and to validate that it is ready for evaluation.

Step 1. open a terminal and switch to Themis's scripts directory

cd the-themis-benchmark/scripts

Step 2. run Monkey on one target bug for 10 minutes

python3 themis.py --no-headless --avd Android7.1 --apk ../ActivityDiary/ActivityDiary-1.1.8-debug-#118.apk --time 10m -o ../monkey-results/ --monkey

Here,

  • --no-headless shows the emulator GUI.
  • --avd Android7.1 specifies the name of the emulator (which has already been created in the VM).
  • --apk ../ActivityDiary/ActivityDiary-1.1.8-debug-#118.apk specifies the target bug which is ActivityDiary's bug #118 in v1.1.8.
  • --time 10m allocates 10 minutes for the testing tool to find the bug
  • -o ../monkey-results/ specifies the output directory of testing results
  • --monkey specifies the testing tool

Expected results: you should see that (1) an Android emulator is started, (2) the app ActivityDiary is installed and started, (3) Monkey is started to test the app, (4) text similar to the following sample is printed on the terminal during testing, and (5) the emulator is automatically closed at the end.

Sample terminal output of a successful run:

allocate emulators: emulator-5554
the apk list to fuzz: ['../ActivityDiary/ActivityDiary-1.1.8-debug-#118.apk']
True
Now allocate the apk: ../ActivityDiary/ActivityDiary-1.1.8-debug-#118.apk on emulator-5554
its login script: ""
wait the allocated devices to finish...
execute monkey: bash -x run_monkey.sh ../ActivityDiary/ActivityDiary-1.1.8-debug-#118.apk emulator-5554 Android7.1 ../monkey-results/ 10m "" ""
+ APK_FILE=../ActivityDiary/ActivityDiary-1.1.8-debug-#118.apk
+ AVD_SERIAL=emulator-5554
+ AVD_NAME=Android7.1
+ OUTPUT_DIR=../monkey-results/
+ TEST_TIME=10m
+ HEADLESS=
+ LOGIN_SCRIPT=
+ RETRY_TIMES=5
++ seq 1 5
+ for i in $(seq 1 $RETRY_TIMES)
+ echo 'try to start the emulator (emulator-5554)...'
try to start the emulator (emulator-5554)...
+ sleep 5
+ avd_port=5554
+ sleep 5
+ emulator -port 5554 -avd Android7.1 -read-only
emulator: Requested console port 5554: Inferring adb port 5555.
qemu-system-i386: warning: TSC frequency mismatch between VM (2903985 kHz) and host (2903991 kHz), and TSC scaling unavailable
qemu-system-i386: warning: TSC frequency mismatch between VM (2903985 kHz) and host (2903991 kHz), and TSC scaling unavailable
+ wait_for_device emulator-5554
+ avd_serial=emulator-5554
+ timeout 5s adb -s emulator-5554 wait-for-device
++ adb -s emulator-5554 shell getprop init.svc.bootanim
adb: device offline
+ OUT=
+ i=0
+ [[ '' != \s\t\o\p\p\e\d ]]
+ echo '   Waiting for emulator (emulator-5554) to fully boot (#0 times) ...'
   Waiting for emulator (emulator-5554) to fully boot (#0 times) ...
+ sleep 5
++ expr 0 + 1
+ i=1
+ [[ 1 == 10 ]]
++ adb -s emulator-5554 shell getprop init.svc.bootanim
adb: device offline
+ OUT=
+ [[ '' != \s\t\o\p\p\e\d ]]
+ echo '   Waiting for emulator (emulator-5554) to fully boot (#1 times) ...'
   Waiting for emulator (emulator-5554) to fully boot (#1 times) ...
+ sleep 5
++ expr 1 + 1
+ i=2
+ [[ 2 == 10 ]]
++ adb -s emulator-5554 shell getprop init.svc.bootanim
adb: device offline
+ OUT=
+ [[ '' != \s\t\o\p\p\e\d ]]
+ echo '   Waiting for emulator (emulator-5554) to fully boot (#2 times) ...'
   Waiting for emulator (emulator-5554) to fully boot (#2 times) ...
+ sleep 5
++ expr 2 + 1
+ i=3
+ [[ 3 == 10 ]]
++ adb -s emulator-5554 shell getprop init.svc.bootanim
adb: device offline
+ OUT=
+ [[ '' != \s\t\o\p\p\e\d ]]
+ echo '   Waiting for emulator (emulator-5554) to fully boot (#3 times) ...'
   Waiting for emulator (emulator-5554) to fully boot (#3 times) ...
+ sleep 5
++ expr 3 + 1
+ i=4
+ [[ 4 == 10 ]]
++ adb -s emulator-5554 shell getprop init.svc.bootanim
adb: device offline
+ OUT=
+ [[ '' != \s\t\o\p\p\e\d ]]
+ echo '   Waiting for emulator (emulator-5554) to fully boot (#4 times) ...'
   Waiting for emulator (emulator-5554) to fully boot (#4 times) ...
+ sleep 5
++ expr 4 + 1
+ i=5
+ [[ 5 == 10 ]]
++ adb -s emulator-5554 shell getprop init.svc.bootanim
adb: device offline
+ OUT=
+ [[ '' != \s\t\o\p\p\e\d ]]
+ echo '   Waiting for emulator (emulator-5554) to fully boot (#5 times) ...'
   Waiting for emulator (emulator-5554) to fully boot (#5 times) ...
+ sleep 5
++ expr 5 + 1
+ i=6
+ [[ 6 == 10 ]]
++ adb -s emulator-5554 shell getprop init.svc.bootanim
adb: device offline
+ OUT=
+ [[ '' != \s\t\o\p\p\e\d ]]
+ echo '   Waiting for emulator (emulator-5554) to fully boot (#6 times) ...'
   Waiting for emulator (emulator-5554) to fully boot (#6 times) ...
+ sleep 5
++ expr 6 + 1
+ i=7
+ [[ 7 == 10 ]]
++ adb -s emulator-5554 shell getprop init.svc.bootanim
adb: device offline
+ OUT=
+ [[ '' != \s\t\o\p\p\e\d ]]
+ echo '   Waiting for emulator (emulator-5554) to fully boot (#7 times) ...'
   Waiting for emulator (emulator-5554) to fully boot (#7 times) ...
+ sleep 5
++ expr 7 + 1
+ i=8
+ [[ 8 == 10 ]]
++ adb -s emulator-5554 shell getprop init.svc.bootanim
adb: device offline
+ OUT=
+ [[ '' != \s\t\o\p\p\e\d ]]
+ echo '   Waiting for emulator (emulator-5554) to fully boot (#8 times) ...'
   Waiting for emulator (emulator-5554) to fully boot (#8 times) ...
+ sleep 5
++ expr 8 + 1
+ i=9
+ [[ 9 == 10 ]]
++ adb -s emulator-5554 shell getprop init.svc.bootanim
adb: device offline
+ OUT=
+ [[ '' != \s\t\o\p\p\e\d ]]
+ echo '   Waiting for emulator (emulator-5554) to fully boot (#9 times) ...'
   Waiting for emulator (emulator-5554) to fully boot (#9 times) ...
+ sleep 5
++ expr 9 + 1
+ i=10
+ [[ 10 == 10 ]]
+ echo 'Cannot connect to the device: (emulator-5554) after (#10 times)...'
Cannot connect to the device: (emulator-5554) after (#10 times)...
+ break
++ adb -s emulator-5554 shell getprop init.svc.bootanim
adb: device offline
+ OUT=
+ [[ '' != \s\t\o\p\p\e\d ]]
+ adb -s emulator-5554 emu kill
OK: killing emulator, bye bye
OK
+ echo 'try to restart the emulator (emulator-5554)...'
try to restart the emulator (emulator-5554)...
+ [[ 10 == RETRY_TIMES ]]
+ for i in $(seq 1 $RETRY_TIMES)
+ echo 'try to start the emulator (emulator-5554)...'
try to start the emulator (emulator-5554)...
+ sleep 5
emulator: WARNING: Not saving state: RAM not mapped as shared
+ avd_port=5554
+ sleep 5
+ emulator -port 5554 -avd Android7.1 -read-only
emulator: Requested console port 5554: Inferring adb port 5555.
qemu-system-i386: warning: TSC frequency mismatch between VM (2903985 kHz) and host (2903991 kHz), and TSC scaling unavailable
qemu-system-i386: warning: TSC frequency mismatch between VM (2903985 kHz) and host (2903991 kHz), and TSC scaling unavailable
+ wait_for_device emulator-5554
+ avd_serial=emulator-5554
+ timeout 5s adb -s emulator-5554 wait-for-device
++ adb -s emulator-5554 shell getprop init.svc.bootanim
+ OUT=stopped
+ i=0
+ [[ stopped != \s\t\o\p\p\e\d ]]
++ adb -s emulator-5554 shell getprop init.svc.bootanim
+ OUT=stopped
+ [[ stopped != \s\t\o\p\p\e\d ]]
+ break
+ echo '  emulator (emulator-5554) is booted!'
  emulator (emulator-5554) is booted!
+ adb -s emulator-5554 root
restarting adbd as root
++ date +%Y-%m-%d-%H-%M-%S
+ current_date_time=2021-05-26-16-58-44
++ basename ../ActivityDiary/ActivityDiary-1.1.8-debug-#118.apk
+ apk_file_name=ActivityDiary-1.1.8-debug-#118.apk
+ result_dir=../monkey-results//ActivityDiary-1.1.8-debug-#118.apk.monkey.result.emulator-5554.Android7.1#2021-05-26-16-58-44
+ mkdir -p ../monkey-results//ActivityDiary-1.1.8-debug-#118.apk.monkey.result.emulator-5554.Android7.1#2021-05-26-16-58-44
+ echo '** CREATING RESULT DIR (emulator-5554): ' ../monkey-results//ActivityDiary-1.1.8-debug-#118.apk.monkey.result.emulator-5554.Android7.1#2021-05-26-16-58-44
** CREATING RESULT DIR (emulator-5554):  ../monkey-results//ActivityDiary-1.1.8-debug-#118.apk.monkey.result.emulator-5554.Android7.1#2021-05-26-16-58-44
+ [[ '' != '' ]]
+ adb -s emulator-5554 install -g ../ActivityDiary/ActivityDiary-1.1.8-debug-#118.apk
+ echo '** INSTALL APP (emulator-5554)'
** INSTALL APP (emulator-5554)
+ sleep 10
++ aapt dump badging ../ActivityDiary/ActivityDiary-1.1.8-debug-#118.apk
++ grep package
++ awk '{print $2}'
++ sed s/name=//g
++ sed 's/'\''//g'
+ app_package_name=de.rampro.activitydiary.debug
+ echo '** PROCESSING APP (emulator-5554): ' de.rampro.activitydiary.debug
** PROCESSING APP (emulator-5554):  de.rampro.activitydiary.debug
+ echo '** START LOGCAT (emulator-5554) '
** START LOGCAT (emulator-5554) 
+ adb -s emulator-5554 logcat -c
+ echo '** START COVERAGE (emulator-5554) '
** START COVERAGE (emulator-5554) 
+ adb -s emulator-5554 logcat AndroidRuntime:E CrashAnrDetector:D System.err:W CustomActivityOnCrash:E ACRA:E WordPress-EDITOR:E '*:F' '*:S'
+ echo '** RUN MONKEY (emulator-5554)'
** RUN MONKEY (emulator-5554)
+ adb -s emulator-5554 shell date +%Y-%m-%d-%H:%M:%S
+ bash dump_coverage.sh emulator-5554 de.rampro.activitydiary.debug ../monkey-results//ActivityDiary-1.1.8-debug-#118.apk.monkey.result.emulator-5554.Android7.1#2021-05-26-16-58-44
+ timeout 10m adb -s emulator-5554 shell monkey -p de.rampro.activitydiary.debug -v --throttle 200 --ignore-crashes --ignore-timeouts --ignore-security-exceptions --bugreport 1000000
+ tee ../monkey-results//ActivityDiary-1.1.8-debug-#118.apk.monkey.result.emulator-5554.Android7.1#2021-05-26-16-58-44/monkey.log
:Monkey: seed=1622177363949 count=1000000
:AllowPackage: de.rampro.activitydiary.debug
:IncludeCategory: android.intent.category.LAUNCHER
:IncludeCategory: android.intent.category.MONKEY
// Event percentages:
//   0: 15.0%
//   1: 10.0%
//   2: 2.0%
//   3: 15.0%
//   4: -0.0%
//   5: -0.0%
//   6: 25.0%
//   7: 15.0%
//   8: 2.0%
//   9: 2.0%
//   10: 1.0%
//   11: 13.0%
:Switch: #Intent;action=android.intent.action.MAIN;category=android.intent.category.LAUNCHER;launchFlags=0x10200000;component=de.rampro.activitydiary.debug/de.rampro.activitydiary.ui.main.MainActivity;end
    // Allowing start of Intent { act=android.intent.action.MAIN cat=[android.intent.category.LAUNCHER] cmp=de.rampro.activitydiary.debug/de.rampro.activitydiary.ui.main.MainActivity } in package de.rampro.activitydiary.debug
:Sending Trackball (ACTION_MOVE): 0:(3.0,0.0)
:Sending Touch (ACTION_DOWN): 0:(6.0,111.0)
:Sending Touch (ACTION_UP): 0:(75.08969,0.25681877)
:Sending Touch (ACTION_DOWN): 0:(461.0,414.0)
:Sending Touch (ACTION_UP): 0:(463.19458,413.57645)
:Sending Touch (ACTION_DOWN): 0:(443.0,1154.0)
:Sending Touch (ACTION_UP): 0:(337.0955,1190.089)
:Sending Touch (ACTION_DOWN): 0:(392.0,379.0)
:Sending Touch (ACTION_UP): 0:(404.1744,297.83728)
:Sending Trackball (ACTION_MOVE): 0:(-1.0,-4.0)
:Sending Touch (ACTION_DOWN): 0:(739.0,89.0)
:Sending Touch (ACTION_UP): 0:(800.0,146.10786)
:Sending Trackball (ACTION_MOVE): 0:(-3.0,4.0)
:Sending Touch (ACTION_DOWN): 0:(749.0,443.0)
:Sending Touch (ACTION_UP): 0:(747.92065,456.49356)
:Sending Touch (ACTION_DOWN): 0:(485.0,814.0)
:Sending Touch (ACTION_UP): 0:(492.87933,804.6225)
:Sending Trackball (ACTION_MOVE): 0:(-4.0,-4.0)
:Sending Trackball (ACTION_MOVE): 0:(0.0,2.0)
:Sending Touch (ACTION_DOWN): 0:(454.0,119.0)
:Sending Touch (ACTION_UP): 0:(465.00302,121.41099)
:Sending Touch (ACTION_DOWN): 0:(476.0,305.0)
:Sending Touch (ACTION_UP): 0:(476.1956,313.51404)
+ adb -s emulator-5554 shell date +%Y-%m-%d-%H:%M:%S
+ echo '** STOP MONKEY (emulator-5554)'
** STOP MONKEY (emulator-5554)
++ adb -s emulator-5554 shell ps
++ grep monkey
++ awk '{print $2}'
+ adb -s emulator-5554 shell kill 17521
+ echo '** STOP COVERAGE (emulator-5554)'
** STOP COVERAGE (emulator-5554)
++ ps aux
++ grep 'dump_coverage.sh emulator-5554'
++ grep -v grep
++ awk '{print $2}'
+ kill 1248836
+ echo '** STOP LOGCAT (emulator-5554)'
** STOP LOGCAT (emulator-5554)
++ ps aux
++ grep 'emulator-5554 logcat'
++ grep -v grep
++ awk '{print $2}'
+ kill 1248835
+ sleep 5
+ adb -s emulator-5554 emu kill
OK: killing emulator, bye bye
OK
+ echo '@@@@@@ Finish (emulator-5554): ' de.rampro.activitydiary.debug @@@@@@@
@@@@@@ Finish (emulator-5554):  de.rampro.activitydiary.debug @@@@@@@
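
The boot-wait loop visible in the log above polls `adb -s <serial> shell getprop init.svc.bootanim` every five seconds until it returns `stopped`, and gives up after ten attempts. A minimal Python sketch of the same idea follows; the function names are illustrative assumptions, and this is not the code of run_monkey.sh. The `probe` parameter exists only so that the loop can be exercised without a running emulator.

```python
import subprocess
import time
from typing import Callable


def bootanim_state(serial: str) -> str:
    """Return init.svc.bootanim, or '' if adb cannot query the device."""
    result = subprocess.run(
        ["adb", "-s", serial, "shell", "getprop", "init.svc.bootanim"],
        capture_output=True, text=True)
    return result.stdout.strip() if result.returncode == 0 else ""


def wait_for_boot(serial: str,
                  probe: Callable[[str], str] = bootanim_state,
                  max_tries: int = 10, delay: float = 5.0) -> bool:
    """Poll until the boot animation has stopped; False means 'gave up'."""
    for _ in range(max_tries):
        if probe(serial) == "stopped":
            return True
        time.sleep(delay)
    return False
```

A caller that receives `False` could kill and restart the emulator, as the log above does with `adb -s emulator-5554 emu kill`.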

Step 3. inspect the output files

If Step 2 succeeds, you can see the outputs under ../monkey-results/ (i.e., /home/themis/the-themis-benchmark/monkey-results).

$ cd ../monkey-results/
$ ls
  ActivityDiary-1.1.8-debug-#118.apk.monkey.result.emulator-5554.Android7.1#2020-06-24-20:39:27/         # the output directory
$ cd ActivityDiary-1.1.8-debug-#118.apk.monkey.result.emulator-5554.Android7.1#2020-06-24-20:39:27/
$ ls
  coverage_1.ec   # the coverage data file  (used for computing coverage)
  coverage_2.ec 
  install.log     # the log of app installation
  logcat.log      # the system log of emulator (this file contains the crash stack traces if the target bug was triggered)
  monkey.log      # the log of Monkey (including the events that Monkey generates)
  monkey_testing_time_on_emulator.txt  # the first line is the starting testing time, and the second line is the ending testing time

How to validate: If you can see all these files and they are non-empty (use ls -l to check), the quick test succeeds. Note that the number of coverage data files (e.g., coverage_1.ec) varies with the testing time, because Themis asks the app to dump coverage data every five minutes. The output files of other testing tools may differ from Monkey's, but every tool produces similar types of output files.
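
The manual check with ls -l can also be scripted. The sketch below assumes the Monkey file names listed in Step 3; `check_result_dir` is a hypothetical helper, not one of the Themis scripts.

```python
from pathlib import Path
from typing import List

# File names taken from the Monkey result listing in Step 3.
REQUIRED_FILES = ["install.log", "logcat.log", "monkey.log",
                  "monkey_testing_time_on_emulator.txt"]


def check_result_dir(result_dir: str) -> List[str]:
    """Return a list of problems; an empty list means the check passed."""
    root = Path(result_dir)
    problems = []
    for name in REQUIRED_FILES:
        path = root / name
        if not path.is_file():
            problems.append(f"missing: {name}")
        elif path.stat().st_size == 0:
            problems.append(f"empty: {name}")
    # The number of coverage files depends on the testing time,
    # so only require that at least one exists.
    coverage = sorted(root.glob("coverage_*.ec"))
    if not coverage:
        problems.append("missing: coverage_*.ec")
    problems += [f"empty: {p.name}" for p in coverage if p.stat().st_size == 0]
    return problems
```

Calling `check_result_dir` on one result directory under ../monkey-results/ and getting an empty list corresponds to a successful quick test.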

Whole Evaluation (for In-depth Review)

I. Validate the supported tools (Table 2 in the accepted paper)

Themis now supports and maintains 6 state-of-the-art fully-automated testing tools for Android (see below). These tools can be cloned from Themis's repositories and are put under themis/tools.

Note that these tools are modified/enhanced versions of their originals, because we coordinated with the tools' authors to ensure a correct and rigorous setup (e.g., we reported the tool bugs we encountered to the authors for fixing). We did our best to minimize bias and to ensure that each tool is at "its best state" in bug finding (see Section 3.2 in the accepted paper).

Specifically, we track the tool modifications to facilitate review and validation. Integrating Monkey, Ape, Humanoid and Q-testing into Themis required only slight effort. ComboDroid was modified by its author and integrated into Themis, while TimeMachine was modified by us and later verified by its authors (view this commit to check all the modifications/enhancements made in TimeMachine).

II. Validate the bug dataset (Table 3 in the accepted paper)

Themis now contains 52 reproducible crash bugs. For each bug, you can view:

  • its metadata (e.g., original bug report, buggy app version)
  • its bug data (stack trace, executable apk, bug-triggering script/video)
  • its properties (e.g., the Android SDKs on which the bug can be reproduced, whether the app requires network access or login, and whether the bug involves changing system settings). We also give the minimal number of user actions needed to reproduce the bug (listed in Table 3 in the accepted paper).

III. Validate the bug finding results of these tools (Table 3, Table 4, Figure 1 in the accepted paper)

Our original evaluation setup (see Section 3.3 in the accepted paper) is:

We deployed our experiment on a 64-bit Ubuntu 18.04 machine (64
cores, AMD 2990WX CPU, and 128GB RAM). We chose Android 7.1
emulators (API level 25) since all selected tools support this version.
Each emulator is configured with 2GB RAM, 1GB SDCard, 1GB
internal storage, and X86 ABI image. Different types of external
files (including PNGs/MP3s/PDFs/TXTs/DOCXs) are stored on the
SDCard to facilitate file access from apps.

We allocated one device (i.e., one emulator) for each bug/tool 
in one run (one run required 6 hours), and repeated
5 runs for each bug/tool. The whole evaluation took over
52×5×6×7 = 10,920 machine hours (not including Sapienz). Due to
Android's limitation, we can only run 16 emulators in parallel on
one physical machine. Thus, the evaluation took us around 28 days,
in addition to around one week for deployment preparation.
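
The figures quoted above can be checked with a few lines of arithmetic. This sketch assumes that all 16 emulators are kept busy for the whole evaluation, which is an idealization.

```python
# Numbers taken from the evaluation setup quoted above.
bugs, runs, hours_per_run, tools = 52, 5, 6, 7
parallel_emulators = 16

machine_hours = bugs * runs * hours_per_run * tools
wall_clock_days = machine_hours / parallel_emulators / 24

print(machine_hours)              # 10920
print(round(wall_clock_days, 1))  # 28.4
```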

Considering the large evaluation cost, if you do not have enough resources or time, we recommend running 1-2 tools on 1-2 bugs of your choice to validate the artifact. To replicate the whole evaluation, we also provide the instructions to follow and all the data files of our evaluation for inspection.

**In the following, we take Monkey as the tool and ActivityDiary-1.1.8-debug-#118.apk as the target bug to illustrate how to replicate the whole evaluation, and how to validate the artifact if you do not have enough resources or time.**

[To replicate the whole evaluation]:

Step 1. run Monkey on ActivityDiary-1.1.8-debug-#118.apk for 6 hours and repeat this process for 5 runs. This step takes 30 hours to finish because the 5 runs are executed one after another on a single emulator. We do not recommend running more than one emulator in the VM because of its limited memory and performance.

python3 themis.py --no-headless --avd Android7.1 --apk ../ActivityDiary/ActivityDiary-1.1.8-debug-#118.apk -n 1 --repeat 5 --time 6h -o ../monkey-results/ --monkey 

Here,

  • --no-headless shows the emulator GUI
  • --avd Android7.1 specifies the name of the emulator (which has already been created in the VM).
  • --apk ../ActivityDiary/ActivityDiary-1.1.8-debug-#118.apk specifies the target bug which is ActivityDiary's bug #118 in v1.1.8.
  • -n 1 denotes one emulator instance will be created (in practice at most 16 emulators are allowed to run in parallel on one native machine)
  • --repeat 5 denotes the testing process will be repeated for 5 runs (these 5 runs will be distributed to the available emulator instances)
  • --time 6h allocates 6 hours for the testing tool to find the bug
  • -o ../monkey-results/ specifies the output directory of testing results
  • --monkey specifies the testing tool
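
To plan an evaluation session, the expected duration of a command can be estimated from --time, --repeat, and -n. The sketch below parses the time format described in the usage text (e.g., 6h, 10m, 6s). It assumes that the runs are spread evenly over the emulators and it ignores emulator boot and app installation time; both function names are hypothetical.

```python
import math
import re

_SECONDS_PER_UNIT = {"h": 3600, "m": 60, "s": 1}


def parse_time(spec: str) -> int:
    """Convert a --time value such as '6h', '30m' or '45s' to seconds."""
    match = re.fullmatch(r"(\d+)([hms])", spec)
    if match is None:
        raise ValueError(f"unsupported time value: {spec!r}")
    return int(match.group(1)) * _SECONDS_PER_UNIT[match.group(2)]


def estimated_hours(time_spec: str, repeat: int = 1, devices: int = 1) -> float:
    """Estimate wall-clock hours: runs are executed in rounds of `devices`."""
    rounds = math.ceil(repeat / devices)
    return rounds * parse_time(time_spec) / 3600


print(estimated_hours("6h", repeat=5, devices=1))  # 30.0
print(estimated_hours("1h", repeat=2, devices=1))  # 2.0
```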

[If you do not have enough resources/time, we recommend running 1-2 tools on 1-2 bugs of your choice to validate the artifact]:

For example, in Step 1, we recommend shortening the testing time (e.g., use --time 1h for 1 hour or --time 30m for 30 minutes). With the following command, this step takes 2 hours to finish (2 runs of 1 hour each on one emulator).

python3 themis.py --no-headless --avd Android7.1 --apk ../ActivityDiary/ActivityDiary-1.1.8-debug-#118.apk -n 1 --repeat 2 --time 1h -o ../monkey-results/ --monkey 

You can then follow the instructions in Step 2 to check the bug finding results.

Step 2. When the testing terminates, you can use the command below to inspect whether the target bug was found in each run, how long it took to find the bug, and how many times the bug was found.

python3 check_crash.py --monkey -o ../monkey-results/ --app ActivityDiary --id \#118 --simple

Here,

  • --app ActivityDiary specifies the target app (You can omit this option to check all the tested apps and their bugs).
  • --id \#118 specifies the target bug id (You can omit this option to check all the target bugs of app ActivityDiary).
  • --simple outputs the checking result to the terminal for quick check (You can substitute --simple with --csv FILE_PATH to output the checking results into a CSV file).
  • Use -h to see the detailed list of command options.
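
The testing times that the scripts record use the format produced by `date +%Y-%m-%d-%H:%M:%S` (e.g., 2020-06-26-00:59:32). As a rough illustration of how a bug-triggering time can be derived from two such timestamps, consider the sketch below. It is independent of check_crash.py, whose internals are not described here, and `minutes_between` is a hypothetical name.

```python
from datetime import datetime

# Timestamp format used in monkey_testing_time_on_emulator.txt.
TIME_FORMAT = "%Y-%m-%d-%H:%M:%S"


def minutes_between(start: str, event: str) -> float:
    """Minutes elapsed from the start of testing to a later event."""
    started = datetime.strptime(start, TIME_FORMAT)
    happened = datetime.strptime(event, TIME_FORMAT)
    return (happened - started).total_seconds() / 60


# Start and end time of one 6-hour run.
print(minutes_between("2020-06-26-00:59:32", "2020-06-26-06:59:32"))  # 360.0
```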

An example output of check_crash.py could be (in this case, the target bug, ActivityDiary's #118, was not found by Monkey in any of the five runs):

Sample output:

ActivityDiary

=========

[ActivityDiary, #118] scanning (ActivityDiary-1.1.8-debug-#118.apk.monkey.result.emulator-5554.Android7.1#2020-06-24-20:39:27) [ActivityDiary, #118] testing time: 2020-06-24-20:39:33 the start testing time is: 2020-06-24-20:39:33 the start testing time (parsed) is: 2020-06-24 20:39:33

[ActivityDiary, #118] scanning (ActivityDiary-1.1.8-debug-#118.apk.monkey.result.emulator-5556.Android7.1#2020-06-24-20:39:32) [ActivityDiary, #118] testing time: 2020-06-24-20:39:36 the start testing time is: 2020-06-24-20:39:36 the start testing time (parsed) is: 2020-06-24 20:39:36

[ActivityDiary, #118] scanning (ActivityDiary-1.1.8-debug-#118.apk.monkey.result.emulator-5558.Android7.1#2020-06-24-20:39:37) [ActivityDiary, #118] testing time: 2020-06-24-20:39:43 the start testing time is: 2020-06-24-20:39:43 the start testing time (parsed) is: 2020-06-24 20:39:43

[ActivityDiary, #118] scanning (ActivityDiary-1.1.8-debug-#118.apk.monkey.result.emulator-5560.Android7.1#2020-06-24-20:39:42) [ActivityDiary, #118] testing time: 2020-06-24-20:39:47 the start testing time is: 2020-06-24-20:39:47 the start testing time (parsed) is: 2020-06-24 20:39:47

[ActivityDiary, #118] scanning (ActivityDiary-1.1.8-debug-#118.apk.monkey.result.emulator-5562.Android7.1#2020-06-24-20:39:47) [ActivityDiary, #118] testing time: 2020-06-24-20:39:52 the start testing time is: 2020-06-24-20:39:52 the start testing time (parsed) is: 2020-06-24 20:39:52

Another example output could be (in this case, the target bug, AnkiDroid's #4451, was found once, after running Monkey for 55 minutes, in one of the five runs):

AnkiDroid

=========

[AnkiDroid, #4451] scanning (AnkiDroid-debug-2.7beta1-#4451.apk.monkey.result.emulator-5554.Android7.1#2020-06-26-00:59:31) [AnkiDroid, #4451] testing time: 2020-06-26-00:59:32 [AnkiDroid, #4451] testing time: 2020-06-26-06:59:32 the start testing time is: 2020-06-26-00:59:32 the start testing time (parsed) is: 2020-06-26 00:59:32

[AnkiDroid, #4451] scanning (AnkiDroid-debug-2.7beta1-#4451.apk.monkey.result.emulator-5576.Android7.1#2020-06-26-12:04:55) [AnkiDroid, #4451] testing time: 2020-06-26-12:04:57 [AnkiDroid, #4451] testing time: 2020-06-26-18:04:57 the start testing time is: 2020-06-26-12:04:57 the start testing time (parsed) is: 2020-06-26 12:04:57

[AnkiDroid, #4451] scanning (AnkiDroid-debug-2.7beta1-#4451.apk.monkey.result.emulator-5574.Android7.1#2020-06-26-12:04:55) [AnkiDroid, #4451] testing time: 2020-06-26-12:04:56 [AnkiDroid, #4451] testing time: 2020-06-26-18:04:56 the start testing time is: 2020-06-26-12:04:56 the start testing time (parsed) is: 2020-06-26 12:04:56

[AnkiDroid, #4451] scanning (AnkiDroid-debug-2.7beta1-#4451.apk.monkey.result.emulator-5558.Android7.1#2020-06-26-00:59:38) [AnkiDroid, #4451] testing time: 2020-06-26-00:59:39 [AnkiDroid, #4451] testing time: 2020-06-26-06:59:40 the start testing time is: 2020-06-26-00:59:39 the start testing time (parsed) is: 2020-06-26 00:59:39

[AnkiDroid, #4451] scanning (AnkiDroid-debug-2.7beta1-#4451.apk.monkey.result.emulator-5556.Android7.1#2020-06-26-00:59:33) [AnkiDroid, #4451] testing time: 2020-06-26-00:59:34 [AnkiDroid, #4451] testing time: 2020-06-26-06:59:35 the start testing time is: 2020-06-26-00:59:34 the start testing time (parsed) is: 2020-06-26 00:59:34 [AnkiDroid, #4451] the crash was triggered (1) times [AnkiDroid, #4451] the time duration: ['55'] (mins)

How to validate: You can validate the artifact by comparing the testing results from Step 2 with our original testing results recorded in this data file. This data file gives the detailed testing results for each of the 52 bugs by all the testing tools. For example, Monkey did not find the target bug in ActivityDiary-1.1.8-debug-#118.apk (see column Monkey and row ActivityDiary's #118 for the value 0/5), while Monkey found the target bug in AnkiDroid-debug-2.7beta1-#4451.apk in one out of five runs (see column Monkey and row AnkiDroid's #4451 for the value 1/5). In this way, you can validate the data in Table 3 (* indicates that the tool finds the bug in at least one run) and Table 4 (the breakdown of which bugs were successfully found in how many runs) in the accepted paper.
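
To compare against the found/total values in the data file, the `--simple` output can be condensed into the same form. The following is a minimal sketch, not part of Themis: the function name `summarize` is hypothetical, the patterns are taken from the sample outputs above, and it assumes that one "scanning" entry is printed per run and that a "the crash was triggered" entry is printed only for runs that hit the bug.

```python
import re

# Patterns taken from the sample --simple output shown above.
SCAN_RE = re.compile(r"\] scanning \(")
TRIGGER_RE = re.compile(r"the crash was triggered \((\d+)\) times")
DURATION_RE = re.compile(r"the time duration: \[([^\]]*)\] \(mins\)")


def summarize(output):
    """Return (runs_with_crash, total_runs, durations_in_minutes).

    Assumption: one "scanning" entry per run, and one
    "the crash was triggered" entry per run in which the bug was found.
    """
    total_runs = len(SCAN_RE.findall(output))
    runs_with_crash = len(TRIGGER_RE.findall(output))
    durations = []
    for group in DURATION_RE.findall(output):
        # The list is printed like ['55']; keep only the numbers.
        durations += [int(x) for x in re.findall(r"\d+", group)]
    return runs_with_crash, total_runs, durations


if __name__ == "__main__":
    # Abbreviated stand-in for real output; the run names are shortened.
    sample = (
        "[AnkiDroid, #4451] scanning (run-1)\n"
        "[AnkiDroid, #4451] scanning (run-2)\n"
        "[AnkiDroid, #4451] the crash was triggered (1) times\n"
        "[AnkiDroid, #4451] the time duration: ['55'] (mins)\n"
    )
    found, total, minutes = summarize(sample)
    print(f"{found}/{total}", minutes)
```

The resulting found/total string can then be checked against the corresponding cell (for example, 0/5 or 1/5) in the data file.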

Notes

(1) You can substitute --monkey with --ape or --combo to directly run the corresponding tool. You may need to change the output directory -o ../monkey-results/ to a distinct directory, e.g., -o ../ape-results. You can follow the steps described above to inspect whether the target bug was found and the related info.

(2) Specifically, for Humanoid, before running, you need to set up the specific running environment of Humanoid. Open a terminal, and run:
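
As a rough illustration of Note (1), the sketch below builds one themis.py command per tool, each with its own output directory. The helper `build_themis_cmd` is hypothetical; it only reuses the options that appear in the commands above.

```python
def build_themis_cmd(tool, apk, avd="Android7.1", repeat=5, time="6h",
                     headless=True):
    """Build a themis.py argument list for one tool (hypothetical helper).

    Each tool writes into its own directory, e.g. ../ape-results/ for --ape.
    """
    cmd = ["python3", "themis.py"]
    if not headless:
        cmd.append("--no-headless")  # show the emulator GUI
    cmd += ["--avd", avd, "--apk", apk, "-n", "1",
            "--repeat", str(repeat), "--time", time,
            "-o", f"../{tool}-results/", f"--{tool}"]
    return cmd


if __name__ == "__main__":
    apk = "../ActivityDiary/ActivityDiary-1.1.8-debug-#118.apk"
    for tool in ("monkey", "ape", "combo"):
        # Print the commands; run them with subprocess.run(cmd, check=True).
        print(" ".join(build_themis_cmd(tool, apk, repeat=2, time="1h")))
```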

cd /home/themis/the-themis-benchmark/tools/Humanoid-tool
source venv/bin/activate   # Humanoid depends on tensorflow 1.12, which requires specific Python version
cd Humanoid
python3 agent.py -c config.json   # start the server of Humanoid

Open a new terminal and run Humanoid (which internally runs DroidBot) on the emulator Android7.1_Humanoid (which has a specific screen size):

cd /home/themis/the-themis-benchmark/scripts/
python3 themis.py --no-headless --avd Android7.1_Humanoid --apk ../ActivityDiary/ActivityDiary-1.1.8-debug-#118.apk --time 10m -o ../humanoid-results --humandroid

(3) Specifically, for Q-testing, before running, you need to set up the specific running environment of Q-testing. Open a terminal, and run:

cd /home/themis/the-themis-benchmark/tools/Q-testing
source venv/bin/activate   # Q-testing depends on Python 2.7

Then, run the command:

cd /home/themis/the-themis-benchmark/scripts/
python3 themis.py --no-headless --avd Android7.1 --apk ../ActivityDiary/ActivityDiary-1.1.8-debug-#118.apk --time 10m -o ../qtesting-results/ --qtesting

(4) Specifically, TimeMachine itself depends on VirtualBox and Docker. Thus, we cannot build TimeMachine within this VM; TimeMachine can be built on native machines.

(5) If the app under test requires user login, you should specify the login script. Themis will call the login script before testing. For example, if we run Monkey on ../commons/commons-2.11.0-#3244.apk which requires user login, the command line should be:

python3 themis.py --avd Android7.1 --apk ../commons/commons-2.11.0-#3244.apk -n 1 --repeat 5 --time 6h -o ../monkey-results/ --login ../commons/login-2.11.0-#3244.py --monkey 

Here,

  • --login ../commons/login-2.11.0-#3244.py specifies the login script (which will be executed before GUI testing)

3. validate the data files

3. Instructions for Reusing Themis

Build and Use Themis from Scratch

In practice, we strongly recommend that users set up our artifact on local native machines or remote servers rather than virtual machines to ensure (1) optimal testing performance and (2) evaluation efficiency. Thus, we provide instructions to set up Themis from scratch.

  1. set up the Android development environment and Python 3 on your machine

  2. create an Android emulator before running Themis (see this link for creating an emulator using avdmanager).

  • An example: create an Android emulator avd_Android7.1 with SDK version 7.1 (API level 25), X86 ABI image and Google APIs:
sdkmanager "system-images;android-25;google_apis;x86"
avdmanager create avd --force --name avd_Android7.1 --package 'system-images;android-25;google_apis;x86' --abi google_apis/x86 --sdcard 1024M --device 'Nexus 7'
  3. (optional) modify the emulator configuration to ensure optimal performance of the testing tools:

In our evaluation, we configured the emulator with 2GB RAM, a 1GB SD card, 1GB of internal storage, and a 256MB heap size (the file to modify is the emulator's config.ini, e.g., ~/.android/avd/Android7.1.avd/config.ini for an emulator named Android7.1):

sdcard.size=1024M
disk.dataPartition.size=1024M
vm.heapSize=256
hw.ramSize=2048
  4. (optional but recommended) copy dummy documents into the emulator to allow file access from the apps under test
emulator -avd avd_Android7.1 -port 5554 &
cd themis/scripts
bash -x copy_dummy_documents.sh emulator-5554
adb -s emulator-5554 emu kill
  5. install uiautomator2, which is used for executing login scripts
pip3 install --upgrade --pre uiautomator2
  6. If you run Themis on remote servers, please omit the option --no-headless so that the emulator runs without its GUI.

  7. Install all the necessary dependencies required by the respective testing tools (see the repository of each tool).
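
The optional config.ini change in the steps above can also be scripted. The following is a minimal sketch, not part of Themis: the function `update_avd_config` is hypothetical, and it assumes config.ini consists of plain key=value lines. The values are the ones listed above.

```python
import tempfile
from pathlib import Path

# Settings used in the evaluation, as listed above.
SETTINGS = {
    "sdcard.size": "1024M",
    "disk.dataPartition.size": "1024M",
    "vm.heapSize": "256",
    "hw.ramSize": "2048",
}


def update_avd_config(config_path, settings):
    """Set key=value pairs in an AVD config.ini, replacing existing keys."""
    path = Path(config_path)
    seen = set()
    lines = []
    for line in path.read_text().splitlines():
        key = line.split("=", 1)[0].strip()
        if key in settings:
            lines.append(f"{key}={settings[key]}")  # overwrite the old value
            seen.add(key)
        else:
            lines.append(line)                      # keep unrelated lines
    # Append any settings that were not already present in the file.
    lines += [f"{k}={v}" for k, v in settings.items() if k not in seen]
    path.write_text("\n".join(lines) + "\n")


if __name__ == "__main__":
    # Demonstrate on a throwaway file rather than a real AVD.
    with tempfile.TemporaryDirectory() as tmp:
        cfg = Path(tmp) / "config.ini"
        cfg.write_text("hw.ramSize=1536\nhw.lcd.density=320\n")
        update_avd_config(cfg, SETTINGS)
        print(cfg.read_text())
```

To apply it to a real emulator, pass the path of that emulator's config.ini while the emulator is not running.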

Extend Themis for Future Research

  1. Add new crash bugs into Themis

Taking ActivityDiary-1.1.8-debug-#118.apk as an example, the basic steps to add such a bug into Themis's dataset include:

  • build the buggy app version into an executable apk file (i.e., ActivityDiary/ActivityDiary-1.1.8-debug-#118.apk, where ActivityDiary is the app name, 1.1.8 is the code version, #118 is the original issue id.)
  • reproduce the bug and record the stack trace (i.e., ActivityDiary/crash_stack_#118.txt)
  • write a bug-triggering script in uiautomator2 and record a bug-triggering video (i.e., ActivityDiary/script-#118.py and ActivityDiary/video-#118.mp4)
  • add a JSON file to facilitate coverage computation (used by TimeMachine) which describes its class files and source files (i.e., ActivityDiary/class_files.json)

The basic steps to add such a bug into Themis's infrastructure include:

  • In scripts/check_crash.py, one should add the app name into list ALL_APPS and the crash signature (i.e., the crash type info and the partial crash trace related to the app itself) into dict app_crash_data.
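
To make the idea of a crash signature concrete, the sketch below shows one possible way to match a signature against a recorded stack trace. It is only an illustration under stated assumptions: the real layout of app_crash_data in scripts/check_crash.py is not shown here, the function `matches_signature` is hypothetical, and the example signature and trace are made up (they are not the actual crash of any Themis bug).

```python
def matches_signature(stack_trace, signature):
    """Return True if every signature fragment occurs in the trace, in order."""
    pos = 0
    for fragment in signature:
        idx = stack_trace.find(fragment, pos)
        if idx < 0:
            return False
        pos = idx + len(fragment)
    return True


# Made-up signature: the crash type plus the app-related part of the trace.
EXAMPLE_SIGNATURE = [
    "java.lang.NullPointerException",
    "at com.example.app.MainActivity.onResume",
]

if __name__ == "__main__":
    # Made-up trace, standing in for the content of a crash_stack file.
    trace = (
        "java.lang.NullPointerException: Attempt to invoke virtual method\n"
        "    at com.example.app.MainActivity.onResume(MainActivity.java:42)\n"
        "    at android.app.Activity.performResume(Activity.java:6783)\n"
    )
    print(matches_signature(trace, EXAMPLE_SIGNATURE))
```

Restricting the signature to the crash type and the app's own frames keeps the match stable across framework frames that may differ between devices or Android versions.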

In the future, we plan to (1) add the crash bugs from other existing benchmarks, (2) add crash bugs with different levels of severity rather than only critical ones, and (3) add non-crashing bugs (e.g., UI bugs) into Themis. For non-crashing bugs, we can manually codify the bug-triggering condition into specific assertions in the app code so that Themis can be reused to evaluate the existing tools on finding non-crashing bugs by violating the assertions. We are already working on (2).

  2. Add new testing tools into Themis

Taking Monkey as an example, the basic steps to add a new tool into Themis's infrastructure include:

  • add an internal shell script to invoke the new tool (see scripts/run_monkey.sh) which defines (1) the concrete command line of invoking the tool, (2) the outputs of the tool, and (3) the tool-specific configurations before running
  • add the call of this shell script in scripts/themis.py (see function run_monkey)
  • add the code of parsing the outputs of the new tool in scripts/check_crash.py.

In fact, we have already integrated FastBot, an industrial testing tool from ByteDance, into Themis.

  3. Optimize and enhance existing supported tools

In the accepted paper, Sections 4.2 and 4.3 point out many optimization opportunities and future research directions for improving existing testing tools. By using Themis,

  • The tool authors or other researchers can debug/validate the tool improvement and evaluate/compare with new testing tools
  • We can contribute new enhancement features to the original tools by pull requests because Themis forked the testing tools from their original repositories.
  4. Coverage profiling and analysis

Themis now supports coverage profiling by running

python3 compute_coverage.py -o ../monkey-results --monkey --app ActivityDiary --id \#118 --acc_csv

The main usage is:

usage: compute_coverage.py [-h] -o O [-v] [--monkey] [--ape] [--timemachine] [--combo] [--humandroid] [--qtesting] [--stoat] [--app APP_NAME] [--id ISSUE_ID] [--acc_csv ACC_CSV] [--single_csv SINGLE_CSV]
                           [--average_csv AVERAGE_CSV]

optional arguments:
  -h, --help            show this help message and exit
  -o O                  the output directory of testing results
  -v
  --monkey
  --ape
  --timemachine
  --combo
  --humandroid
  --qtesting
  --app APP_NAME
  --id ISSUE_ID
  --acc_csv ACC_CSV     compute the accumulative coverage of all runs
  --single_csv SINGLE_CSV
                        compute the coverage of single runs
  --average_csv AVERAGE_CSV
                        compute the average coverage of all runs
  5. Themis can also benefit other research (e.g., fault localization, program repair, etc.)