Android Security Scraper
This is a set of Python scripts that scrape websites to gather security properties of Android smartphones. It is part of my bachelor thesis; a link will be added once the thesis is finished.
Websites and scraped attributes
GSMArena
GSMArena is a consumer-oriented website that lists hardware specs of phones.
The following attributes are scraped:
- phone model
- chipset
- operating system version (at launch and updated)
- presence of a fingerprint scanner
As this website uses rate limiting and blocks clients when too many requests are issued, it was scraped using a trial account at scraperapi.com.
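Routing a request through ScraperAPI can be sketched roughly as follows. This is a minimal illustration, not the code used in this project: it assumes ScraperAPI's HTTP endpoint at api.scraperapi.com with `api_key` and `url` query parameters, and the helper name `build_scraperapi_url` and the placeholder key are hypothetical.

```python
from urllib.parse import urlencode

# Assumed ScraperAPI endpoint; check the ScraperAPI documentation for the current form.
SCRAPERAPI_ENDPOINT = "http://api.scraperapi.com/"


def build_scraperapi_url(api_key: str, target_url: str) -> str:
    """Wrap target_url so that the request is sent via the proxy service."""
    # urlencode percent-encodes the target URL so it survives as a single parameter.
    query = urlencode({"api_key": api_key, "url": target_url})
    return f"{SCRAPERAPI_ENDPOINT}?{query}"


if __name__ == "__main__":
    # "YOUR_API_KEY" is a placeholder, not a real key.
    print(build_scraperapi_url("YOUR_API_KEY", "https://www.gsmarena.com/"))
```

A spider would then request the wrapped URL instead of the GSMArena URL directly, so the target site sees the proxy service rather than a single client.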
Android Enterprise Solutions Directory
The Android Enterprise Solutions Directory is a platform by Google with information about Android smartphones that are useful to enterprises.
The following attributes are scraped:
- phone model
- operating system version (at launch and updated)
- presence of a fingerprint scanner
- ioXt certification
- Common Criteria certification
Common Criteria Portal
The Common Criteria Portal lists all products that are Common Criteria certified, along with all associated reports. The scraper downloads a .csv file provided by the website, filters it for mobile devices, and then downloads all the reports linked there into a directory. Those reports can then be searched for smartphone models using pdfgrep.
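The filtering step could look roughly like the sketch below. It is not taken from cc_portal_scraper.py: the column names `Category` and `Certification Report URL`, the category label `Mobility`, and the function name `mobile_report_urls` are assumptions that should be checked against the actual .csv export.

```python
import csv
import io
from typing import List

# Assumed column names and category label; verify against the real export.
CATEGORY_COLUMN = "Category"
REPORT_URL_COLUMN = "Certification Report URL"
MOBILE_CATEGORY = "Mobility"


def mobile_report_urls(csv_text: str) -> List[str]:
    """Return the report URLs of all rows in the mobile-device category."""
    reader = csv.DictReader(io.StringIO(csv_text))
    urls = []
    for row in reader:
        category = (row.get(CATEGORY_COLUMN) or "").strip()
        url = (row.get(REPORT_URL_COLUMN) or "").strip()
        # Skip rows from other categories and rows without a linked report.
        if category == MOBILE_CATEGORY and url:
            urls.append(url)
    return urls
```

Each returned URL can then be downloaded into the report directory that pdfgrep searches.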
Usage
Dependencies
- Python
- Scrapy
- pdfgrep (for the Common Criteria Portal)
- scraperapi-sdk (for GSMArena)
You can install the dependencies using the requirements.txt in the project subfolder like so:
cd gsmarena
pip install -r requirements.txt
GSMArena
cd gsmarena
scrapy crawl attributes -o output.csv
Android Enterprise Solutions Directory
cd adnroid_enterpise
scrapy crawl attributes -o output.csv
Common Criteria Portal
cd common_criteria_scraper
python cc_portal_scraper.py
pdfgrep -riH "pixel 3" pdf
Replace "pixel 3" with the phone model you are looking for.
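To check several phone models in one go, the pdfgrep call could be wrapped in a small Python helper. This is only a sketch and not part of the project: the function names are hypothetical, it assumes pdfgrep is on the PATH, and it assumes the reports are in the pdf directory used above.

```python
import subprocess
from typing import Dict, List


def build_pdfgrep_command(model: str, directory: str = "pdf") -> List[str]:
    """Build the same call as `pdfgrep -riH "<model>" pdf`, as an argument list."""
    # -r: search the directory recursively, -i: ignore case, -H: print file names
    return ["pdfgrep", "-r", "-i", "-H", model, directory]


def search_reports(models: List[str], directory: str = "pdf") -> Dict[str, List[str]]:
    """Run pdfgrep once per model and collect the matching output lines."""
    results = {}
    for model in models:
        proc = subprocess.run(
            build_pdfgrep_command(model, directory),
            capture_output=True,
            text=True,
        )
        results[model] = proc.stdout.splitlines()
    return results


if __name__ == "__main__":
    for model, lines in search_reports(["pixel 3", "galaxy s10"]).items():
        print(model, len(lines), "matching lines")
```

Passing the arguments as a list avoids shell quoting problems with model names that contain spaces.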