This is a set of Python scripts that scrape websites to gather security properties of Android smartphones. It is part of my bachelor thesis; a link to the thesis will be added once it is finished.
# Websites and scraped attributes
## [GSMArena](https://www.gsmarena.com)
GSMArena is a consumer-oriented website that lists the hardware specifications of phones.
The following attributes are scraped:
* phone model
* chipset
* operating system version (at launch and updated)
* presence of a fingerprint scanner

Because this website rate-limits and blocks clients that issue too many requests, it was scraped through a trial account at scraperapi.com.
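
As a rough illustration of how requests can be routed through such a proxy service, the sketch below wraps a target URL in a ScraperAPI request URL. It is not taken from the spider in this repository: the helper name `via_scraperapi`, the placeholder key, and the use of the `api_key`/`url` query endpoint are assumptions.

```python
from urllib.parse import urlencode

# Assumed ScraperAPI endpoint; the spider in this repo may be configured differently.
SCRAPERAPI_ENDPOINT = "http://api.scraperapi.com/"


def via_scraperapi(target_url: str, api_key: str) -> str:
    """Return a URL that asks ScraperAPI to fetch target_url on our behalf."""
    # urlencode percent-encodes the target URL so it survives as a query value.
    query = urlencode({"api_key": api_key, "url": target_url})
    return f"{SCRAPERAPI_ENDPOINT}?{query}"


if __name__ == "__main__":
    # "YOUR_API_KEY" is a placeholder, not a real key.
    print(via_scraperapi("https://www.gsmarena.com/", "YOUR_API_KEY"))
```

Inside a Scrapy spider, the wrapped URL could then be passed to `scrapy.Request` in place of the original one.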
## Common Criteria Portal
The Common Criteria Portal lists all Common Criteria certified products and their associated reports. The scraper downloads a .csv file provided by the website, filters it for mobile devices, and then downloads all the reports linked there into a directory. Those reports can then be searched for smartphone models using [pdfgrep](https://pdfgrep.org/).
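
The filter-and-download step can be sketched roughly as follows. This is not the code of `cc_portal_scraper.py`: the column names `Category` and `Certification Report URL`, the category value `Mobility`, and both function names are assumptions that should be checked against the actual .csv file.

```python
import csv
import io
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve

# Assumed column names and category value; verify them against the real CSV.
CATEGORY_COLUMN = "Category"
REPORT_COLUMN = "Certification Report URL"
MOBILE_CATEGORY = "Mobility"


def mobile_report_urls(csv_text: str) -> list[str]:
    """Return the report URLs of all rows in the mobile device category."""
    urls = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        category = (row.get(CATEGORY_COLUMN) or "").strip()
        url = (row.get(REPORT_COLUMN) or "").strip()
        # Skip other categories and rows without a linked report.
        if category == MOBILE_CATEGORY and url:
            urls.append(url)
    return urls


def download_reports(urls: list[str], target_dir: str = "pdf") -> None:
    """Download every report into target_dir, keeping the remote file name."""
    out = Path(target_dir)
    out.mkdir(exist_ok=True)
    for url in urls:
        filename = Path(urlparse(url).path).name
        urlretrieve(url, out / filename)
```

Keeping the filtering separate from the download makes it possible to inspect the list of URLs before fetching anything.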
# Usage
## Dependencies
* python
* scrapy
* pdfgrep (for common criteria portal)
## GSMArena
```
cd gsmarena
scrapy crawl attributes -o output.csv
```
## Android Enterprise Solutions Directory
```
cd adnroid_enterpise
scrapy crawl attributes -o output.csv
```
## Common Criteria Portal
```
cd common_criteria_scraper
python cc_portal_scraper.py
pdfgrep -riH "pixel 3" pdf
```
Replace "pixel 3" with the phone model you are looking for.
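
To check several models in one run, the `pdfgrep` call can be wrapped in a short script. The following is a minimal sketch, assuming `pdfgrep` is on the `PATH` and the reports are in the `pdf` directory; the function names and the example model list are hypothetical.

```python
import subprocess


def pdfgrep_command(model: str, report_dir: str = "pdf") -> list[str]:
    """Build the same pdfgrep call as shown in the usage example."""
    # -r: search directories recursively, -i: ignore case, -H: print file names
    return ["pdfgrep", "-riH", model, report_dir]


def search_models(models: list[str], report_dir: str = "pdf") -> dict[str, list[str]]:
    """Run pdfgrep once per model and collect the matching output lines."""
    results = {}
    for model in models:
        proc = subprocess.run(
            pdfgrep_command(model, report_dir),
            capture_output=True,
            text=True,
        )
        # An empty list means pdfgrep printed no matches for this model.
        results[model] = proc.stdout.splitlines()
    return results


if __name__ == "__main__":
    # Example model names only; replace them with the phones of interest.
    for model, lines in search_models(["pixel 3", "galaxy s10"]).items():
        print(f"{model}: {len(lines)} matching lines")
```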