diff --git a/README.md b/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..33b7699b2a2b01b0c94f3ed349cc75a189884f29
--- /dev/null
+++ b/README.md
@@ -0,0 +1,51 @@
+# Android Security Scraper
+This is a set of Python scripts that scrape websites to gather security properties of Android smartphones. It is part of my bachelor thesis; a link to it will be added once it is finished.
+
+# Websites and scraped attributes
+## [GSMArena](https://www.gsmarena.com/)
+GSMArena is a consumer-oriented website that lists the hardware specs of phones.
+
+The following attributes are scraped:
+* phone model
+* chipset
+* operating system version (at launch and updated)
+* presence of a fingerprint scanner
+
+Because this website applied rate limiting and blocking when too many requests were issued, it was scraped using a trial account at scraperapi.com.
+
+## [Android Enterprise Solutions Directory](https://androidenterprisepartners.withgoogle.com/)
+The Android Enterprise Solutions Directory is a platform by Google with information about Android smartphones that are useful to enterprises.
+
+The following attributes are scraped:
+* phone model
+* operating system version (at launch and updated)
+* presence of a fingerprint scanner
+* ioXt certification
+* Common Criteria certification
+
+## [Common Criteria Portal](https://www.commoncriteriaportal.org/products/)
+The Common Criteria Portal lists all products that are Common Criteria certified and all associated reports. The scraper downloads a .csv file provided by the website, filters for mobile devices and then downloads all the reports that are linked there into a directory. Those reports can then be searched for smartphone models using [pdfgrep](https://pdfgrep.org/).
+
+# Usage
+## Dependencies
+* python
+* scrapy
+* pdfgrep (for common criteria portal)
+## GSMArena
+```
+cd gsmarena
+scrapy crawl attributes -o output.csv
+```
+## Android Enterprise Solutions Directory
+```
+cd adnroid_enterpise
+scrapy crawl attributes -o output.csv
+```
+
+## Common Criteria Portal
+```
+cd common_criteria_scraper
+python cc_portal_scraper.py
+pdfgrep -riH "pixel 3" pdf
+```
+Replace "pixel 3" with the phone model you are looking for.
\ No newline at end of file
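The Common Criteria Portal step described in the README above (take the .csv, filter for mobile devices, collect the linked reports) can be sketched roughly as follows. This is a minimal illustration and not the code in `cc_portal_scraper.py`: the column names `Category` and `Certification Report URL`, the category label `Mobility`, the helper `mobile_report_urls`, and the sample rows are all assumptions made for the example, so they should be checked against the header of the CSV the portal actually serves. Downloading is left out so that the sketch runs offline.

```python
import csv
import io

# Assumed column names and category label; verify against the real CSV header.
CATEGORY_COLUMN = "Category"
REPORT_COLUMN = "Certification Report URL"
MOBILE_CATEGORY = "Mobility"


def mobile_report_urls(csv_text, category=MOBILE_CATEGORY):
    """Return the report URLs of all rows whose category matches (case-insensitive)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    urls = []
    for row in reader:
        # Skip rows that belong to other product categories.
        if (row.get(CATEGORY_COLUMN) or "").strip().lower() != category.lower():
            continue
        url = (row.get(REPORT_COLUMN) or "").strip()
        # Keep each report once, in the order it first appears.
        if url and url not in urls:
            urls.append(url)
    return urls


# Made-up sample rows, only to show the shape of the data.
SAMPLE = """Category,Name,Certification Report URL
Mobility,Example Phone A,https://example.org/reports/a.pdf
Databases,Example DB,https://example.org/reports/db.pdf
Mobility,Example Phone B,https://example.org/reports/b.pdf
"""

if __name__ == "__main__":
    for url in mobile_report_urls(SAMPLE):
        print(url)
```

The returned URLs could then be fetched into the `pdf` directory that the `pdfgrep` command searches.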