Commit d595dcdd authored by Oliver Wiese

Add TLDExtract lib

parent db658fc2
Showing changes with 19069 additions and 1 deletion
@@ -20,7 +20,10 @@ PODS:
   - KeychainAccess (4.1.0)
   - mailcore2-ios (0.6.3)
   - Onboard (2.3.1)
+  - Punycode (1.0.1)
   - QAKit (0.0.7)
+  - TLDExtract (1.0.1):
+    - Punycode (~> 1.0)
   - Travellib (0.0.1)
   - VENTokenField (2.5.2):
     - FrameAccessor (~> 1.0)
@@ -34,6 +37,7 @@ DEPENDENCIES:
   - mailcore2-ios (from `https://github.com/MailCore/mailcore2.git`, branch `master`)
   - Onboard (= 2.3.1)
   - QAKit
+  - TLDExtract
   - Travellib (from `https://git.imp.fu-berlin.de/jakobsbode/travellib.git`, branch `master`)
   - VENTokenField (~> 2.0)
@@ -47,7 +51,9 @@ SPEC REPOS:
     - GTMSessionFetcher
     - KeychainAccess
     - Onboard
+    - Punycode
     - QAKit
+    - TLDExtract
     - VENTokenField
 EXTERNAL SOURCES:
@@ -76,10 +82,12 @@ SPEC CHECKSUMS:
   KeychainAccess: 445e28864fe6d3458b41fa211bcdc39890e8bd5a
   mailcore2-ios: 0637212770ea6b00d73de80b249b42ce937884ec
   Onboard: b6871f25ac753175b2ab9a362fb2feb26a81a311
+  Punycode: ddbef4a269780c8f19a7e8deb01d9f101cb2ef86
   QAKit: abefda5db53a58012fc8410d310e0ef217515607
+  TLDExtract: 63aa739e9b50052ef04e792927c43db62b2bb6b5
   Travellib: 819ccc356d19fdaf6f0b3c89db069d34aa6c3ec9
   VENTokenField: 5a19b838fb97f040e3d4c93f584b4adeaf3fc1ee
-PODFILE CHECKSUM: 09ce4ff7b649af3f9a2f9ae1ce80b3f8243472ac
+PODFILE CHECKSUM: 3c83d9ee1ce95d57b28284fce2e2affe4d15b1e3
 COCOAPODS: 1.9.0
MIT License
Copyright (c) 2018 Gumob
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
[![Carthage compatible](https://img.shields.io/badge/Carthage-compatible-4BC51D.svg)](https://github.com/gumob/PunycodeSwift)
[![Version](http://img.shields.io/cocoapods/v/Punycode.svg)](http://cocoadocs.org/docsets/Punycode)
[![Platform](http://img.shields.io/cocoapods/p/Punycode.svg)](http://cocoadocs.org/docsets/Punycode)
[![Build Status](https://travis-ci.com/gumob/PunycodeSwift.svg?branch=master)](https://travis-ci.com/gumob/PunycodeSwift)
[![codecov](https://codecov.io/gh/gumob/PunycodeSwift/branch/master/graph/badge.svg)](https://codecov.io/gh/gumob/PunycodeSwift)
![Language](https://img.shields.io/badge/Language-Swift%204.2-orange.svg)
![Packagist](https://img.shields.io/packagist/l/doctrine/orm.svg)
# PunycodeSwift
<code>PunycodeSwift</code> is a pure Swift library that lets you encode and decode `punycoded` strings through a String extension.
## What is Punycode?
Punycode is a representation of Unicode with the limited ASCII character subset used for Internet host names. Using Punycode, host names containing Unicode characters are transcoded to a subset of ASCII consisting of letters, digits, and hyphen, which is called the Letter-Digit-Hyphen (LDH) subset. For example, München (German name for Munich) is encoded as Mnchen-3ya. [(Wikipedia)](https://en.wikipedia.org/wiki/Punycode)
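As a rough illustration of the first step of that transcoding, the sketch below copies the basic (ASCII) code points of a label and appends the delimiter. The non-ASCII code points are what the rest of the algorithm encodes into the trailing digits (`3ya` in the example above). The function name `basicPrefix` is hypothetical and not part of PunycodeSwift; this is not a full encoder.

```swift
/// Hypothetical helper: returns the ASCII code points of `label`,
/// followed by "-" when at least one ASCII code point exists.
/// Only the literal-copy step of RFC 3492 is shown here.
func basicPrefix(of label: String) -> String {
    var basic = ""
    for scalar in label.unicodeScalars where scalar.isASCII {
        basic.unicodeScalars.append(scalar)
    }
    return basic.isEmpty ? basic : basic + "-"
}

// "M\u{FC}nchen" is "München" written with a precomposed U+00FC.
print(basicPrefix(of: "M\u{FC}nchen")) // Mnchen-
```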
## Requirements
- iOS 9.3 or later
- macOS 10.12 or later
- tvOS 12.0 or later
- Swift 4.2
<small>* No plans to support tvOS 11 or earlier for now</small>
## Installation
### Carthage
Add the following to your `Cartfile` and follow [these instructions](https://github.com/Carthage/Carthage#adding-frameworks-to-an-application).
```
github "gumob/PunycodeSwift"
```
### CocoaPods
To integrate Punycode into your project, add the following to your `Podfile`.
```ruby
platform :ios, '9.3'
use_frameworks!
pod 'Punycode'
```
## Usage
Encode and decode IDNA:
```swift
import Punycode
var sushi: String = "寿司"
sushi = sushi.idnaEncoded!
print(sushi) // xn--sprr0q
sushi = sushi.idnaDecoded!
print(sushi) // 寿司
```
Encode and decode Punycode directly:
```swift
import Punycode
var sushi: String = "寿司"
sushi = sushi.punycodeEncoded!
print(sushi) // sprr0q
sushi = sushi.punycodeDecoded!
print(sushi) // 寿司
```
## Copyright
Punycode is released under MIT license, which means you can modify it, redistribute it or use it however you like.
//
// Created by kojirof on 2018-11-19.
// Copyright (c) 2018 Gumob. All rights reserved.
//
import Foundation
// For calling site convenience everything is implemented over Substring and String API is wrapped around it
public extension Substring {
/// Returns new string in punycode encoding (RFC 3492)
///
/// - Returns: Punycode encoded string or nil if the string can't be encoded
var punycodeEncoded: String? {
return Punycode().encodePunycode(self)
}
/// Returns new string decoded from punycode representation (RFC 3492)
///
/// - Returns: Original string or nil if the string doesn't contain correct encoding
var punycodeDecoded: String? {
return Punycode().decodePunycode(self)
}
/// Returns new string containing IDNA-encoded hostname
///
/// - Returns: IDNA encoded hostname or nil if the string can't be encoded
var idnaEncoded: String? {
return Punycode().encodeIDNA(self)
}
/// Returns new string containing hostname decoded from IDNA representation
///
/// - Returns: Original hostname or nil if the string doesn't contain correct encoding
var idnaDecoded: String? {
return Punycode().decodedIDNA(self)
}
}
public extension String {
/// Returns new string in punycode encoding (RFC 3492)
///
/// - Returns: Punycode encoded string or nil if the string can't be encoded
var punycodeEncoded: String? {
return self[..<self.endIndex].punycodeEncoded
}
/// Returns new string decoded from punycode representation (RFC 3492)
///
/// - Returns: Original string or nil if the string doesn't contain correct encoding
var punycodeDecoded: String? {
return self[..<self.endIndex].punycodeDecoded
}
/// Returns new string containing IDNA-encoded hostname
///
/// - Returns: IDNA encoded hostname or nil if the string can't be encoded
var idnaEncoded: String? {
return self[..<self.endIndex].idnaEncoded
}
/// Returns new string containing hostname decoded from IDNA representation
///
/// - Returns: Original hostname or nil if the string doesn't contain correct encoding
var idnaDecoded: String? {
return self[..<self.endIndex].idnaDecoded
}
}
//
// Created by kojirof on 2018-11-19.
// Copyright (c) 2018 Gumob. All rights reserved.
//
import Foundation
/// Helpers
extension Substring {
internal func lastIndex(of element: Character) -> String.Index? {
var position: Index = endIndex
while position > startIndex {
position = self.index(before: position)
if self[position] == element {
return position
}
}
return nil
}
}
extension UnicodeScalar {
internal var isValid: Bool {
/// Surrogates occupy 0xD800...0xDFFF and the highest Unicode code point is 0x10FFFF
return value < 0xD800 || (value >= 0xE000 && value <= 0x10FFFF)
}
}
//
// Created by kojirof on 2018-11-19.
// Copyright (c) 2018 Gumob. All rights reserved.
//
import Foundation
public class Punycode {
/// Punycode RFC 3492
/// See https://www.ietf.org/rfc/rfc3492.txt for standard details
private let base: Int = 36
private let tMin: Int = 1
private let tMax: Int = 26
private let skew: Int = 38
private let damp: Int = 700
private let initialBias: Int = 72
private let initialN: Int = 128
/// RFC 3492 specific
private let delimiter: Character = "-"
private let lowercase: ClosedRange<Character> = "a"..."z"
private let digits: ClosedRange<Character> = "0"..."9"
private let lettersBase: UInt32 = Character("a").unicodeScalars.first!.value
private let digitsBase: UInt32 = Character("0").unicodeScalars.first!.value
/// IDNA
private let ace: String = "xn--"
private func adaptBias(_ delta: Int, _ numberOfPoints: Int, _ firstTime: Bool) -> Int {
var delta: Int = delta
if firstTime {
delta /= damp
} else {
delta /= 2
}
delta += delta / numberOfPoints
var k: Int = 0
while delta > ((base - tMin) * tMax) / 2 {
delta /= base - tMin
k += base
}
return k + ((base - tMin + 1) * delta) / (delta + skew)
}
/// Maps a punycode character to index
private func punycodeIndex(for character: Character) -> Int? {
if lowercase.contains(character) {
return Int(character.unicodeScalars.first!.value - lettersBase)
} else if digits.contains(character) {
return Int(character.unicodeScalars.first!.value - digitsBase) + 26 /// count of lowercase letters range
} else {
return nil
}
}
/// Maps an index to corresponding punycode character
private func punycodeValue(for digit: Int) -> Character? {
guard digit < base else { return nil }
if digit < 26 {
return Character(UnicodeScalar(lettersBase.advanced(by: digit))!)
} else {
return Character(UnicodeScalar(digitsBase.advanced(by: digit - 26))!)
}
}
/// Decodes punycode encoded string to original representation
///
/// - Parameter punycode: Punycode encoding (RFC 3492)
/// - Returns: Decoded string or nil if the input cannot be decoded
public func decodePunycode(_ punycode: Substring) -> String? {
var n: Int = initialN
var i: Int = 0
var bias: Int = initialBias
var output: [Character] = []
var inputPosition = punycode.startIndex
let delimiterPosition: Substring.Index = punycode.lastIndex(of: delimiter) ?? punycode.startIndex
if delimiterPosition > punycode.startIndex {
output.append(contentsOf: punycode[..<delimiterPosition])
inputPosition = punycode.index(after: delimiterPosition)
}
var punycodeInput: Substring = punycode[inputPosition..<punycode.endIndex]
while !punycodeInput.isEmpty {
let oldI: Int = i
var w: Int = 1
var k: Int = base
while true {
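guard !punycodeInput.isEmpty else {
return nil /// Input ended in the middle of a variable-length integer
}
let character: Character = punycodeInput.removeFirst()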
guard let digit: Int = punycodeIndex(for: character) else {
return nil /// Failing on badly formatted punycode
}
i += digit * w
let t = k <= bias ? tMin : (k >= bias + tMax ? tMax : k - bias)
if digit < t {
break
}
w *= base - t
k += base
}
bias = adaptBias(i - oldI, output.count + 1, oldI == 0)
n += i / (output.count + 1)
i %= (output.count + 1)
guard n >= 0x80, let scalar = UnicodeScalar(n) else {
return nil
}
output.insert(Character(scalar), at: i)
i += 1
}
return String(output)
}
/// Encodes string to punycode (RFC 3492)
///
/// - Parameter input: Input string
/// - Returns: Punycode encoded string
public func encodePunycode(_ input: Substring) -> String? {
var n: Int = initialN
var delta: Int = 0
var bias: Int = initialBias
var output: String = ""
for scalar in input.unicodeScalars {
if scalar.isASCII {
let char = Character(scalar)
output.append(char)
} else if !scalar.isValid {
return nil /// Encountered a scalar out of acceptable range
}
}
var handled: Int = output.count
let basic: Int = handled
if basic > 0 {
output.append(delimiter)
}
while handled < input.unicodeScalars.count {
var minimumCodepoint: Int = 0x10FFFF
for scalar: Unicode.Scalar in input.unicodeScalars {
if scalar.value < minimumCodepoint && scalar.value >= n {
minimumCodepoint = Int(scalar.value)
}
}
delta += (minimumCodepoint - n) * (handled + 1)
n = minimumCodepoint
for scalar: Unicode.Scalar in input.unicodeScalars {
if scalar.value < n {
delta += 1
} else if scalar.value == n {
var q: Int = delta
var k: Int = base
while true {
let t = k <= bias ? tMin : (k >= bias + tMax ? tMax : k - bias)
if q < t {
break
}
guard let character: Character = punycodeValue(for: t + ((q - t) % (base - t))) else { return nil }
output.append(character)
q = (q - t) / (base - t)
k += base
}
guard let character: Character = punycodeValue(for: q) else { return nil }
output.append(character)
bias = adaptBias(delta, handled + 1, handled == basic)
delta = 0
handled += 1
}
}
delta += 1
n += 1
}
return output
}
/// Returns new string containing IDNA-encoded hostname
///
/// - Returns: IDNA encoded hostname or nil if the string can't be encoded
public func encodeIDNA(_ input: Substring) -> String? {
let parts: [Substring] = input.split(separator: ".")
var output: String = ""
for part: Substring in parts {
if output.count > 0 {
output.append(".")
}
if part.rangeOfCharacter(from: CharacterSet.urlHostAllowed.inverted) != nil {
guard let encoded: String = part.lowercased().punycodeEncoded else { return nil }
output += ace + encoded
} else {
output += part
}
}
return output
}
/// Returns new string containing hostname decoded from IDNA representation
///
/// - Returns: Original hostname or nil if the string doesn't contain correct encoding
public func decodedIDNA(_ input: Substring) -> String? {
let parts: [Substring] = input.split(separator: ".")
var output: String = ""
for part: Substring in parts {
if output.count > 0 {
output.append(".")
}
if part.hasPrefix(ace) {
guard let decoded: String = part.dropFirst(ace.count).punycodeDecoded else { return nil }
output += decoded
} else {
output += part
}
}
return output
}
}
MIT License
Copyright (c) 2018 Gumob
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
[![Carthage compatible](https://img.shields.io/badge/Carthage-compatible-4BC51D.svg)](https://github.com/gumob/TLDExtractSwift)
[![Version](http://img.shields.io/cocoapods/v/TLDExtract.svg)](http://cocoadocs.org/docsets/TLDExtract)
[![Platform](http://img.shields.io/cocoapods/p/TLDExtract.svg)](http://cocoadocs.org/docsets/TLDExtract)
[![Build Status](https://travis-ci.com/gumob/TLDExtractSwift.svg?branch=master)](https://travis-ci.com/gumob/TLDExtractSwift)
[![codecov](https://codecov.io/gh/gumob/TLDExtractSwift/branch/master/graph/badge.svg)](https://codecov.io/gh/gumob/TLDExtractSwift)
![Language](https://img.shields.io/badge/Language-Swift%204.2-orange.svg)
![Packagist](https://img.shields.io/packagist/l/doctrine/orm.svg)
# TLDExtract
<code>TLDExtract</code> is a pure Swift library that lets you get the public suffix of a domain name using [the Public Suffix List](http://www.publicsuffix.org). You can find alternatives for other languages at [publicsuffix.org](https://publicsuffix.org/learn/).<br/>
## What are domains?
Domain names are the unique, human-readable Internet addresses of websites. They are made up of three parts: a top-level domain (a.k.a. TLD), a second-level domain name, and an optional subdomain.
<img src="Metadata/domain-diagram.svg" alt="drawing" width="480" style="width:100%; max-width: 480px;"/>
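The three parts can be sketched with a toy suffix set. The code below is a minimal illustration under stated assumptions: `DomainParts`, `splitDomain`, and the two-entry suffix set are hypothetical and are not TLDExtract's API or data; the real Public Suffix List is far larger and also contains wildcard and exception rules.

```swift
/// Hypothetical result type for this sketch (not the library's TLDResult).
struct DomainParts: Equatable {
    let subdomain: String?
    let secondLevel: String
    let topLevel: String
}

/// Splits `host` using a tiny hard-coded suffix set.
func splitDomain(_ host: String, suffixes: Set<String> = ["com", "co.jp"]) -> DomainParts? {
    let labels = host.lowercased().split(separator: ".").map(String.init)
    // A host needs at least one label in front of the suffix.
    guard labels.count >= 2 else { return nil }
    // Try the longest candidate suffix first.
    for start in 1..<labels.count {
        let suffix = labels[start...].joined(separator: ".")
        guard suffixes.contains(suffix) else { continue }
        let sub = labels[..<(start - 1)].joined(separator: ".")
        return DomainParts(subdomain: sub.isEmpty ? nil : sub,
                           secondLevel: labels[start - 1],
                           topLevel: suffix)
    }
    return nil
}

if let parts = splitDomain("www.github.com") {
    print(parts.secondLevel, parts.topLevel) // github com
}
```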
## Features
- Extract the root domain, top-level domain, second-level domain, and subdomain from a URL or hostname
- Foundation URL and String support
- IDNA support
- Multi platform support
## Requirements
- iOS 9.3 or later
- macOS 10.12 or later
- tvOS 12.0 or later
- Swift 4.2
- Python 2.7 or Python 3
<small>* No plans to support tvOS 11 or earlier for now</small>
## Installation
### Carthage
Add the following to your `Cartfile` and follow [these instructions](https://github.com/Carthage/Carthage#adding-frameworks-to-an-application).
```
github "gumob/TLDExtractSwift"
```
Do not forget to include Punycode.framework. Otherwise the application will fail to build.<br/>
<img src="Metadata/carthage-xcode-config.jpg" alt="drawing" width="480" style="width:100%; max-width: 480px;"/>
### CocoaPods
To integrate TLDExtract into your project, add the following to your `Podfile`.
```ruby
platform :ios, '9.3'
use_frameworks!
pod 'TLDExtract'
```
## Usage
### Initialization
Basic initialization. An error is thrown only if [the Public Suffix List on the server](https://publicsuffix.org/list/public_suffix_list.dat) is broken.
```swift
import TLDExtract
let extractor = try! TLDExtract()
```
A safer initialization that avoids errors by using the frozen Public Suffix List:<br/>
```swift
import TLDExtract
let extractor = try! TLDExtract(useFrozenData: true)
```
*The Public Suffix List is updated every time the framework is built. When `useFrozenData` is set to `true`, TLDExtract loads the copy of the list that was checked out from the repository.
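One way to combine the two initializers is to try the freshly built list first and fall back to the frozen one. The helper below is a generic sketch; `loadWithFallback` is a hypothetical name and not part of TLDExtract.

```swift
/// Hypothetical helper: runs `primary` and, if it throws, runs `fallback` instead.
/// An error from `fallback` is propagated to the caller.
func loadWithFallback<T>(primary: () throws -> T, fallback: () throws -> T) throws -> T {
    do {
        return try primary()
    } catch {
        return try fallback()
    }
}
```

With the initializers shown above, a call might look like `try loadWithFallback(primary: { try TLDExtract() }, fallback: { try TLDExtract(useFrozenData: true) })`.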
### Extraction
#### Passing argument as String
Extract a URL:
```swift
let urlString: String = "https://www.github.com/gumob/TLDExtract"
guard let result: TLDResult = extractor.parse(urlString) else { return }
print(result.rootDomain) // Optional("github.com")
print(result.topLevelDomain) // Optional("com")
print(result.secondLevelDomain) // Optional("github")
print(result.subDomain) // Optional("www")
```
Extract a hostname:
```swift
let hostname: String = "gumob.com"
guard let result: TLDResult = extractor.parse(hostname) else { return }
print(result.rootDomain) // Optional("gumob.com")
print(result.topLevelDomain) // Optional("com")
print(result.secondLevelDomain) // Optional("gumob")
print(result.subDomain) // nil
```
Extract a Unicode hostname:
```swift
let hostname: String = "www.ラーメン.寿司.co.jp"
guard let result: TLDResult = extractor.parse(hostname) else { return }
print(result.rootDomain) // Optional("寿司.co.jp")
print(result.topLevelDomain) // Optional("co.jp")
print(result.secondLevelDomain) // Optional("寿司")
print(result.subDomain) // Optional("www.ラーメン")
```
Extract a punycoded hostname (Same as above):
```swift
let hostname: String = "www.xn--4dkp5a8a.xn--sprr0q.co.jp"
guard let result: TLDResult = extractor.parse(hostname) else { return }
print(result.rootDomain) // Optional("xn--sprr0q.co.jp")
print(result.topLevelDomain) // Optional("co.jp")
print(result.secondLevelDomain) // Optional("xn--sprr0q")
print(result.subDomain) // Optional("www.xn--4dkp5a8a")
```
#### Passing argument as Foundation URL
Extract a Unicode URL: <br/>
The URL class in the Foundation framework does not support Unicode URLs by default. You can use the URL extension as a workaround:
```swift
let urlString: String = "http://www.ラーメン.寿司.co.jp"
guard let url: URL = URL(unicodeString: urlString) else { return } // the initializer is failable
guard let result: TLDResult = extractor.parse(url) else { return }
print(result.rootDomain) // Optional("寿司.co.jp")
print(result.topLevelDomain) // Optional("co.jp")
print(result.secondLevelDomain) // Optional("寿司")
print(result.subDomain) // Optional("www.ラーメン")
```
Encode a URL by passing a percent-encoded string as the argument (same as above):
```swift
let urlString: String = "http://www.ラーメン.寿司.co.jp".addingPercentEncoding(withAllowedCharacters: .urlQueryAllowed)!
guard let url: URL = URL(string: urlString) else { return } // URL(string:) is failable
print(urlString) // http://www.%E3%83%A9%E3%83%BC%E3%83%A1%E3%83%B3.%E5%AF%BF%E5%8F%B8.co.jp
guard let result: TLDResult = extractor.parse(url) else { return }
print(result.rootDomain) // Optional("寿司.co.jp")
print(result.topLevelDomain) // Optional("co.jp")
print(result.secondLevelDomain) // Optional("寿司")
print(result.subDomain) // Optional("www.ラーメン")
```
Encode a Unicode URL by using the [`Punycode`](https://github.com/gumob/PunycodeSwift) framework:
```swift
import Punycode
let urlString: String = "http://www.ラーメン.寿司.co.jp".idnaEncoded!
guard let url: URL = URL(string: urlString) else { return } // URL(string:) is failable
print(urlString) // http://www.xn--4dkp5a8a.xn--sprr0q.co.jp
guard let result: TLDResult = extractor.parse(url) else { return }
print(result.rootDomain) // Optional("xn--sprr0q.co.jp")
print(result.topLevelDomain) // Optional("co.jp")
print(result.secondLevelDomain) // Optional("xn--sprr0q")
print(result.subDomain) // Optional("www.xn--4dkp5a8a")
```
## Copyright
TLDExtract is released under MIT license, which means you can modify it, redistribute it or use it however you like.
This diff is collapsed.
This diff is collapsed.
//
// Created by kojirof on 2018-11-17.
// Copyright (c) 2018 Gumob. All rights reserved.
//
import Foundation
internal extension Bundle {
class ClassForFramework {
}
static var current: Bundle {
return Bundle.init(for: ClassForFramework.self)
}
}
internal extension String {
var isComment: Bool {
return self.starts(with: "//")
}
}
//
// Created by kojirof on 2018-11-17.
// Copyright (c) 2018 Gumob. All rights reserved.
//
import Foundation
internal struct PSLDataSet {
let exceptions: [PSLData]
let wildcards: [PSLData]
let normals: Set<String>
}
internal struct PSLData {
/// The flag that indicates data is exception
let isException: Bool
/// TLD Parts split by dot
/// e.g. ["*", "yokohama", "jp"]
let tldParts: [PSLDataPart]
/// The priority score to sort the dataset
/// If the hostname matches more than one rule, the one which has the highest priority is prevailing
let priority: Int
init(raw: String) {
self.isException = raw.starts(with: "!")
let tldStr: String = self.isException ? String(raw.dropFirst()) : raw
self.tldParts = tldStr.components(separatedBy: ".").map(PSLDataPart.init)
self.priority = (self.isException ? 1000 : 0) + self.tldParts.count
}
}
extension PSLData {
///
/// For more information about the public suffix list,
/// See the 'Definitions' section at https://publicsuffix.org/list/
///
/// A domain is said to match a rule if and only if all of the following conditions are met:
/// - When the domain and rule are split into corresponding labels,
/// that the domain contains as many or more labels than the rule.
/// - Beginning with the right-most labels of both the domain and the rule,
/// and continuing for all labels in the rule, one finds that for every pair,
/// either they are identical, or that the label from the rule is "*".
///
func matches(hostComponents: [String]) -> Bool {
/// The host must have at least as many components as the TLD
let delta: Int = hostComponents.count - self.tldParts.count
guard delta >= 0 else { return false }
/// Drop extra components from the host components so that two arrays have the same size
let droppedHostComponents = hostComponents.dropFirst(delta)
/// Find the PSLDataPart that matches the host component
let zipped: Zip2Sequence<[PSLDataPart], ArraySlice<String>> = zip(self.tldParts, droppedHostComponents)
return zipped.allSatisfy { (pslData: PSLDataPart, hostComponent: String) in
return pslData.matches(component: hostComponent)
}
}
func parse(hostComponents: [String]) -> TLDResult {
let partsCount: Int = tldParts.count - (self.isException ? 1 : 0)
let delta: Int = hostComponents.count - partsCount
/// Extract the host name to each level domain
let topLevelDomain: String? = delta == 0 ? nil : hostComponents.dropFirst(delta).joined(separator: ".")
let rootDomain: String? = delta == 0 ? nil : hostComponents.dropFirst(delta - 1).joined(separator: ".")
let secondDomain: String? = delta == 0 ? nil : hostComponents[delta - 1]
let subDomain: String? = delta < 2 ? nil : hostComponents.prefix(delta - 1).joined(separator: ".")
return TLDResult(rootDomain: rootDomain,
topLevelDomain: topLevelDomain,
secondLevelDomain: secondDomain,
subDomain: subDomain)
}
}
extension PSLData: Comparable {
static func < (lhs: PSLData, rhs: PSLData) -> Bool {
return lhs.priority < rhs.priority
}
static func == (lhs: PSLData, rhs: PSLData) -> Bool {
return lhs.priority == rhs.priority
}
}
internal enum PSLDataPart {
///
/// For more information about the wildcard character,
/// See the 'Specification' section at https://publicsuffix.org/list/
///
/// The wildcard character * (asterisk) matches any valid sequence of characters in a hostname part.
/// Wildcards are not restricted to appear only in the leftmost position,
/// but they must wildcard an entire label. (I.e. *.*.foo is a valid rule: *bar.foo is not.)
///
case wildcard
case characters(String)
init(component: String) {
self = component == "*" ? .wildcard : .characters(component)
}
func matches(component: String) -> Bool {
switch self {
case .wildcard:
return true
case let .characters(str):
return str == component
}
}
}
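The matching rule quoted in the comments above (compare labels from the right, treating `*` as matching any single label) can be sketched on plain strings. This is a simplified illustration, not the library's implementation; `ruleMatches` is a hypothetical name and exception rules (`!`) are not handled.

```swift
/// Hypothetical helper: returns true when `rule` (e.g. "*.yokohama.jp")
/// matches the labels of `host`, compared right to left.
func ruleMatches(_ rule: String, host: String) -> Bool {
    let ruleLabels = rule.split(separator: ".")
    let hostLabels = host.lowercased().split(separator: ".")
    // The host needs at least as many labels as the rule.
    guard hostLabels.count >= ruleLabels.count else { return false }
    // Align both label lists on their right-most label, then compare pairwise.
    return zip(ruleLabels.reversed(), hostLabels.reversed()).allSatisfy { pair in
        pair.0 == "*" || pair.0 == pair.1
    }
}

print(ruleMatches("*.yokohama.jp", host: "city.yokohama.jp")) // true
print(ruleMatches("*.yokohama.jp", host: "yokohama.jp"))      // false
```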
//
// Created by kojirof on 2018/11/16.
// Copyright © 2018 Gumob. All rights reserved.
//
import Foundation
internal class PSLParser {
var exceptions: [PSLData] = [PSLData]()
var wildcards: [PSLData] = [PSLData]()
var normals = Set<String>()
internal func parse(data: Data?) throws -> PSLDataSet {
guard let data: Data = data,
let str: String = String(data: data, encoding: .utf8),
str.count > 0 else {
throw TLDExtractError.pslParseError(message: nil)
}
str.components(separatedBy: .newlines).forEach { [weak self] (line: String) in
/// Skip comments and blank lines first so that a "*" inside a comment is never parsed as a rule
guard !line.isComment && !line.isEmpty else { return }
if line.contains("*") {
self?.wildcards.append(PSLData(raw: line))
} else if line.starts(with: "!") {
self?.exceptions.append(PSLData(raw: line))
} else {
self?.normals.insert(line)
}
}
return PSLDataSet(
exceptions: exceptions,
wildcards: wildcards,
normals: normals
)
}
}
internal class TLDParser {
private let pslDataSet: PSLDataSet
internal init(dataSet: PSLDataSet) {
self.pslDataSet = dataSet
}
internal func parseExceptionsAndWildcards(host: String) -> TLDResult? {
let hostComponents: [String] = host.lowercased().components(separatedBy: ".")
/// Search exceptions first, then search wildcards if not match
let matchClosure: (PSLData) -> Bool = { $0.matches(hostComponents: hostComponents) }
let pslData: PSLData? = self.pslDataSet.exceptions.first(where: matchClosure) ??
self.pslDataSet.wildcards.first(where: matchClosure)
return pslData?.parse(hostComponents: hostComponents)
}
internal func parseNormals(host: String) -> TLDResult? {
let tldSet: Set<String> = self.pslDataSet.normals
/// Split the hostname to components
let hostComponents = host.lowercased().components(separatedBy: ".")
/// A host must have at least two parts else it's a TLD
guard hostComponents.count >= 2 else { return nil }
/// Iterate from lower level domain and check if the hostname matches a suffix in the dataset
var copiedHostComponents: ArraySlice<String> = ArraySlice(hostComponents)
var topLevelDomain: String?
repeat {
guard !copiedHostComponents.isEmpty else { return nil }
topLevelDomain = copiedHostComponents.joined(separator: ".")
copiedHostComponents = copiedHostComponents.dropFirst()
} while !tldSet.contains(topLevelDomain ?? "")
if topLevelDomain == host { topLevelDomain = nil }
/// Extract the host name to each level domain
let rootDomainRange: Range<Int> = (copiedHostComponents.startIndex - 2)..<hostComponents.endIndex
let rootDomain: String? = rootDomainRange.startIndex >= 0 ? hostComponents[rootDomainRange].joined(separator: ".") : nil
let secondDomainRange: Range<Int> = (rootDomainRange.lowerBound)..<(rootDomainRange.lowerBound + 1)
let secondDomain: String? = secondDomainRange.startIndex >= 0 ? hostComponents[secondDomainRange].joined(separator: ".") : nil
let subDomainRange: Range<Int> = (hostComponents.startIndex)..<(max(secondDomainRange.lowerBound, hostComponents.startIndex))
let subDomain: String? = subDomainRange.endIndex >= 1 ? hostComponents[subDomainRange].joined(separator: ".") : nil
return TLDResult(rootDomain: rootDomain,
topLevelDomain: topLevelDomain,
secondLevelDomain: secondDomain,
subDomain: subDomain)
}
}
//
// TLDExtract.swift
// TLDExtract
//
// Created by kojirof on 2018/11/16.
// Copyright © 2018 Gumob. All rights reserved.
//
import Foundation
public class TLDExtract {
private let tldParser: TLDParser
public init(useFrozenData: Bool = false) throws {
let url: URL = Bundle.current.url(
forResource: useFrozenData ? "public_suffix_list_frozen" : "public_suffix_list",
withExtension: "dat")!
let data: Data = try Data(contentsOf: url)
let dataSet = try PSLParser().parse(data: data)
self.tldParser = TLDParser(dataSet: dataSet)
}
/// Parameters:
/// - input: URL or String whose hostname is to be extracted
/// - quick: If true, parse only normal data excluding exceptions and wildcards
public func parse<T: TLDExtractable>(_ input: T, quick: Bool = false) -> TLDResult? {
guard let host: String = input.hostname else { return nil }
if quick {
return self.tldParser.parseNormals(host: host)
} else {
return self.tldParser.parseExceptionsAndWildcards(host: host) ??
self.tldParser.parseNormals(host: host)
}
}
}
/// Protocol
public protocol TLDExtractable {
var hostname: String? { get }
}
extension URL: TLDExtractable {
init?(unicodeString: String) {
if let encodedUrl: String = unicodeString.addingPercentEncoding(withAllowedCharacters: .urlQueryAllowed) {
self.init(string: encodedUrl)
} else {
self.init(string: unicodeString)
}
}
public var hostname: String? {
let result: String? = self.absoluteString.removingPercentEncoding?.hostname
return result
}
}
extension String: TLDExtractable {
public var hostname: String? {
let schemePattern: String = "^(\\p{L}+:)?//"
let hostPattern: String = "([0-9\\p{L}][0-9\\p{L}-]{1,61}\\.?)? ([\\p{L}-]* [0-9\\p{L}]+) (?!.*:$).*$".replace(" ", "")
if self.matches(schemePattern) {
let components: [String] = self.replace(schemePattern, "").components(separatedBy: "/")
guard let component: String = components.first, !component.isEmpty else { return nil }
return component
} else if self.matches("^\(hostPattern)") {
let components: [String] = self.replace(schemePattern, "").components(separatedBy: "/")
guard let component: String = components.first, !component.isEmpty else { return nil }
return component
} else {
return URL(string: self)?.host
}
}
}
fileprivate extension String {
func matches(_ pattern: String) -> Bool {
guard let regex: NSRegularExpression = try? NSRegularExpression(pattern: pattern) else { return false }
/// NSRange counts UTF-16 code units, so use utf16.count rather than the Character count
return regex.matches(in: self, range: NSRange(location: 0, length: self.utf16.count)).count > 0
}
func replace(_ pattern: String, _ replacement: String) -> String {
return self.replacingOccurrences(of: pattern, with: replacement, options: .regularExpression)
}
}
/// Result
public struct TLDResult {
public let rootDomain: String?
public let topLevelDomain: String?
public let secondLevelDomain: String?
public let subDomain: String?
}
//
// Created by kojirof on 2018-11-17.
// Copyright (c) 2018 Gumob. All rights reserved.
//
import Foundation
enum TLDExtractError: Error {
case pslParseError(message: Error?)
}