Hierarchies
- class addrmatcher.hierarchy.GeoHierarchy(country, name, coordinate_boundary=None)
Bases:
object
The GeoHierarchy class represents the structure of a country’s region/area; for instance, a state or a province is the sub-region of a country.
- Attributes
coordinate_boundary
Return the list of coordinates as the boundary of a country.
name
Return the name of the country
types
Return the dictionary that contains all the defined regional structures.
Methods
add_region
(region, parent_region)Add a region as a child/sub region of another region
add_type
(region, type_id[, type_name])Add a new geographical hierarchy type, for instance: statistical area, administrative level
get_regions_by_name
([region_names, ...])Get all the relevant regions from the hierarchy based on the given parameters
get_smallest_region_boundaries
()Get the smallest regional unit
set_coordinate_boundary
(min_latitude, ...)Set or modify the coordinate boundaries of a country.
- add_region(region, parent_region)
Add a region as a child/sub region of another region
- Parameters
- region:Region
The sub region to be added as a child of the parent region
- parent_region:Region
The direct upper-level of the region
Examples
>>> country = Region("Country")
>>> state = Region("State", col_name="STATE")
>>> australia = GeoHierarchy(country, "Australia")
>>> australia.add_region(region=state, parent_region=country)
- add_type(region, type_id, type_name='')
Add a new geographical hierarchy type, for instance: statistical area, administrative level
- Parameters
- region:Region
The smallest root region for the hierarchy type. An upper-level region that is shared with other hierarchy types cannot be assigned as a root. For instance, the administrative level and statistical area hierarchies both use Country or State/Province as their upper regional levels, so neither of those regions can be the root.
- type_id:string
The unique identifier for the structural type
- type_name:string
The name of the regional structure type
Examples
>>> country = Region("Country")
>>> state = Region("State", col_name="STATE")
>>> sa4 = Region("Statistical Area 4", short_name="SA4", col_name="SA4")
>>> australia = GeoHierarchy(country, "Australia")
>>> australia.add_region(region=state, parent_region=country)
>>> australia.add_region(region=sa4, parent_region=state)
>>> australia.add_type(sa4, "ASGS", "Australian Statistical Geography Standard")
>>> australia.types
{'ASGS': 'Australian Statistical Geography Standard'}
- property coordinate_boundary
Return the list of coordinates that bound a country. The format is [minimum latitude, maximum latitude, minimum longitude, maximum longitude].
- Returns
- list
The coordinate boundary of a country
Examples
>>> country = Region("Country")
>>> australia = GeoHierarchy(country, "Australia")
>>> australia.set_coordinate_boundary(-43.58301104, -9.23000371, 96.82159219, 167.99384663)
>>> australia.coordinate_boundary
[-43.58301104, -9.23000371, 96.82159219, 167.99384663]
- get_regions_by_name(region_names=None, operator=None, attribute=None)
Get all the relevant regions from the hierarchy based on the given parameters
- Parameters
- region_names:string or list
The name, short name, or list of names or short names of the regions to look up, interpreted together with the operator parameter. If no region names are provided, the function returns all regions in the hierarchy.
- operator:Operator
The comparison operator used to find all upper- or lower-level regions relative to a particular region name. For instance, Country > State (Country gt State), so use the ‘gt’ operator to search for the levels above State.
- attribute:string
The name of the region attribute (name, short_name, or col_name) to collect into the returned list. If it is empty, the list stores the Region objects themselves.
- Returns
- list
A list of regions or of region attributes. The function returns an empty list if no regions with the given name or short name are found.
Examples
>>> country = Region("Country")
>>> state = Region("State", col_name="STATE")
>>> sa4 = Region("Statistical Area 4", short_name="SA4", col_name="SA4")
>>> australia = GeoHierarchy(country, "Australia")
>>> australia.add_region(region=state, parent_region=country)
>>> australia.add_region(region=sa4, parent_region=state)
>>> regions = australia.get_regions_by_name()
>>> for region in regions:
...     print(region.name)
Country
State
Statistical Area 4
>>> regions = australia.get_regions_by_name(region_names='State', operator=le)
>>> for region in regions:
...     print(region.name)
State
Statistical Area 4
>>> col_names = australia.get_regions_by_name(region_names=['State', 'SA4'], attribute='col_name')
>>> for col_name in col_names:
...     print(col_name)
STATE
SA4
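The operator-based lookup described above can be illustrated with a small self-contained sketch. It does not use addrmatcher itself: MiniRegion and MiniHierarchy are hypothetical stand-ins, the hierarchy is assumed to be a single chain, and the ranking rule (upper levels compare as greater) is an assumption chosen to reproduce the behaviour shown in the examples.

```python
import operator
from dataclasses import dataclass


@dataclass(frozen=True)
class MiniRegion:
    """Hypothetical stand-in for addrmatcher's Region."""
    name: str
    short_name: str = ""
    col_name: str = ""


class MiniHierarchy:
    """Hypothetical single-chain hierarchy; index 0 is the top level."""

    def __init__(self, root):
        self._chain = [root]

    def add_region(self, region, parent_region):
        # This sketch only supports appending below the current lowest level.
        if parent_region != self._chain[-1]:
            raise ValueError("this sketch only supports a single chain")
        self._chain.append(region)

    def get_regions_by_name(self, region_names=None, op=None, attribute=None):
        if region_names is None:
            selected = list(self._chain)
        else:
            names = [region_names] if isinstance(region_names, str) else list(region_names)
            wanted = [i for i, r in enumerate(self._chain)
                      if r.name in names or r.short_name in names]
            if op is None:
                selected = [self._chain[i] for i in wanted]
            else:
                # Rank by negative depth so that upper levels are "greater".
                selected = [r for i, r in enumerate(self._chain)
                            if any(op(-i, -w) for w in wanted)]
        if attribute:
            return [getattr(r, attribute) for r in selected]
        return selected


country = MiniRegion("Country")
state = MiniRegion("State", col_name="STATE")
sa4 = MiniRegion("Statistical Area 4", short_name="SA4", col_name="SA4")
hierarchy = MiniHierarchy(country)
hierarchy.add_region(state, country)
hierarchy.add_region(sa4, state)

lower_or_equal = hierarchy.get_regions_by_name("State", op=operator.le)
upper = hierarchy.get_regions_by_name("State", op=operator.gt)
columns = hierarchy.get_regions_by_name(["State", "SA4"], attribute="col_name")
```

Here operator.le selects State and everything below it, while operator.gt selects only the levels above State.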
- get_smallest_region_boundaries()
Get the smallest regional unit
- Returns
- Region
the smallest regional unit
Examples
>>> country = Region("Country")
>>> state = Region("State", col_name="STATE")
>>> sa4 = Region("Statistical Area 4", short_name="SA4", col_name="SA4")
>>> australia = GeoHierarchy(country, "Australia")
>>> australia.add_region(region=state, parent_region=country)
>>> australia.add_region(region=sa4, parent_region=state)
>>> australia.get_smallest_region_boundaries().name
'Statistical Area 4'
- property name
Return the name of the country
- Returns
- string
The name of the country
Examples
The country’s name can be set initially when calling the constructor.
>>> australia = GeoHierarchy(country, "Australia")
>>> australia.name
'Australia'
- set_coordinate_boundary(min_latitude, max_latitude, min_longitude, max_longitude)
Set or modify the coordinate boundaries of a country.
- Parameters
- min_latitude, max_latitude, min_longitude, max_longitude:float
The coordinate boundary of a country, passed as four separate values in the order minimum latitude, maximum latitude, minimum longitude, maximum longitude.
Examples
>>> country = Region("Country")
>>> australia = GeoHierarchy(country, "Australia")
>>> australia.set_coordinate_boundary(-43.58301104, -9.23000371, 96.82159219, 167.99384663)
>>> australia.coordinate_boundary
[-43.58301104, -9.23000371, 96.82159219, 167.99384663]
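One plausible use of a boundary in this format is a quick check that a coordinate falls inside the country before any matching is attempted. The sketch below is an illustration only: in_boundary is a hypothetical helper, not part of addrmatcher, and the two city coordinates are approximate.

```python
def in_boundary(lat, lon, boundary):
    """Return True if (lat, lon) lies inside [min_lat, max_lat, min_lon, max_lon]."""
    min_lat, max_lat, min_lon, max_lon = boundary
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon


# Boundary values taken from the example above.
AUSTRALIA = [-43.58301104, -9.23000371, 96.82159219, 167.99384663]

in_melbourne = in_boundary(-37.8136, 144.9631, AUSTRALIA)  # approximate Melbourne
in_london = in_boundary(51.5074, -0.1278, AUSTRALIA)       # approximate London
```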
- property types
Return the dictionary that contains all the defined regional structures.
- Returns
- dictionary
The defined regional structures
Examples
>>> country = Region("Country")
>>> state = Region("State", col_name="STATE")
>>> sa4 = Region("Statistical Area 4", short_name="SA4", col_name="SA4")
>>> australia = GeoHierarchy(country, "Australia")
>>> australia.add_region(region=state, parent_region=country)
>>> australia.add_region(region=sa4, parent_region=state)
>>> australia.add_type(sa4, "ASGS", "Australian Statistical Geography Standard")
>>> australia.types
{'ASGS': 'Australian Statistical Geography Standard'}
Matcher
- class addrmatcher.matcher.DistanceMethod(value)
Bases:
enum.Enum
An enumeration.
- JARO = 2
- JARO_WINKLER = 3
- LEVENSHTEIN = 1
- class addrmatcher.matcher.GeoMatcher(hierarchy, file_location='')
Bases:
object
Methods
get_region_by_address
(address[, ...])Perform address-based matching and return the corresponding region, e.g. administrative level or statistical area
get_region_by_coordinates
(lat, lon[, n, km, ...])Perform coordinate-based matching and return the corresponding regions in a dictionary, e.g. administrative level or statistical area
- get_region_by_address(address, similarity_threshold=0.9, nlargest=1, regions=None, operator=None, address_cleaning=False, method=DistanceMethod.LEVENSHTEIN)
Perform address-based matching and return the corresponding region, e.g. administrative level or statistical area
- Parameters
- address:string
The complete physical address
- similarity_threshold:float
The minimum similarity ratio, between 0 and 1 (default = 0.9)
- nlargest:int
The number of addresses to be returned by the function. If nlargest = 1, the function returns only the most similar address (default = 1)
- regions:string or list of string
Specify the name or list of names of the regions to be returned by the function
- operator:Operator
The operator (Operator.ge or Operator.le) used to find all upper- or lower-level regions relative to a particular region name. For instance, Country >= State (Country ge State), so use the ‘ge’ operator to search for State and the levels above it.
- address_cleaning:boolean
Whether to perform data cleansing on the address, for instance correcting an invalid suburb name. Cleansing currently applies only to Australian addresses; setting this parameter to True for non-Australian addresses could raise an error.
- method:DistanceMethod
The edit distance algorithm to use. Select one of DistanceMethod.LEVENSHTEIN, DistanceMethod.JARO, or DistanceMethod.JARO_WINKLER
- Returns
- Dictionary
The dictionary of the matched addresses. The keys of the dictionary are based on the column names defined in the Hierarchy object used. By default, the function returns only the most similar record (nlargest = 1), provided its similarity is larger than the threshold ratio. If no address has a similarity ratio above the threshold, the function returns an empty dictionary.
Examples
>>> matcher = GeoMatcher(AUS)
>>> matched = matcher.get_region_by_address("2885 Darnley Street, Braybrookt, VIC 3019", similarity_threshold=0.95)
>>> matched
{'MB_CODE_2016': ['20375120000'],
 'SA4_NAME_2016': ['Melbourne - West'],
 'SA3_NAME_2016': ['Maribyrnong'],
 'SA2_NAME_2016': ['Braybrook'],
 'SA1_7DIGITCODE_2016': ['2134703'],
 'STATE': ['VIC'],
 'RATIO': [0.9841269841269842],
 'SSC_NAME_2016': ['Braybrook'],
 'LGA_NAME_2016': ['Maribyrnong (C)'],
 'FULL_ADDRESS': ['2885 DARNLEY STREET BRAYBROOK VIC 3019']}
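To make the roles of similarity_threshold and nlargest concrete, here is a minimal pure-Python sketch of threshold-filtered fuzzy matching with Levenshtein distance. It is not addrmatcher's implementation: the normalisation (one minus distance over the longer length) is an assumption and may differ from how the library computes RATIO, and levenshtein, similarity and best_matches are hypothetical helper names.

```python
import heapq


def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a, b):
    """Assumed normalisation: 1.0 for identical strings, 0.0 for fully different."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def best_matches(query, candidates, similarity_threshold=0.9, nlargest=1):
    """Return up to nlargest (ratio, candidate) pairs whose ratio exceeds the threshold."""
    query = query.upper()
    scored = [(similarity(query, c.upper()), c) for c in candidates]
    kept = [pair for pair in scored if pair[0] > similarity_threshold]
    return heapq.nlargest(nlargest, kept, key=lambda pair: pair[0])


reference = [
    "2885 DARNLEY STREET BRAYBROOK VIC 3019",
    "1 SMITH STREET RICHMOND VIC 3121",
]
matches = best_matches("2885 DARNLEY STREET BRAYBROOKT VIC 3019", reference,
                       similarity_threshold=0.95)
no_matches = best_matches("2885 DARNLEY STREET BRAYBROOKT VIC 3019", reference,
                          similarity_threshold=0.99)
```

With the misspelt suburb, only the first reference address clears the 0.95 threshold; raising the threshold to 0.99 yields an empty result, mirroring the empty dictionary described above.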
- get_region_by_coordinates(lat, lon, n=1, km=1, regions=None, operator=None)
Perform coordinate-based matching and return the corresponding regions in a dictionary, e.g. administrative level or statistical area
- Parameters
- lat:float
The latitude of the input point
- lon:float
The longitude of the input point
- n:integer
The number of nearest addresses to be returned by the function
- km:integer
The radius, in kilometres, around the input coordinates within which the nearest addresses are searched
- Returns
- Dictionary
A dictionary of addresses with their statistical and administrative regions. By default, the function returns only the record with the smallest distance (n = 1). If no addresses are found within the radius (km), the function returns an empty dictionary.
Examples
>>> matcher = GeoMatcher(AUS)
>>> matched = matcher.get_region_by_coordinates(-26.657299, 153.094955)
>>> matched
{'FULL_ADDRESS': ['8 32 SECOND AVENUE MAROOCHYDORE QLD 4558'],
 'LATITUDE': [-26.6572865955204],
 'LONGITUDE': [153.09496396875],
 'LGA_NAME_2016': ['Sunshine Coast (R)'],
 'SSC_NAME_2016': ['Maroochydore'],
 'SA4_NAME_2016': ['Sunshine Coast'],
 'SA3_NAME_2016': ['Maroochy'],
 'SA2_NAME_2016': ['Maroochydore - Kuluin'],
 'SA1_7DIGITCODE_2016': ['3142707'],
 'MB_CODE_2016': ['30563074700'],
 'DISTANCE': [0.0016422183328786543]}
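The interplay of n and km can be sketched as a nearest-neighbour search restricted to a radius. The example below assumes great-circle (haversine) distance in kilometres, which this page does not confirm as the library's method; haversine_km and nearest are hypothetical helpers, and the second reference point is an approximate Brisbane coordinate added purely for contrast.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest(lat, lon, points, n=1, km=1):
    """Return up to n (distance_km, label) pairs within km of (lat, lon), closest first."""
    scored = [(haversine_km(lat, lon, p_lat, p_lon), label)
              for label, p_lat, p_lon in points]
    within = sorted(pair for pair in scored if pair[0] <= km)
    return within[:n]


reference_points = [
    ("8 32 SECOND AVENUE MAROOCHYDORE QLD 4558", -26.6572865955204, 153.09496396875),
    ("BRISBANE (approximate)", -27.4698, 153.0251),
]
close = nearest(-26.657299, 153.094955, reference_points, n=2, km=1)
wide = nearest(-26.657299, 153.094955, reference_points, n=2, km=200)
```

With km=1 only the Maroochydore address is returned even though n=2, because the radius filter is applied before the n closest records are taken.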
Region
Data structure for the regional unit
- class addrmatcher.region.Region(name: str, short_name: str = '', col_name: str = '')
Bases:
object
The Region class represents a unit of area in the country. Each region has a unique name and a corresponding column in the reference dataset.
- Parameters
- name: string
The name of the area unit
- short_name: string
The short name of the area unit
- col_name: string
The column name of the area unit
Examples
The area’s column name can be set initially when calling the constructor.
>>> sa2 = Region('Statistical Area 2', short_name='SA2', col_name='SA2')
>>> sa2.col_name
'SA2'
- col_name: str = ''
- name: str
- short_name: str = ''
Resource
- addrmatcher.resource.create_url(url)
Produce a URL that is compatible with GitHub’s REST API from the input URL. This can handle blob or tree paths.
- Parameters
- url:str
The URL of the data directory in the GitHub repository
- Returns
- str
GitHub API URL
- str
Download directory
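As an illustration of the kind of conversion described, the sketch below turns a github.com blob or tree URL into a URL for GitHub's REST "repository contents" endpoint. It is a guess at the idea rather than the body of create_url: to_contents_api_url is a hypothetical name, the example repository does not exist, the branch name is assumed to contain no slashes, and deriving the download directory from the last path segment is an assumption.

```python
from urllib.parse import urlparse


def to_contents_api_url(url):
    """Map https://github.com/<owner>/<repo>/(blob|tree)/<branch>/<path>
    to (contents API URL, download directory name)."""
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) < 4 or parts[2] not in ("blob", "tree"):
        raise ValueError("expected a GitHub blob or tree URL")
    owner, repo, _, branch, *rest = parts
    path = "/".join(rest)
    api_url = f"https://api.github.com/repos/{owner}/{repo}/contents/{path}?ref={branch}"
    # Assumption: files are saved under the last path segment (or the repo name).
    download_dir = rest[-1] if rest else repo
    return api_url, download_dir


api_url, download_dir = to_contents_api_url(
    "https://github.com/example-owner/example-repo/tree/main/data/Australia"
)
```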
- addrmatcher.resource.download()
Trigger the download_data function and read the argument from user’s command line interface.
- addrmatcher.resource.download_data(country='Australia', output_dir='/home/docs/checkouts/readthedocs.org/user_builds/addrmatcher/checkouts/latest/docs/source')
Download the files in directories and sub-directories in repo_url.
- Parameters
- country:str
The country name, which is also the sub-directory name, for example data/Australia/.
- Returns
- int
The total number of files downloaded
- addrmatcher.resource.print_text(text: str, color: str = 'default', in_place: bool = False, **kwargs: any) -> None
Print text to console.
- Parameters
- text:str
The text to print
- color:str
One of “red”, “green”, or “default”
- in_place:bool
Whether to erase the previous line and print in place
- **kwargs:dict, optional
Other keywords passed to the built-in print
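A function with this signature is commonly built on ANSI escape sequences. The sketch below shows one way it might look; print_colored is a hypothetical name, and both the escape codes chosen and the interpretation of in_place (return to the start of the current line and clear it) are assumptions, not addrmatcher's actual behaviour.

```python
import io

# Standard ANSI SGR codes; "default" applies no colour.
ANSI_COLORS = {"red": "\033[31m", "green": "\033[32m", "default": ""}
ANSI_RESET = "\033[0m"


def print_colored(text, color="default", in_place=False, **kwargs):
    """Print text in the given colour, optionally overwriting the current line."""
    # "\r" moves to the start of the line and "\033[K" erases to its end.
    prefix = "\r\033[K" if in_place else ""
    start = ANSI_COLORS[color]
    end = ANSI_RESET if start else ""
    print(f"{prefix}{start}{text}{end}", **kwargs)


# Capture the output in a buffer via the built-in print's file keyword.
buffer = io.StringIO()
print_colored("done", color="green", file=buffer)
print_colored("50%", in_place=True, end="", file=buffer)
```

Passing end="" together with in_place=True keeps the cursor on the same line, which is what allows the next call to overwrite it.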