Author: Antonio de Jesus Anaya Hernandez, DevOps engineer for the IoPA.
Author: The Internet of Production Alliance, 2023.
Data was collected by "Glyxon labs" as part of the OKW Data Awards program.
The Open Know Where (OKW) Standard is part of the Internet of Production Alliance and its members.
License: CC BY SA
This review is provided as an analysis and recommendations document for the awardees of the OKW Data Awards.
import geopandas
import folium
from folium.plugins import HeatMap, MiniMap, FloatImage
import pandas as pd
import os
from datetime import datetime
from scipy.spatial import KDTree
import base64
filename = "threed.geojson"
print('Filename: \t', str(filename))
print('Format: \t', str(filename.split(sep='.')[1]).upper())
print('Modified: \t', str(datetime.fromtimestamp(os.path.getctime(filename)).strftime('%Y-%m-%d %H:%M:%S')))
print('Size: \t\t', str(os.path.getsize(filename)), ' bytes')  # os.path.getsize returns bytes
Filename: threed.geojson
Format: GEOJSON
Modified: 2023-02-17 17:40:43
Size: 609052 bytes
os.environ['PROJ_LIB'] = r'C:\Users\ANAYA\anaconda3\envs\okw_data_awards\Library\share\proj'
data = geopandas.read_file("threed.geojson")
type(data)
geopandas.geodataframe.GeoDataFrame
data
 | name | styleUrl | icon-opacity | icon-color | icon-scale | icon | description | geometry |
---|---|---|---|---|---|---|---|---|
0 | Impresión 3D Infinity Makers (ELF Maker) | #icon-1899-9C27B0-nodesc | 1 | #9c27b0 | 1 | https://www.gstatic.com/mapspro/images/stock/5... | None | POINT Z (-99.24064 19.01404 0.00000) |
1 | Impresión 3D y Electrónica. DIAC-3D | #icon-1899-9C27B0-nodesc | 1 | #9c27b0 | 1 | https://www.gstatic.com/mapspro/images/stock/5... | None | POINT Z (-99.24327 18.92853 0.00000) |
2 | IMPRESIÓN 3D Taller de diseño | #icon-1899-9C27B0-nodesc | 1 | #9c27b0 | 1 | https://www.gstatic.com/mapspro/images/stock/5... | None | POINT Z (-99.20824 18.92575 0.00000) |
3 | Jart Studio (sucursal) - Impresión 3D | #icon-1899-9C27B0-nodesc | 1 | #9c27b0 | 1 | https://www.gstatic.com/mapspro/images/stock/5... | None | POINT Z (-99.14459 18.87846 0.00000) |
4 | 3DZone S.A. de C.V. | #icon-1899-9C27B0-nodesc | 1 | #9c27b0 | 1 | https://www.gstatic.com/mapspro/images/stock/5... | None | POINT Z (-99.19289 18.93332 0.00000) |
... | ... | ... | ... | ... | ... | ... | ... | ... |
1633 | Usina Fab Lab | #icon-1899-0288D1 | 1 | #0288d1 | 1 | https://www.gstatic.com/mapspro/images/stock/5... | {'@type': 'html', 'value': 'Facebook, Twitter,... | POINT Z (-51.20268 -30.03210 0.00000) |
1634 | Adoro Robótica Makerspace | #icon-1899-0288D1 | 1 | #0288d1 | 1 | https://www.gstatic.com/mapspro/images/stock/5... | {'@type': 'html', 'value': 'Website works.<br>... | POINT Z (-43.18070 -22.94186 0.00000) |
1635 | MXPCB | #icon-1899-0288D1-nodesc | 1 | #0288d1 | 1 | https://www.gstatic.com/mapspro/images/stock/5... | None | POINT Z (-89.64009 21.04402 0.00000) |
1636 | FabLab Cuiabá-BR | #icon-1899-0288D1 | 1 | #0288d1 | 1 | https://www.gstatic.com/mapspro/images/stock/5... | {'@type': 'html', 'value': 'Facebook and websi... | POINT Z (-47.43440 -23.47281 0.00000) |
1637 | Oficina Maker | #icon-1899-0288D1 | 1 | #0288d1 | 1 | https://www.gstatic.com/mapspro/images/stock/5... | {'@type': 'html', 'value': 'Facebook works.<br... | POINT Z (-63.85365 -8.75318 0.00000) |
1638 rows × 8 columns
data = data.drop(columns=['styleUrl','icon-opacity', 'icon-color', 'icon-scale', 'icon'])
data
 | name | description | geometry |
---|---|---|---|
0 | Impresión 3D Infinity Makers (ELF Maker) | None | POINT Z (-99.24064 19.01404 0.00000) |
1 | Impresión 3D y Electrónica. DIAC-3D | None | POINT Z (-99.24327 18.92853 0.00000) |
2 | IMPRESIÓN 3D Taller de diseño | None | POINT Z (-99.20824 18.92575 0.00000) |
3 | Jart Studio (sucursal) - Impresión 3D | None | POINT Z (-99.14459 18.87846 0.00000) |
4 | 3DZone S.A. de C.V. | None | POINT Z (-99.19289 18.93332 0.00000) |
... | ... | ... | ... |
1633 | Usina Fab Lab | {'@type': 'html', 'value': 'Facebook, Twitter,... | POINT Z (-51.20268 -30.03210 0.00000) |
1634 | Adoro Robótica Makerspace | {'@type': 'html', 'value': 'Website works.<br>... | POINT Z (-43.18070 -22.94186 0.00000) |
1635 | MXPCB | None | POINT Z (-89.64009 21.04402 0.00000) |
1636 | FabLab Cuiabá-BR | {'@type': 'html', 'value': 'Facebook and websi... | POINT Z (-47.43440 -23.47281 0.00000) |
1637 | Oficina Maker | {'@type': 'html', 'value': 'Facebook works.<br... | POINT Z (-63.85365 -8.75318 0.00000) |
1638 rows × 3 columns
unique_desc = list(data['description'].dropna())
print(*unique_desc, sep='\n')
print('Unique data rows: ', len(unique_desc))
{'@type': 'html', 'value': 'Website, Facebook, Twitter, Instagram work. Offers training.<br><br>https://twitter.com/3dapplications<br>https://www.facebook.com/3DApplicationsBR<br>https://www.instagram.com/3dapplications/'}
{'@type': 'html', 'value': 'Facebook, Twitter, and website work.<br><br>http://www.facebook.com/3dfila<br>https://twitter.com/3DFila_Brasil<br>https://3dfila.com.br/'}
{'@type': 'html', 'value': "Website doesn't work, Facebook doesn't work, full address:\xa0 Lai Lai Center, 1º Andar, Loja 105<br>Centro<br>Alto Parana 7000<br>Paraguai"}
{'@type': 'html', 'value': "Website doesn't work"}
Websites work
Website, Twitter, Facebook work. Does biomedical stuff.
Website works. https://www.printerize3d-scv.com.br/
{'@type': 'html', 'value': "Website doesn't work, Facebook works, LinkedIn works. Offers training."}
{'@type': 'html', 'value': "Website and Instagram don't work, Facebook works.\xa0"}
{'@type': 'html', 'value': "Facebook doesn't work. Website works."}
{'@type': 'html', 'value': "Facebook and Twitter work. Instagram doesn't work."}
{'@type': 'html', 'value': 'Website works.<br><br>http://www.impressao3dfacil.com.br'}
{'@type': 'html', 'value': 'Facebook, Twitter, and website work.<br><br>https://www.facebook.com/oaloobr<br>https://twitter.com/oaloobr<br>https://www.oaloo.com.br/'}
{'@type': 'html', 'value': 'Facebook, Instagram, and website work.<br><br>https://www.facebook.com/PRINTITTRESD/<br>https://www.instagram.com/printit_3d/<br>https://www.printit3d.com.br/'}
{'@type': 'html', 'value': 'Facebook, Twitter, and website work.<br><br>https://www.facebook.com/printgreen3d<br>https://twitter.com/printgreen3d<br>http://www.printgreen3d.com.br/'}
{'@type': 'html', 'value': 'Facebook, Instagram, and website work.<br><br>https://www.facebook.com/r3dyoficial<br>https://www.instagram.com/r3dyoficial<br>https://www.r3dy.com.br'}
{'@type': 'html', 'value': 'Facebook and website work.<br><br>https://www.facebook.com/prototipagemImpressao3D<br>http://www.rgimpressao3d.com.br'}
{'@type': 'html', 'value': "Website doesn't work.<br><br>http://tdtec.com.br/"}
{'@type': 'html', 'value': 'Facebook, Twitter, and website work.<br><br>https://www.facebook.com/usinafablab/<br>https://twitter.com/UsinaFablab<br>https://www.usinafablab.com.br/'}
{'@type': 'html', 'value': 'Website works.<br><br>http://www.adororobotica.com'}
{'@type': 'html', 'value': 'Facebook and website work.<br><br>https://www.facebook.com/fablabcba?ref=hl<br>https://www.fablabs.io/labs/fablabcuiaba'}
{'@type': 'html', 'value': 'Facebook works.<br><br>https://www.facebook.com/oficinamakerpvh/'}
Unique data rows: 22
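The printed descriptions mix `None`, plain strings, and `{'@type': 'html', 'value': ...}` dictionaries. Before dropping the column, the verification notes and links could be separated out. The following is a minimal sketch, assuming the values arrive as Python dicts or strings as the output above suggests; `parse_description` and `URL_RE` are hypothetical names, not part of the original notebook.

```python
import re
from html import unescape

# Matches http/https URLs up to the next whitespace, quote, or angle bracket.
URL_RE = re.compile(r"https?://[^\s<>\"']+")

def parse_description(desc):
    """Return (note, urls) for a description that is None, a str, or an html dict."""
    if desc is None:
        return "", []
    text = desc.get("value", "") if isinstance(desc, dict) else str(desc)
    # Replace <br> tags with spaces, decode HTML entities, drop non-breaking spaces.
    text = unescape(re.sub(r"<br\s*/?>", " ", text)).replace("\xa0", " ")
    urls = URL_RE.findall(text)
    # The note is whatever remains once the URLs are removed.
    note = " ".join(URL_RE.sub(" ", text).split())
    return note, urls

sample = {"@type": "html", "value": "Website works.<br><br>http://www.adororobotica.com"}
note, urls = parse_description(sample)
print(note)
print(urls)
```

Applied to the column (for example with `data['description'].apply(parse_description)`), this would keep the 22 annotated rows usable as partial verification evidence instead of discarding them.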
data = data.drop(columns=['description'])
report = {'okw_columns': list(data.columns),}
data
 | name | geometry |
---|---|---|
0 | Impresión 3D Infinity Makers (ELF Maker) | POINT Z (-99.24064 19.01404 0.00000) |
1 | Impresión 3D y Electrónica. DIAC-3D | POINT Z (-99.24327 18.92853 0.00000) |
2 | IMPRESIÓN 3D Taller de diseño | POINT Z (-99.20824 18.92575 0.00000) |
3 | Jart Studio (sucursal) - Impresión 3D | POINT Z (-99.14459 18.87846 0.00000) |
4 | 3DZone S.A. de C.V. | POINT Z (-99.19289 18.93332 0.00000) |
... | ... | ... |
1633 | Usina Fab Lab | POINT Z (-51.20268 -30.03210 0.00000) |
1634 | Adoro Robótica Makerspace | POINT Z (-43.18070 -22.94186 0.00000) |
1635 | MXPCB | POINT Z (-89.64009 21.04402 0.00000) |
1636 | FabLab Cuiabá-BR | POINT Z (-47.43440 -23.47281 0.00000) |
1637 | Oficina Maker | POINT Z (-63.85365 -8.75318 0.00000) |
1638 rows × 2 columns
data.name.unique().shape
(1132,)
data.geometry.unique().shape
(1178,)
data = data.drop_duplicates(subset=['geometry'])
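Dropping duplicates on `geometry` keeps the first record for each exact coordinate pair. The same idea can be sketched without geopandas, under the assumption that rounding coordinates to five decimal places (roughly one metre) is an acceptable tolerance; the record names and the `dedupe_by_location` helper are hypothetical.

```python
def dedupe_by_location(records, decimals=5):
    """Keep the first record for each (lat, lon) pair rounded to `decimals` places."""
    seen = set()
    kept = []
    for rec in records:
        key = (round(rec["lat"], decimals), round(rec["lon"], decimals))
        if key not in seen:
            seen.add(key)
            kept.append(rec)
    return kept

# Illustrative records only; coordinates borrowed from the first rows of the table.
records = [
    {"name": "Lab A", "lat": 19.01404, "lon": -99.24064},
    {"name": "Lab A (copy)", "lat": 19.01404, "lon": -99.24064},
    {"name": "Lab B", "lat": 18.92853, "lon": -99.24327},
]
unique = dedupe_by_location(records)
print([r["name"] for r in unique])
```

Note that this only removes records sharing a location; entries with the same name at different coordinates are kept, which is consistent with the name count (1132) being lower than the geometry count (1178).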
import nltk
nltk.download('stopwords')
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from nltk.corpus import stopwords
from nltk.stem.snowball import SnowballStemmer
[nltk_data] Downloading package stopwords to [nltk_data] C:\Users\ANAYA\AppData\Roaming\nltk_data... [nltk_data] Package stopwords is already up-to-date!
# preprocess text data
data['cl_name'] = data['name'].str.lower().str.replace(r'[^\w\s]', '', regex=True).str.strip()
stop_words = set(stopwords.words('spanish'))
data['cl_name'] = data['cl_name'].apply(lambda x: ' '.join([word for word in x.split() if word not in stop_words]))
stemmer = SnowballStemmer("spanish")
data['cl_name'] = data['cl_name'].apply(lambda x: ' '.join([stemmer.stem(word) for word in x.split()]))
# vectorize text data using TF-IDF
vectorizer = TfidfVectorizer()
vec_ftrans = vectorizer.fit_transform(data['cl_name'])
# cluster common words using k-means
kmeans = KMeans(n_clusters=5, random_state=0, n_init=5)
kmeans.fit(vec_ftrans)
# add cluster labels to dataframe
data['cluster'] = kmeans.labels_
# name each cluster after the highest-weighted TF-IDF term of its centroid
terms = vectorizer.get_feature_names_out()
cluster_names = [str(terms[center.argsort()[-1]]) for center in kmeans.cluster_centers_]
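The preprocessing above lowercases names, strips punctuation, removes Spanish stop words, and stems. A standard-library sketch of the first steps follows, with two deliberate differences that are assumptions of this example: accents are folded so that "impresión" and "impresion" match, and punctuation is replaced by a space rather than removed. `STOP_WORDS` is a small illustrative subset, not NLTK's Spanish list, and no stemming is performed.

```python
import re
import unicodedata

# Illustrative subset only; the notebook uses nltk.corpus.stopwords.words('spanish').
STOP_WORDS = {"de", "la", "el", "y", "en"}

def clean_name(name):
    """Lowercase, fold accents, replace punctuation with spaces, drop stop words."""
    text = name.lower()
    # Decompose accented characters and drop the combining marks.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(word for word in text.split() if word not in STOP_WORDS)

print(clean_name("Impresión 3D y Electrónica. DIAC-3D"))
# prints: impresion 3d electronica diac 3d
```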
import seaborn as sns
cluster_counts = data.drop(columns='geometry').groupby('cluster').size().reset_index(name='count')
# Sort by frequency
cluster_counts = cluster_counts.sort_values('count', ascending=False)
# Plot frequency of clusters
sns.set_style('ticks')
sns.barplot(x='cluster', y='count', data=cluster_counts)
<AxesSubplot: xlabel='cluster', ylabel='count'>
cluster_counts
 | cluster | count |
---|---|---|
4 | 4 | 719 |
2 | 2 | 278 |
1 | 1 | 115 |
0 | 0 | 52 |
3 | 3 | 14 |
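When reporting these counts next to a descriptive term per cluster, each term has to be paired with the count of the same cluster id rather than with the position in a sorted table. A minimal sketch with toy data; `labels` stands in for `kmeans.labels_` and `top_terms` is a hypothetical list holding one term per cluster id.

```python
from collections import Counter

labels = [4, 4, 2, 4, 1, 2, 0, 4]           # toy stand-in for kmeans.labels_
top_terms = ["t0", "t1", "t2", "t3", "t4"]  # hypothetical: index == cluster id

counts = Counter(labels)
# Look up each cluster's count by id, so sorting elsewhere cannot misalign the pairs.
summary = {top_terms[cid]: counts.get(cid, 0) for cid in range(len(top_terms))}
print(summary)
# prints: {'t0': 1, 't1': 1, 't2': 2, 't3': 0, 't4': 4}
```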
geocodes = pd.read_csv('rg_cities1000.csv')
# Create a KDTree from the lat-lon coordinates in the geocodes DataFrame
tree = KDTree(geocodes[['lat', 'lon']])
def get_country_code(latlong):
    # Country code of the nearest city in the geocodes table
    lat, lon = latlong
    _, idx = tree.query([lat, lon])
    return geocodes.iloc[idx]['cc']
def get_city(latlong):
    # Name of the nearest city in the geocodes table
    lat, lon = latlong
    _, idx = tree.query([lat, lon])
    return geocodes.iloc[idx]['name']
%%time
data['country'] = list(map(get_country_code, data['geometry'].apply(lambda geom: (geom.y, geom.x))))
CPU times: total: 15.6 ms Wall time: 76.4 ms
%%time
data['city'] = list(map(get_city, data['geometry'].apply(lambda geom: (geom.y, geom.x))))
CPU times: total: 15.6 ms Wall time: 76.4 ms
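The KDTree lookup above measures distance in raw latitude/longitude degrees, which is fast but only approximates true distance, since a degree of longitude shrinks away from the equator. As a cross-check, nearest-city matching can be sketched with the haversine great-circle distance. This is a brute-force illustration, not the notebook's method; the two city rows use approximate coordinates and mimic the `name`, `cc`, `lat`, `lon` columns of `rg_cities1000.csv`.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def nearest_city(lat, lon, cities):
    """Return the city dict closest to (lat, lon); O(n) scan, fine for spot checks."""
    return min(cities, key=lambda c: haversine_km(lat, lon, c["lat"], c["lon"]))

# Approximate, illustrative coordinates.
cities = [
    {"name": "Cuernavaca", "cc": "MX", "lat": 18.9261, "lon": -99.2308},
    {"name": "Porto Alegre", "cc": "BR", "lat": -30.0331, "lon": -51.2300},
]
hit = nearest_city(18.92853, -99.24327, cities)
print(hit["name"], hit["cc"])
```

For 1178 points a full scan over a large city table is slow, so a check like this is best run on a sample of rows to see how often it disagrees with the KDTree result.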
data.drop(columns='geometry').groupby('country').size().sort_values(ascending=False)
country
MX 601
BR 204
AR 125
CO 48
CL 37
BO 19
EC 19
PE 13
DO 13
PY 11
VE 11
HN 11
GT 10
PR 10
JM 10
CR 9
SV 8
UY 8
PA 4
NI 3
BZ 2
US 2
dtype: int64
sns.countplot(y='country', data=data.drop(columns='geometry'), order=data['country'].value_counts().index)
<AxesSubplot: xlabel='count', ylabel='country'>
data.drop(columns='geometry').groupby('city').size().sort_values(ascending=False)
city
La Paz 19
Cancun 19
Hermosillo 19
Tuxtla Gutierrez 18
Xalapa de Enriquez 17
..
Ejido Javier Rojo Gomez 1
Duque de Caxias 1
Paulinia 1
Duitama 1
Zaragoza 1
Length: 420, dtype: int64
top_cities = data['city'].value_counts().nlargest(20).index
sns.countplot(y='city', data=data[data['city'].isin(top_cities)].drop(columns='geometry'), order=top_cities)
<AxesSubplot: xlabel='count', ylabel='city'>
point_max = (data.geometry.y.max(), data.geometry.x.max())
point_min = (data.geometry.y.min(), data.geometry.x.min())
main_map = folium.Map(zoom_start=2)
main_map.fit_bounds((point_min, point_max))
marker_layer = folium.FeatureGroup(name='Markers', show=False)
popup = folium.features.GeoJsonPopup(
    fields=['name', 'country'],
    aliases=['Name:', 'Country:'],
    localize=True,
    sticky=False,
    labels=True,
    style="font-size: 12px;",
)
geojson = folium.GeoJson(
    data=data,
    popup=popup,
    marker=folium.Marker(icon=folium.features.Icon()),
).add_to(marker_layer)
main_map.add_child(marker_layer)
heatmap_data = [[row['geometry'].y, row['geometry'].x] for index, row in data.iterrows()]
heatmap_layer = folium.FeatureGroup(name='3D print')
heatmap_layer.add_child(HeatMap(heatmap_data, opacity=0.1, radius=8))
main_map.add_child(heatmap_layer)
logo_url = 'iopa_logo_okw.png'
logo_size = (10, 10)
icon = folium.features.CustomIcon(logo_url, icon_size=logo_size)
span = 1
float_image = FloatImage(logo_url, bottom=span, left=span, width=logo_size[0], height=logo_size[1])
main_map.add_child(float_image)
minimap = MiniMap()
main_map.add_child(minimap)
folium.TileLayer('openstreetmap').add_to(main_map)
map_control = folium.LayerControl(name='Base Maps', collapsed=True)
main_map.add_child(map_control)
# main_map
1. Usage: Use the layer selector in the top-right corner of the map to switch the interactive Markers on or off.
report['cities_top_20'] = data['city'].value_counts().nlargest(20)
report['countries_top_5'] = data['country'].value_counts().nlargest(5)
from IPython.display import Markdown, display
def list_to_markdown(series):
    # Render each (label, count) pair of a value_counts() Series as one bullet line
    lines = ['\t\t . {!r} {}'.format(label, count) for label, count in series.items()]
    return Markdown('\n'.join(lines))
# Align counts with cluster_names by cluster id (cluster_names is ordered by cluster id)
counts = cluster_counts.sort_values('cluster')['count']
cluster_zip = [f'{name}: {count}' for name, count in zip(cluster_names, counts)]
display(Markdown(f'''
## Findings:
1. Columns that are related to the OKW standard are:
a. {report['okw_columns']}
2. Completeness compared to the OKW simplified data schema:
a. Total number of fields in the schema: 17,
b. Percentage covered: {100 * len(report['okw_columns']) // 17} %.
3. Reverse keyword analysis (reverse search) in Spanish gave these results:
a. {cluster_zip}
'''))
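The completeness figure in finding 2 is the share of the 17 schema fields that the dataset covers, rounded down by integer division. A short worked example, assuming the two retained columns (`name` and `geometry`) are the covered fields:

```python
covered = ["name", "geometry"]  # columns kept in report['okw_columns']
TOTAL_FIELDS = 17               # field count of the OKW simplified schema, as stated in the findings

exact = 100 * len(covered) / TOTAL_FIELDS     # true percentage
floored = 100 * len(covered) // TOTAL_FIELDS  # value the report prints

print(f"{exact:.1f} % exact, {floored} % after floor division")
# prints: 11.8 % exact, 11 % after floor division
```

Floor division understates coverage slightly; `round(exact)` would report 12 % instead.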
display(Markdown(f'''
4. Total unique locations:
a. {data.shape[0]}.
5. Verified locations:
a. Unverified locations. No references provided.
b. An unknown number of locations may appear in Google Places/Maps.
6. Reverse geocoding results, locations by top 5 countries and top 20 cities:
'''))
# display() returns None, so render each list directly instead of embedding it in an f-string
display(list_to_markdown(report['countries_top_5']))
display(list_to_markdown(report['cities_top_20']))
display(Markdown(f'''
7. Origin of data:
a. No raw data or origin is stated in the files. No additional documentation or methods were provided.
8. Possible origin of data based on inspected data:
a. Possibly the Google Places API: a query for '3D printing' or 'Impresion 3D' within a 1000 m radius, run for a specified list of locations.
b. Other sources may have contributed data points, for example the provided file: ['Mapping_stats Brazil fazedores.xslx']
9. Observations:
a. The data provided contains data points without references, collection methods or verified sources.
10. Quality:
a. Evaluation results: Low, based on points 1, 2, 5 and 7.
'''))
1. Columns that are related to the OKW standard are:
a. ['name', 'geometry']
2. Completeness compared to the OKW simplified data schema:
a. Total number of fields in the schema: 17,
b. Percentage covered: 11 %.
3. Reverse keyword analysis (reverse search) in Spanish gave these results:
a. ['print: 52', 'las: 115', 'impresion: 278', 'printing: 14', '3d: 719']
4. Total unique locations:
a. 1178.
5. Verified locations:
a. Unverified locations. No references provided.
b. An unknown number of locations may appear in Google Places/Maps.
6. Reverse geocoding results, locations by top 5 countries and top 20 cities:
. 'MX' 601
. 'BR' 204
. 'AR' 125
. 'CO' 48
. 'CL' 37
. 'Hermosillo' 19
. 'La Paz' 19
. 'Cancun' 19
. 'Tuxtla Gutierrez' 18
. 'Xalapa de Enriquez' 17
. 'Merida' 16
. 'Belo Horizonte' 16
. 'Veracruz' 14
. 'Aguascalientes' 13
. 'Brasilia' 13
. 'San Luis Potosi' 12
. 'Puebla' 12
. 'Playa del Carmen' 12
. 'Morelia' 12
. 'Santa Fe de la Vera Cruz' 11
. 'Chihuahua' 11
. 'Campeche' 11
. 'Cordoba' 10
. 'Bahia Blanca' 10
. 'San Luis' 10
7. Origin of data:
a. No raw data or origin is stated in the files. No additional documentation or methods were provided.
8. Possible origin of data based on inspected data:
a. Possibly the Google Places API: a query for '3D printing' or 'Impresion 3D' within a 1000 m radius, run for a specified list of locations.
b. Other sources may have contributed data points, for example the provided file: ['Mapping_stats Brazil fazedores.xslx']
9. Observations:
a. The data provided contains data points without references, collection methods or verified sources.
10. Quality:
a. Evaluation results: Low, based on points 1, 2, 5 and 7.
1. Review the purposes of the Data Awards data collection and the OKW standard.
2. Provide the following resources:
a. Data origin, raw data, or the reference datasets used.
b. Filtering and verification methods, computational or qualitative.
c. Findings and analysis based on verifiable data.
d. Collection methods.
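One way to act on these recommendations is to attach a small provenance record to every data point at collection time. The sketch below is only an illustration: the `ProvenanceRecord` class and its field names are hypothetical and are not taken from the OKW standard, and the example values are placeholders.

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class ProvenanceRecord:
    """Hypothetical per-location record covering origin, method, and verification."""
    name: str
    lat: float
    lon: float
    source: str                        # where the data point came from
    collected_on: str                  # ISO date of collection
    collection_method: str             # e.g. API query, survey, manual entry
    verified_by: Optional[str] = None  # reference used to confirm the location

    def is_verified(self) -> bool:
        return bool(self.verified_by)

# Placeholder values for illustration only.
record = ProvenanceRecord(
    name="Example Makerspace",
    lat=19.01404,
    lon=-99.24064,
    source="example source",
    collected_on="2023-02-17",
    collection_method="manual entry",
)
print(record.is_verified())
print(sorted(asdict(record)))
```

Storing such fields alongside `name` and `geometry` would make it possible to report the share of verified locations instead of marking the whole dataset as unverified.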