This project is solving the Data Treasure Hunting challenge.
Description
Open Data Gold Digger
Formerly known as Open Data Social Gold Miner
This project focuses on taking large collections of raw data and mining them into smaller streams of useful, relevant data. We do this by using the Project Open Data Metadata Schema v1.1 to analyze existing data sets provided by the United States government and to create keywords from them.
Keywords
Our code is written in Python. It imports text from web pages and JSON, performs a word count, and outputs the most frequently used keywords. A link to our GitHub repository can be found at the bottom of the page.
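The word-count step described above can be sketched roughly as follows. This is a minimal illustration, not the code from the repository: the function names, the stop-word list, and the sample catalog are assumptions made for the example. Only the `dataset`, `title`, and `description` field names come from the Project Open Data Metadata Schema v1.1.

```python
import json
import re
from collections import Counter

# Hypothetical stop-word list; the project's own list (if any) is not documented here.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "on", "is", "from", "with"}


def iter_strings(node):
    """Yield every string value found anywhere inside parsed JSON."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_strings(item)


def top_keywords(text, limit=5):
    """Return the `limit` most frequent non-stop words as (word, count) pairs."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return counts.most_common(limit)


def top_keywords_from_json(raw_json, limit=5):
    """Run the word count over all string values in a JSON document."""
    text = " ".join(iter_strings(json.loads(raw_json)))
    return top_keywords(text, limit)


# Made-up sample catalog, shaped like a small Project Open Data data.json file.
sample = json.dumps({
    "dataset": [
        {"title": "Mars rover images", "description": "Images from the Mars rover cameras"},
        {"title": "Lunar soil samples", "description": "Soil data from lunar missions"},
    ]
})

if __name__ == "__main__":
    print(top_keywords_from_json(sample, limit=3))
```

Text fetched from a web page could be passed straight to `top_keywords` once the HTML markup has been stripped.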
Problem Data
The greatest problem in overcoming this challenge was access to relevant data. Further development of this project may include some of the following:
-- A web scraper that a) imports page text, b) imports HTML meta tags, such as meta keywords, meta title, and meta description, and c) imports text from documents (PDF, DOC, Excel, etc.).
-- Use of the Bing Search API to crawl government data sites.
-- Use of the Bing Synonym API to find matching keywords in separate data sets.
-- Use of Stanford University's NLP Semantics API to a) process the semantics of a sentence and b) identify useful keyword phrases.
-- A web application that supplies data structured for a specific user type, where the user type reflects the user's level of understanding of a specific domain.
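As an illustration of the proposed meta tag import, the sketch below pulls the meta keywords, meta title, and meta description out of an HTML string using only Python's standard `html.parser` module. It is one possible approach under stated assumptions, not part of the existing project: the `MetaTagParser` class, the `extract_meta` helper, and the `page_title` key are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect the <title> text and the keywords/title/description meta tags."""

    WANTED = ("keywords", "title", "description")

    def __init__(self):
        super().__init__()
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name in self.WANTED:
                self.meta[name] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text inside <title> is stored separately from <meta name="title">.
        if self._in_title:
            self.meta["page_title"] = self.meta.get("page_title", "") + data


def extract_meta(html):
    """Return a dict of the meta values found in an HTML string."""
    parser = MetaTagParser()
    parser.feed(html)
    parser.close()
    return parser.meta


# Made-up page used only to exercise the parser.
sample_html = """
<html><head>
<title>Example data portal</title>
<meta name="keywords" content="nasa, open data, datasets">
<meta name="description" content="A catalog of public data sets.">
</head><body><p>Hello</p></body></html>
"""

if __name__ == "__main__":
    print(extract_meta(sample_html))
```

The comma-separated `keywords` value could then be split and merged with the word-count results. Fetching pages and reading PDF, DOC, or Excel files would need additional tooling that is not shown here.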
Project Information
License: NASA Open Source Agreement 1.3 (NASA-1.3)
Source Code/Project URL: https://github.com/mikestratton/dataTreasureHunting
Resources
Starter Kit Download - http://data.nasa.gov/docs/spaceapps/challenges/datatreasurehunting/nasa_data_treasure_hunt_toolkit.xlsx
NASA Data - Developer Resources - https://data.nasa.gov/developer
Convert XLSX Spreadsheet to JSON - http://oss.sheetjs.com/js-xlsx/