Open Data Gold Digger (formerly Open Data Social Gold Miner)

This project takes large collections of raw data and mines them into smaller streams of useful, relevant data. We did this by using the Project Open Data Metadata Schema v1.1 to analyze existing data sets provided by the United States government and to generate keywords from them.

This project addresses the Data Treasure Hunting challenge.

Keywords

Our code is written in Python. It imports text from web pages and JSON, performs a word count, and outputs the most frequently used keywords. A link to our GitHub repository can be found at the bottom of the page.
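The word-count step might look roughly like the sketch below. It is an illustration, not the code from our repository. It assumes the input is a catalog laid out as in the Project Open Data Metadata Schema v1.1 (a top-level `dataset` list whose entries carry `title` and `description` strings). The function names, the short stop-word list, and the sample catalog are all hypothetical.

```python
import json
import re
from collections import Counter

# Small illustrative stop-word list (an assumption; a real run would need a fuller one).
STOP_WORDS = {"a", "an", "and", "the", "of", "for", "in", "on",
              "to", "from", "with", "by", "is", "are"}


def catalog_text(catalog):
    """Join the title and description text of every dataset entry."""
    parts = []
    for entry in catalog.get("dataset", []):
        parts.append(entry.get("title", ""))
        parts.append(entry.get("description", ""))
    return " ".join(parts)


def top_keywords(text, limit=5):
    """Count lowercase words, skipping stop words and very short words."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return counts.most_common(limit)


# Hypothetical sample catalog, for illustration only.
SAMPLE = json.loads("""
{
  "dataset": [
    {"title": "Lunar Surface Temperature",
     "description": "Temperature readings from the lunar surface."},
    {"title": "Mars Surface Images",
     "description": "Images of the surface of Mars."}
  ]
}
""")

if __name__ == "__main__":
    for word, count in top_keywords(catalog_text(SAMPLE)):
        print(word, count)
```

Text pulled from web pages could be passed to `top_keywords` in the same way once the markup has been stripped.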

Problem Data

The greatest obstacle in this challenge was access to relevant data. Further development of this project may include some of the following:

-- A web scraper that:
   a) imports text,
   b) imports HTML meta tags, such as meta keywords, meta title, and meta description,
   c) imports text from documents (PDF, DOC, Excel, etc.).
-- Use of the Bing Search API as a method of crawling government data sites.
-- Use of the Bing Synonym API as a method of finding matching keywords in separate data sets.
-- Use of Stanford University's NLP Semantics API to:
   a) process the semantics of a sentence,
   b) identify useful keyword phrases.
-- A web application geared towards supplying data structured for a specific user type, where the user type reflects the user's level of understanding within a specific domain.
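As a rough idea of how the meta-tag part of the proposed web scraper could work, here is a minimal sketch that uses only Python's standard `html.parser` module. This feature has not been built. The class name, the function name, and the sample page are hypothetical, and fetching pages over the network is left out.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect the <title> text and the keywords/description meta tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name in ("keywords", "description"):
                self.meta[name] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text inside <title> is kept; body text is ignored here.
        if self._in_title:
            self.meta["title"] = self.meta.get("title", "") + data.strip()


def extract_meta(html):
    """Return a dict with any of 'title', 'keywords', 'description' found."""
    parser = MetaTagParser()
    parser.feed(html)
    return parser.meta


# Hypothetical page, for illustration only.
SAMPLE_PAGE = (
    "<html><head><title>Open Data</title>"
    '<meta name="keywords" content="nasa, data">'
    '<meta name="description" content="Example page.">'
    "</head><body></body></html>"
)

if __name__ == "__main__":
    print(extract_meta(SAMPLE_PAGE))
```

The extracted keywords and description could then be fed into the same word count used for the JSON data.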


Project Information


License: NASA Open Source Agreement 1.3 (NASA-1.3)


Source Code/Project URL: https://github.com/mikestratton/dataTreasureHunting


Resources


Starter Kit Download - http://data.nasa.gov/docs/spaceapps/challenges/datatreasurehunting/nasa_data_treasure_hunt_toolkit.xlsx
NASA Data - Developer Resources - https://data.nasa.gov/developer
Convert XLSX Spreadsheet to JSON - http://oss.sheetjs.com/js-xlsx/

Team

  • Brian Grady
  • Matthew Wadsworth
  • Michael Stratton

