We use Twitter to match input keywords with other relevant keywords. The benefit is that people already tag their own tweets with related hashtags, so they have effectively solved the problem of "relevancy" for us.

This project addresses the Data Treasure Hunting challenge.


A common problem in data mining is that badly tagged data makes it difficult to find relevant data for a given keyword. We aim to solve this using the Twitter API. Twitter users have already tagged their tweets with hashtags, and hashtags that appear in the same tweet are more often than not related to each other. We search for tweets containing the input keyword as a hashtag and treat all the other hashtags in those tweets as candidate "relevant" keywords. We then count the number of occurrences of each candidate and use the average count as a threshold. Any keyword whose count is over that threshold is output in a data visualization, which we hope to build with D3.
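The counting and thresholding steps can be sketched in Python as follows. This is only an illustration under stated assumptions, not the project's actual code: it takes tweet texts that have already been fetched (the Twitter API call is left out), the function name `related_hashtags` is hypothetical, hashtags are compared case-insensitively, and each hashtag is counted at most once per tweet.

```python
import re
from collections import Counter

# Matches "#tag" and captures the tag text without the leading "#".
HASHTAG_RE = re.compile(r"#(\w+)")


def related_hashtags(tweets, keyword):
    """Return {hashtag: count} for hashtags that co-occur with `keyword`
    more often than the average co-occurring hashtag.

    Hypothetical helper: `tweets` is a list of tweet texts fetched elsewhere.
    """
    keyword = keyword.lstrip("#").lower()
    counts = Counter()
    for text in tweets:
        # Assumption: case-insensitive tags, counted once per tweet.
        tags = {tag.lower() for tag in HASHTAG_RE.findall(text)}
        if keyword not in tags:
            continue  # keep only tweets that use the keyword as a hashtag
        counts.update(tags - {keyword})
    if not counts:
        return {}
    # The threshold is the mean occurrence count over all candidates.
    threshold = sum(counts.values()) / len(counts)
    # "Over that threshold" is read here as strictly greater than the mean.
    return {tag: n for tag, n in counts.items() if n > threshold}


sample_tweets = [
    "Loving the new #python release #coding #opensource",
    "#python and #coding all night",
    "#Python meetup tonight #coding #pizza",
    "Nothing beats #pizza",
]
print(related_hashtags(sample_tweets, "python"))  # {'coding': 3}
```

In the sample, "coding" occurs three times while "opensource" and "pizza" occur once each, so the average is about 1.67 and only "coding" passes. Using a strict comparison means that when every candidate occurs equally often, nothing is returned; whether that is the desired behavior is a design choice the description leaves open.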

Project Information

License: GNU General Public License version 2.0 (GPL-2.0)

Source Code/Project URL:



  • Nahom Beyene
  • Ritwik Gupta
  • Andrew Max