The challenge is to develop a tool that provides deep insight into a data source by identifying its information assets, so that the relevance between these assets and the search keywords used to explore the data source can be measured.

This project addresses the Data Treasure Hunting challenge.


Our team proposes a statistical machine learning model that learns the semantics of text in an unsupervised way. We will represent the text with a word vector model, then apply clustering techniques to the vectors to identify how relevant words are to one another.
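As a rough illustration of the idea, the sketch below builds word vectors and groups related words using only the Python standard library. It is not the team's implementation: it swaps in simple co-occurrence count vectors for a trained word vector model, and a basic "leader" clustering pass for whatever clustering technique the project ends up using. The corpus, the function names (cooccurrence_vectors, cosine, cluster_words), the window size, and the 0.3 threshold are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a sparse count vector of the words seen near it.

    Stand-in for a learned word vector model (assumption for this sketch).
    """
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return dict(vectors)


def cosine(u, v):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    norm = norm_u * norm_v
    return dot / norm if norm else 0.0


def cluster_words(vectors, threshold=0.3):
    """Leader clustering: join the first cluster whose leader is similar enough.

    The first word of each cluster acts as its leader; a word that matches
    no leader starts a new cluster.
    """
    clusters = []
    for word, vec in vectors.items():
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(word)
                break
        else:
            clusters.append([word])
    return clusters


# Hypothetical toy corpus standing in for the text of a data source.
corpus = [
    "invoice total amount paid",
    "invoice total amount due",
    "patient blood pressure high",
    "patient blood pressure low",
]

vectors = cooccurrence_vectors(corpus)
clusters = cluster_words(vectors)

# Relevance of a search keyword to two candidate words.
keyword = "paid"
print(cosine(vectors[keyword], vectors["due"]))
print(cosine(vectors[keyword], vectors["patient"]))
print(clusters)
```

In this toy setting, words that appear in the same contexts score high against each other and land in the same cluster, while words from unrelated contexts score zero. A real data source would need proper tokenization, a trained embedding model, and a clustering method chosen to suit the data.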

Project Information

License: Educational Community License, Version 2.0 (ECL-2.0)

Source Code/Project URL:



  • Ahmed sa3d
  • Amal Abd E-Salam Mahmoud
  • Yassin Khalifa
  • nada osman
  • Karim Magdi