We created Marvin, a semantic text annotation tool that uses external resources such as DBPedia and WordNet to semantically annotate text. Marvin is written in Java and can be used as a standalone application or as a library.

Marvin queries each available source for definitions of the words in the input text. It applies some NLP transformations in order to retrieve as much knowledge as possible. DBPedia is queried with unigrams and bigrams, while WordNet is queried with single words only.
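
To illustrate the unigram and bigram idea, here is a minimal sketch of how query terms could be generated from input text. It is not Marvin's actual code: the class name `NgramSketch`, the methods `unigrams` and `bigrams`, and the simple tokenization (lowercasing, whitespace splitting, stripping punctuation) are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of Marvin's API.
class NgramSketch {

    // Lowercase the text, split on whitespace, and drop every character
    // that is not a letter or a decimal digit. Empty tokens are skipped.
    static List<String> unigrams(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            String token = raw.replaceAll("[^\\p{L}\\p{Nd}]", "");
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Join each pair of adjacent tokens with a single space.
    static List<String> bigrams(List<String> tokens) {
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            pairs.add(tokens.get(i) + " " + tokens.get(i + 1));
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> words = unigrams("Marvin annotates biomedical text.");
        System.out.println(words);          // candidate single-word queries
        System.out.println(bigrams(words)); // candidate two-word queries
    }
}
```

Under this assumption, the single words would be the candidates for both sources, while the two-word phrases would only be sent to DBPedia, which can match multi-word entries.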

The next release will probably include new data sources such as MetaMap for biomedical tagging, and possibly SNOMED, ICD11 and some ontologies. Querying of the current data sources will also be improved.

Since the project is developed as part of my PhD, which is on biomedical text mining, it is very likely that, apart from DBPedia and WordNet, Marvin will focus mainly on biomedical data. We will try to keep it as simple as possible, which is why we will use MetaMap for UMLS. We will also try to make Marvin depend as little as possible on locally installed resources. Currently, the user needs to install WordNet on their machine.


