natural language processing


[New Paper] Information extraction from tables in literature

About two months ago, a paper that resulted from my Ph.D. work was published in the International Journal on Document Analysis and Recognition. The paper is titled “A framework for information extraction from tables in biomedical literature”.


Building Named Entity Recognizer (NER) using Conditional Random Fields (CRF)

A named entity recognizer is a program that recognizes named entities in text. Named entities can be anything from locations and company or person names to drug or disease names.


Ideas for the future


Awarded a best paper award at the NLDB 2018 conference

A paper called “Classification of Intangible Social Innovation Concepts”, which was accepted for presentation at the 23rd International Conference on Natural Language & Information Systems (NLDB2018), held in Paris, France from 13th to 15th June 2018, received one of the best paper awards. In total, 3 papers were awarded as best papers, with no ranking or order between them. The awarded papers also received a monetary award.

NLDB is a well-established (it has been organised for 23 years) and good conference in the area of natural language processing. Usually about 15-18% of submitted papers are accepted as long papers. It seems that some more papers are accepted as short papers and poster presentations, so the overall percentage of accepted papers is higher, but


Impressions from HealTAC2018 conference

On 18th and 19th April 2018, the first UK health text analytics conference (HealTAC) took place in Manchester. The main conference venue was the Pendulum Hotel, located on Sackville Street, close to the north University of Manchester campus, the former UMIST. I had the pleasure of participating and helping with certain organisational matters as a member of the local organisation committee.

On the first day of the conference, people started arriving between 8:30 and 9:00 for registration. During registration, people could have some coffee and pastries for breakfast. People with posters were directed to the poster room so they could hang their posters right away. The conference started at 9:00 with a welcome speech and some health and safety procedures, which was followed


Moment when my idea became a web standard

This is the story of how a schema I worked on as a side project suddenly found its place in a W3C recommendation.

In November 2015, I went with my supervisor to Japan. The Biomedical Linked Annotation Hackathon (BLAH2), to which my supervisor was invited, was held in the small cities of Mishima and Ito, about a 1-hour train ride from Tokyo. He could not stay for the whole period, so he offered me the chance to go, which I accepted. The event was organised by the Japanese Database Center for Life Sciences (DBCLS).

The first day was a conference, where people presented their work, mainly on annotating biomedical literature. My PhD was on a related topic: it was about information extraction from tables


Marvin – A tool for semantic annotation released

During the last week I released a version of Marvin, a tool for semantic annotation that is able to annotate text using various sources, such as UMLS (using MetaMap), DBPedia (using a SPARQL interface), WordNet and, probably most importantly, the SKOS (Simple Knowledge Organization System) format for representing lexicons, dictionaries and terminologies. Primarily, the tool is meant to help with data labeling and normalization of biomedical texts; however, with the help of SKOS, WordNet and DBPedia it can be helpful in any domain.

Since I mentioned normalization and labeling, I had better briefly explain them for readers not familiar with text mining and some aspects of the semantic web. Basically, usual natural language text


Experience from Lisbon Machine Learning Summer School

I participated in the Lisbon Machine Learning Summer School (LxMLS), which took place on July 16-23 at Instituto Superior Técnico (IST), a leading engineering and science school in Portugal. It is organized jointly by IST, the Instituto de Telecomunicações and the Spoken Language Systems Lab (L2F) of INESC-ID. It was a great experience: on the one hand to see Lisbon, and on the other to learn a bit from the best people in machine learning and natural language processing and to meet fellow PhD students from around Europe who work in the same or a similar area as I do. I will just briefly describe the experience.

I arrived in Lisbon the day before, on 15th July. That evening we already could


Marvin – semantic annotator

We created Marvin, a semantic text annotation tool that uses external resources such as DBPedia and WordNet to semantically annotate text. Marvin is built in Java and can be used as a standalone application or as a library.

The Marvin semantic annotator already has a lot of knowledge, which would probably make anyone depressed, so we named him after the depressed robot from “The Hitchhiker's Guide to the Galaxy”.

Basically, Marvin queries the sources available to him for definitions of the words in the input text. He uses some NLP transformations in order to obtain as much knowledge as possible. For DBPedia he queries with unigrams and bigrams, while WordNet is queried only with single words.
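To make the unigram and bigram idea a bit more concrete, here is a minimal sketch in Python (Marvin itself is written in Java, so this is not his actual code). The function names, the simple regex tokenizer and the dictionary layout are all assumptions made for illustration only; the sketch just builds the lists of terms that would be sent to each source.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    A deliberately simple tokenizer, assumed here for illustration.
    """
    return re.findall(r"[a-z0-9]+", text.lower())


def bigrams(tokens):
    """Return every pair of adjacent tokens joined by a space."""
    return [f"{first} {second}" for first, second in zip(tokens, tokens[1:])]


def build_query_terms(text):
    """Build hypothetical query term lists for each source.

    DBPedia gets unigrams and bigrams, WordNet gets single words only,
    following the description above.
    """
    tokens = tokenize(text)
    return {
        "dbpedia": tokens + bigrams(tokens),
        "wordnet": list(tokens),
    }


if __name__ == "__main__":
    terms = build_query_terms("Heart attack risk")
    print(terms["dbpedia"])
    print(terms["wordnet"])
```

How the terms are then turned into actual DBPedia or WordNet lookups is left out here, since that depends on the client used.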

Instructions on how to download, install


xgoogle python library upgrade for google image search

A couple of days ago I realized that I needed a library that would allow me to search Google, especially for face images. I had previously worked a bit with the Google API; however, Google offers only 100 requests per day and there are some other limitations. It is a very good API if you pay, want to commercialize what you have built and don’t mind some restrictions. However, if you try to search Google, download the HTML and parse it through Python, the request comes back as forbidden. So I looked for a solution and found a very good Python library called xgoogle, made a couple of years ago by Peteris Krumins. It supports normal Google search and