What is the structure in data?


Building a search engine (Lucene tutorial)

Since Google took over our lives and turned "googling" into a verb for searching, building a search engine has been considered a cool thing. I have come across search engines several times in my life. I even worked at a company that was, I guess, pretending to build a search engine (I stayed just one month, until I realized they were not serious). I also got some background from text mining courses (both at Coursera and at the University of Manchester), and I came to a point in my research where I had to build a search engine. Not many people today build a search engine from scratch, since there are several search engine libraries out there and one of the


Scientific blogging

I have seen a lot of advice on how important it is to blog if you want to become an influential scientist. Even at my university (University of Manchester), faculties organize crash seminars on scientific blogging for researchers, with an introduction to WordPress. However, I am quite familiar with WordPress, since I have worked on some of the most popular plugins out there, made major modifications to some themes and, of course, have a WordPress blog (this one). I have seen many scientists start blogging based on this advice, so their first blog post is about starting a PhD and their research. That is not my case. I have had this blog for several years. About blogging and social media


Brief introduction to Linked Data

My recent research brought me to linked data, which is quite an interesting concept. Here I will write a brief introduction and some notes on linked data. At some point in the future I will probably go deeper into the standards and their usage.

Current state of the Web

The Internet has revolutionized the way we communicate and how we access and handle data. It has become very hard to imagine a world without the Internet, even though it has existed for only 25 years. Many things have changed on the Internet over these years. From the first HTML-only pages linked to each other, we moved to Web 2.0, where everyone can contribute to the quality and amount of data. However, these concepts are built for humans, which is OK, since humans


Political bot (AI) fighting human bots (using NLP and OCR)

I should probably write this in Serbian, but to stay consistent, English it is.

Since elections will soon be held in Serbia, there is a lot of talk about political campaigns. One of the major issues in the news is human bots used in political campaigns on the internet. Since parties in Serbia have so many members (it is estimated that almost every second person in the country is a member of some party), they use their members as bots to watch news articles on internet portals and comment on them (to make people vote for the party they belong to). A couple of years ago, there was no internet campaigning at all in Serbia. Now, thousands of people are commenting on articles


What is the big deal with natural language processing?

Recently, here at the University of Manchester, in a class for all PhD students, we realized that almost half of the students in the group are doing some kind of natural language processing, and almost everyone is doing something related to machine learning (even the hardware guys are building a neural-network-like multi-processor architecture). Unfortunately, these efforts are not joined up but are spread across several research groups (the NLP and text mining research group, the National Centre for Text Mining, which has its own research students, and probably one more group). Still, there is a lot of effort going on here around natural language understanding. So what is the big deal? Why are so many projects funded in this particular field? I cannot say


Introducing OWASP Seraphimdroid

About 2 months ago I started thinking about creating an Android security application. I looked at where other applications are weak, since there are a lot of Android device protection and anti-malware applications available on Google Play. One thing I found is that most of those applications don’t use application permissions as an indicator that some other application is malicious. Another thing I found is that a lot of features that are quite easy to develop are premium. As I was looking for a project to train myself, and to help others who did not have the luck to be employed by an antivirus company train in developing Android security tools, I decided to create an open source project. There will be no other


Lego Mindstorms NXT 2.0 car programmed in Microsoft Robotics

As I promised before, I tried to build a robot vehicle and program it using Microsoft Robotics instead of the development kit that comes in the box. The development kit that comes with Lego Mindstorms NXT 2.0 is a simplified version of LabVIEW. I must admit that I have never worked with LabVIEW, so I don’t know much about the original version, but the version that comes in the box works sequentially. That means the robot will move as far as you tell it, then stop, check a sensor, and continue moving, as in the video from the previous post. What I wanted was a robot that moves smoothly and checks its sensors while it is moving. So I tried Microsoft Robotics.

Microsoft Robotics


Team performance – advantages of a small team

There are many theories on how to organize a team and how to make it perform best. I will not talk about all of them. I just want to point out one of my experiences, where I felt most comfortable and where I feel the team was most productive.

I have had experience with big companies with big teams, small companies with big teams, small companies with small teams, etc. That is why I think I have a right to talk about it. So what is the best organization? I would vote for a small team, actually a minimal team. What do I mean by that? Teams of two or three people. In this constellation you will have the most productive teams. If you have a big product with a lot of people,


Personalized relevance classifier of sentences

In this article I would just like to pitch an idea about a personalized classifier, and I would like to hear your opinion on whether this approach could be good and what problems it might have. So what is the problem? I would like to build a personalized relevance classifier.

Problem definition

Every user is tracking mentions of some term on the internet or social media. Terms are usually brands they want to watch, if they are marketing people or business owners, or events, names, etc. Since a term can be ambiguous, the user has the opportunity to tell the program that some sentence is irrelevant to them. For example, if the user enters “Apple”, at first it will show all mentions of both Apple the company and the fruit