Annotation

How to use chatbots for data annotation

Recent Advances in Chatbot Platforms

Chatbots have been used in customer service for a long time. Recently, however, platforms have emerged that lower the bar for developing chatbots and deploying them to a global audience.

While traditional chatbots operated purely on textual interaction, popular platforms such as Telegram and Facebook Messenger have evolved and now offer additional functionality such as structured UI components. Telegram is pioneering the market, allowing developers to build chatbots featuring images, audio files, buttons and even payments.

Given our previous experience with web-based annotation, we decided to give chatbots a try. In this article, we will show how we successfully designed an annotation process based on Telegram.

Web-based Data Annotation

In the past, we have developed custom web-based annotation tools to satisfy the requirements of individual projects. Off-the-shelf solutions, unfortunately, made assumptions of their own that did not always apply to the problems we had to solve.

Every annotation project we have worked on so far has had unique properties, from needing assessment tests to supporting agreement resolution. Since high-quality data is essential for a Data Scientist to train models, annotation is an important step of the overall development process.

Developing our own web-based tools worked out fine, but it always took a considerable amount of time, so we looked for alternatives that would let us prototype and iterate faster.

Chatbot-based Data Annotation

In our most recent project, we decided to build a chatbot for an audio annotation task. Our requirements were numerous:

  • Quick annotation of audio samples
  • Decent user experience
  • Multiple labels per sample
  • Modelling of label dependencies
  • Cost-efficient development
  • Multiple batches
  • Ability to assess annotators

UI

[Screenshot of the final result, with the actual task changed for confidentiality reasons.]

Given the textual nature of chatbots, the user interface is rather simple. We nevertheless designed it to be as self-explanatory as possible. The first time users connect to the bot, they are asked for their name. Afterwards, we provide them with detailed instructions on how to annotate our data. They can then label audio samples using inline buttons. We also display the label number so they can better keep track of how many annotations are still missing.
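
As a rough sketch of how this interaction could look with the python-telegram-bot library (v20-style API); the labels, file name and bot token below are placeholders rather than our actual task:

```python
from telegram import InlineKeyboardButton, InlineKeyboardMarkup, Update
from telegram.ext import Application, CallbackQueryHandler, CommandHandler, ContextTypes

# Placeholder labels -- the real task is confidential.
LABELS = ["speech", "music", "noise"]

async def send_sample(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    """Send the next audio sample with one inline button per label."""
    keyboard = InlineKeyboardMarkup(
        [[InlineKeyboardButton(l, callback_data=f"label:{l}") for l in LABELS]]
    )
    with open("sample_0042.ogg", "rb") as audio:  # hypothetical file name
        await context.bot.send_audio(
            chat_id=update.effective_chat.id,
            audio=audio,
            caption="Label 42 of 100",  # the counter shown to the annotator
            reply_markup=keyboard,
        )

async def on_label(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    """Record the pressed button and confirm the choice to the annotator."""
    query = update.callback_query
    await query.answer()
    _, label = query.data.split(":", 1)
    # persist (query.from_user.id, label) here, then serve the next sample
    await query.edit_message_caption(caption=f"Labelled as: {label}")

app = Application.builder().token("BOT_TOKEN").build()  # placeholder token
app.add_handler(CommandHandler("start", send_sample))
app.add_handler(CallbackQueryHandler(on_label, pattern=r"^label:"))
app.run_polling()
```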

On the management side, we introduced commands to facilitate supervision; they can only be executed by the administrator. They help us track the progress of users (/users), assign batches (/new-batch) and list the agreement ratio per label (/agreement). The manager can thus check at any time what the agreement is for every label and use this information to decide how to budget and how many more annotators to hire.
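
A sketch of how such an admin-only command could be wired up with the same library; ADMIN_ID, compute_agreement and its numbers are hypothetical:

```python
from telegram import Update
from telegram.ext import CommandHandler, ContextTypes, filters

ADMIN_ID = 123456789  # hypothetical Telegram user id of the manager

def compute_agreement() -> dict[str, float]:
    """Hypothetical helper: agreement ratio per label from stored votes."""
    return {"speech": 0.92, "music": 0.78, "noise": 0.64}  # placeholder numbers

async def agreement(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    lines = [f"{label}: {ratio:.0%}" for label, ratio in compute_agreement().items()]
    await update.message.reply_text("\n".join(lines))

# The filter rejects the command for everyone but the administrator.
agreement_handler = CommandHandler("agreement", agreement, filters=filters.User(ADMIN_ID))
```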

Optimisations

For better budgeting and forecasting, we introduced the concept of batches. At the beginning, we define for each hired annotator their batch size, i.e. the total number of labels they have to annotate.
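
Conceptually, a batch is little more than a per-annotator record of assigned and completed work; a minimal sketch with illustrative field names:

```python
from dataclasses import dataclass

@dataclass
class Batch:
    annotator_id: int
    size: int       # total number of labels to annotate
    done: int = 0   # labels annotated so far

    @property
    def remaining(self) -> int:
        return self.size - self.done

    @property
    def finished(self) -> bool:
        return self.done >= self.size
```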

Another optimisation was the introduction of adaptive label selection. Whereas traditional tools require all annotators to label the same instances, we make label resolution part of the annotation process by computing the agreement across users. Before presenting a sample to a user, we check whether the majority already agrees on a label for it. If so, we skip the sample and move on to the next. This lowers the costs and biases the annotation process towards harder samples, consequently increasing the overall data quality.
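
A minimal sketch of this skip logic; the vote threshold and majority ratio below are assumptions, not the values we used:

```python
from collections import Counter

MIN_VOTES = 3     # assumed minimum annotations before a sample can resolve
MAJORITY = 2 / 3  # assumed agreement threshold

def is_resolved(votes: list[str]) -> bool:
    """A sample is resolved once enough annotators agree on one label."""
    if len(votes) < MIN_VOTES:
        return False
    _, top_count = Counter(votes).most_common(1)[0]
    return top_count / len(votes) >= MAJORITY

def next_sample(votes_by_sample: dict[str, list[str]]) -> str | None:
    """Serve only samples whose label is still contested."""
    for sample_id, votes in votes_by_sample.items():
        if not is_resolved(votes):
            return sample_id
    return None  # everything resolved
```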

What was mentioned in the requirements above as 'label dependencies' means that a label can either trigger follow-up labels or skip all remaining labels. We had one specific label that was meant to catch samples we wanted to exclude: if a user selected it, we would hide the sample from all other users as well.
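
Such dependencies can be expressed as a small rule table; the label names and the persistence helper below are made up for illustration:

```python
FOLLOW_UPS = {"speech": ["language", "speaker gender"]}  # label -> follow-up labels
DISCARD = {"unusable"}  # the catch-all label that hides a sample for everyone

def hide_sample_for_all(sample_id: str) -> None:
    """Hypothetical: mark the sample so it is no longer served to anyone."""
    ...

def apply_label(sample_id: str, label: str, pending: list[str]) -> None:
    """Update the annotator's pending labels after a selection."""
    if label in DISCARD:
        hide_sample_for_all(sample_id)
        pending.clear()                 # skip all remaining labels for this sample
    else:
        pending.extend(FOLLOW_UPS.get(label, []))
```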

Evaluation

We successfully conducted an entire data annotation process with our tool. The process we came up with was the following: First, we hand-selected suitable candidates from our pool of applicants and asked them to connect to our Telegram bot. Each annotator then had to pass an assessment of 15 sample annotations, which only took them a couple of minutes to finish; we assessed the results manually. If an annotator qualified, we created a larger batch for them and could follow their progress as well as how the agreement ratios developed.

We encountered two major obstacles. First, some annotators did not read the instructions carefully. We solved this by notifying them whenever we noticed such problems during the assessment phase, though this could be automated in the future. Second, because a chat produces a constant flow of messages, annotators in our first iteration found it difficult to mentally associate a label with the latest audio sample. Changing the style and formatting solved this problem; here, regular UIs would have been advantageous, as there is no backlog of messages.

There were also upsides: We could focus on designing the process as a whole and its logic rather than on developing a UI. It turned out that we could implement this bot more cost- and time-efficiently than our previous web-based tools. Making label resolution part of the annotation process allows us to better supervise the annotations and track their quality very early on. We also received positive feedback from our annotators that the interaction with the bot was rather enjoyable.

Future Improvements

It remains to be seen how well our approach can be applied to other types of media, e.g. large images, maps or videos with bounding-box annotation. For our purposes, the Telegram platform was entirely sufficient and even provided many more features than we needed.

If we wanted to scale up to hundreds of annotators, we could send push notifications to follow up with candidates who did not finish their work. When on-boarding annotators, we could also handle the assessment test automatically and give advice on avoiding common pitfalls. A step further would be to handle the entire communication with the annotators, including their payments, via Telegram to make the process even smoother.

If we can help you in any way setting up an annotation process, please feel free to contact us.