Data labeling: types and use cases

What is data labeling? What are its major types and use cases?


Data labeling: A brief introduction

Data labeling is the technique of attaching labels to text, audio, or images so that AI models can learn from them. It is an important part of supervised learning, an AI technique that teaches machines using large numbers of labeled examples.

For instance, if you want to develop a program that identifies dogs in images, you must feed it thousands of labeled pictures of dogs and “non-dogs” so that the model learns what dogs look like. The trained system can then use what it has learned to decide whether a new image contains a dog.
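
The idea of learning from labeled examples can be sketched in a few lines. This is only a toy illustration, not how a real image model works: the two-number feature vectors, the label names, and the nearest-centroid rule are all assumptions made for the example.

```python
from collections import defaultdict
from math import dist


def train(examples):
    """Compute one centroid (mean feature vector) per label."""
    grouped = defaultdict(list)
    for features, label in examples:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest to the given features."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical labeled data: (feature vector, label) pairs.
labeled_examples = [
    ((0.9, 0.8), "dog"),
    ((0.8, 0.9), "dog"),
    ((0.1, 0.2), "not_dog"),
    ((0.2, 0.1), "not_dog"),
]

model = train(labeled_examples)
prediction = predict(model, (0.85, 0.75))
```

The labels are the only source of "truth" the model sees, which is why label quality matters as much as label quantity.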

Data labeling: Types and Use-cases

Data labeling is essential for scaling AI and machine learning applications. It provides the foundation for teaching a machine learning model what it needs to know and how to discriminate between different inputs in order to produce reliable results.

Depending on the format of the data, there are many distinct types of data labeling. Image and video object annotation, semantic segmentation, text categorization, and content categorization are a few examples.

Most problems for which AI models are being developed fall into one of the following tasks, and each task calls for a different data labeling technique.

  • Sequencing - The process of assigning a start (left boundary), an end (right boundary), and a label to a text or time series.
    Use-case : Recognize a person's name in a text, locate a line in a contract, etc.

  • Categorization - The process of assigning binary or multiple classes, with or without a hierarchy, to data samples.
    Use-case : Categorize a book according to the BISAC ontology, categorize an image as offensive or not offensive.
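
As a rough sketch, the labels produced by the two tasks above could be stored as small records. The class names, field names, and category values here are hypothetical and chosen only for illustration; real labeling tools define their own schemas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpanLabel:
    """A sequencing label: left boundary, right boundary, and a label."""
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str

    def text_of(self, document: str) -> str:
        """Return the part of the document that this label covers."""
        return document[self.start:self.end]


@dataclass(frozen=True)
class CategoryLabel:
    """A categorization label: one element for a flat class,
    several elements for a path through a hierarchy."""
    path: tuple


document = "Alice Smith signed the contract."
person = SpanLabel(start=0, end=11, label="PERSON")

binary = CategoryLabel(path=("not_offensive",))
hierarchical = CategoryLabel(path=("Fiction", "Science Fiction"))
```

Storing offsets rather than the matched text keeps the label tied to an exact position, so the same name appearing twice in a document can be labeled separately.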

Semantic segmentation allows AI models to identify object boundaries

  • Segmentation: The process of identifying particular segments of a data sample, such as paragraph splits, an object boundary in a picture, or transitions between speakers.
    Use-case : Identify the boundaries of people on a road, identify speakers in an audio recording, etc.

  • Mapping: The process of converting data from one form to another: one language to another, full text to summary, question to answer, raw data to normalized data, etc.
    Use-case : Translate from French to English, normalize a date from free text to a standard format.

  • Intent extraction: The process of identifying the intent behind a piece of text. For example, in the text "Order me a book", the intent is "to order".
    Use-case : Widely used by voice assistants like Siri to figure out what the speaker is asking for.
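
For the mapping task, a labeled example is an (input, target) pair. The sketch below shows one way a helper might propose a normalized date for a labeler to confirm. The function name and the list of accepted formats are assumptions, and it relies on English month names (Python's default locale).

```python
from datetime import datetime

# Hypothetical set of free-text formats the helper understands.
KNOWN_FORMATS = ("%B %d, %Y", "%d %B %Y", "%Y-%m-%d")


def suggest_iso_date(raw):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next format
    return None


raw_inputs = ("March 5, 2021", "5 March 2021", "sometime next week")

# Each pair is (raw input, suggested target); None means a human must decide.
mapping_examples = [(raw, suggest_iso_date(raw)) for raw in raw_inputs]
```

Inputs the helper cannot parse come back as None, which is exactly where a human labeler adds value.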

3D pointclouds help machines understand the scene around them more accurately

  • Object Detection: The technique of recognizing visual objects and distinguishing them from one another within a set of defined categories.

    • Images: Object Detection is widely used with images and videos to identify the location of an object within the image.
      Use-case: Identify pedestrians, cars, trucks, etc. on a road.

    • 3D pointclouds: A 3D pointcloud is a collection of points in 3D space captured by a device called a LIDAR. A LIDAR emits laser pulses in many directions and detects solid objects in 3D space by measuring the time each pulse takes to return to the device.
      3D pointclouds allow machines to perceive their surroundings with much more precision. They are widely used in autonomous driving to understand the scene around the vehicle.
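
The time-of-flight idea behind a LIDAR can be sketched as follows. The pulse travels to the object and back, so the distance is half the round-trip time multiplied by the speed of light. The function names and the angle convention (azimuth and elevation in radians) are assumptions made for this example, not the interface of any particular sensor.

```python
import math

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def distance_from_round_trip(seconds):
    """Distance to the object: the pulse covers the path twice."""
    return SPEED_OF_LIGHT_M_PER_S * seconds / 2.0


def to_point(distance, azimuth_rad, elevation_rad):
    """Convert a range and two beam angles into an (x, y, z) point."""
    horizontal = distance * math.cos(elevation_rad)
    return (
        horizontal * math.cos(azimuth_rad),
        horizontal * math.sin(azimuth_rad),
        distance * math.sin(elevation_rad),
    )


# A pulse that returns after 200 nanoseconds hit something about 30 m away.
distance_m = distance_from_round_trip(200e-9)
point = to_point(distance_m, azimuth_rad=0.0, elevation_rad=0.0)
```

Repeating this for every pulse and beam direction yields the set of points that annotators then label, for example by drawing 3D boxes around vehicles.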
