Tutorial 6 - Creating a more complex Annotator

Section 1 - Loading and using Machine Learning Models
In this tutorial you will create a more complex annotator that can detect objects and humans by using a pretrained Machine Learning Model of the YOLO family. YOLO stands for “You Only Look Once” and is a family of Machine Learning Models that only process an image once for object detection, which makes these models relatively fast. The ultralytics library for Python provides the functionality to load a pre-trained model from a file.
We will create a new annotator called SimpleYoloAnnotator, which you can find in src/robokudo/annotators/simple_yolo_annotator.py. As in the previous tutorials, the update() and add_to_image() functions have to be implemented.
All necessary libraries are already imported for you. The most important new type is the YOLO object. This object takes a single parameter representing the name of the model file. In this tutorial we will use the file "best.pt". The YOLO object will automatically load the model on initialization.
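For example, a minimal sketch of the construction (the attribute name is just an example, matching the task below):

    self.model = YOLO("best.pt")  # the model is loaded from the file right here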
Task 6-1-1: Create a new class attribute self.model in the __init__() function. This should be a new YOLO object.

Task 6-1-2: In the update() function, load the COLOR_IMAGE from the CAS into a variable by creating a deepcopy of it.
The color image can be used to run “inference”, meaning the model gets the image data and returns its predictions to us. To do so efficiently, we will use a context manager provided by another machine learning library called torch. torch is a lower-level library that is also used internally by the ultralytics library. Add this code in the update() function after loading the COLOR_IMAGE:
with torch.no_grad():
This context manager disables gradient computation, which is not needed for inference, and thereby improves performance.
The object in self.model is callable. This means that to run inference with it, we just have to call the model with a few parameters. For this tutorial, pass your image as a positional argument and the keyword argument conf=0.9. The conf value is the minimum confidence needed for a result to be accepted, meaning a prediction will only be returned if the model is at least 90% sure that the prediction is correct.
Task 6-1-3: Call the YOLO object inside of the context manager and write the results into a new variable, for example results.
The new variable will now contain a list of Results objects, one object for every image we passed to the model. This means our list will only contain a single instance, as only a single image was passed to the model.
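Putting these two steps together, the inference call could look like this sketch (the variable names are just examples):

    with torch.no_grad():
        results = self.model(color_image, conf=0.9)
    result = results[0]  # a single image was passed in, so the list has exactly one entry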
The Results object itself has an attribute called boxes containing a Boxes object. This object in turn contains the bounding boxes of the detected objects. However, they are still stored as torch.Tensor objects, which have to be converted to something we can access more easily. The Boxes class offers a few functions for this:
Boxes.cpu()
    This will move the tensors to CPU memory, for example in case they are stored in GPU memory. We are not using a GPU, but we will call this anyway to be safe.
Boxes.numpy()
    This will turn all tensors in the Boxes object into numpy arrays.
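Both calls can be chained, so the conversion could look like this sketch (using the example names from above):

    results_np = result.boxes.cpu().numpy()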
Task 6-1-4: Load the boxes object into CPU memory and convert its tensors to numpy arrays. Store the result in a variable like results_np.
Task 6-1-5: Create a loop over results_np and for every item do the following:

- Save the bounding box in a variable like bbox in the xyxy format by accessing item.xyxy[0]
- Add a white rectangle to the COLOR_IMAGE you loaded in the previous tasks using the data points in bbox, as shown in the sketch after this list (Note: You don't have to implement add_to_image yet)
- Write the new COLOR_IMAGE to the annotator outputs as done in the previous tutorials
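The rectangle can be drawn with OpenCV; a sketch of that step (cv2.rectangle expects integer pixel coordinates, so the float values in bbox are cast to int):

    color_image = cv2.rectangle(color_image,
                                (int(bbox[0]), int(bbox[1])),
                                (int(bbox[2]), int(bbox[3])),
                                (255, 255, 255), 2)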
Task 6-1-6: Create a new Sequence containing the SimpleYoloAnnotator. Make it run on the query type yolo-detect instead of the current sequence on that query type.
The code for the SimpleYoloAnnotator could look something like this:
# ...
class SimpleYoloAnnotator(robokudo.annotators.core.BaseAnnotator):
    def __init__(self, name="SimpleYoloAnnotator"):
        super().__init__(name=name)
        self.model = YOLO("best.pt")

    def update(self):
        # Read out the input color image and create a copy
        visualization_img = copy.deepcopy(self.get_cas().get(CASViews.COLOR_IMAGE))

        # Use the context manager to improve performance
        with torch.no_grad():
            # Run inference on the color image from the CAS
            result_tensor = self.model(visualization_img, conf=0.9, show=False)[0]

        # Move the Boxes object to cpu memory and convert its tensors to numpy arrays
        result_np = result_tensor.boxes.cpu().numpy()

        # Iterate over all results
        for result in result_np:
            # Read out the bounding box in xyxy format
            bbox = result.xyxy[0]
            # Add the bounding box to the visualization_img
            # (cv2.rectangle expects integer pixel coordinates, so cast the floats)
            visualization_img = cv2.rectangle(visualization_img,
                                              (int(bbox[0]), int(bbox[1])),
                                              (int(bbox[2]), int(bbox[3])),
                                              (255, 255, 255), 2)

        # Write the final image containing the newly created rectangles to the
        # annotator output
        self.get_annotator_output_struct().set_image(visualization_img)

        return py_trees.Status.SUCCESS
# ...
Section 2 - Classifying Objects and creating Annotations
Now that the object detection is working, we will implement object classification. The best.pt model exposes its available classes after loading in the form of a dictionary with numbers as keys. These numbers are ids for the classes. These ids are returned by the inference, so this dictionary makes it possible to find out what the actual class name is. In our case the dictionary can be accessed through the YOLO object we created in section one by using self.model.names.
Task 6-2-1: Create a new class attribute self.id2name in the __init__() function containing the id to class name pairs.
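A sketch of what this could look like; the actual id-to-name pairs depend on the model file:

    # e.g. {0: 'person', 1: 'bicycle', ...} for a model trained on COCO
    self.id2name = self.model.names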
For each of the results we got out of the inference we now want to create annotations. Since we are detecting objects, the ObjectHypothesis will be used. This annotation however cannot store classification data, so we will also attach a robokudo.types.annotation.Classification to the ObjectHypothesis objects. The class of a result can be read out through result.cls[0], which returns the id that then has to be used with the newly created self.id2name to get the class name in textual form. We can also read out how much confidence the model has in its prediction by reading out result.conf[0].
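Inside the results loop this could look like the following sketch (the variable names are just examples; the int() cast turns the numpy scalar into a plain integer key):

    cls_id = int(result.cls[0])      # numeric class id
    cls_name = self.id2name[cls_id]  # textual class name
    confidence = result.conf[0]      # how sure the model is about this prediction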
Every annotation in RoboKudo can also store its source annotator in the class field source. This is usually set to self.name from within the annotator and is useful in scenarios where you need to find out which annotator added the annotation.
Task 6-2-2: Inside of the results loop store the class id and class name into two separate variables.
Task 6-2-3: Create a new Classification object for each result and store the class name into its class field classname and the confidence into its class field confidence. Also store the annotator's name in the Classification's source field.
There is a lot of information that we can now put into the ObjectHypothesis. We already have the bounding box for each object and we also know its type. Because ObjectHypothesis inherits from IdentifiableAnnotation, we can also store an object id that can be used to uniquely identify an annotation. For now we will just use the index of the results list for this value. When there are multiple sources of ObjectHypothesis however, the id has to be retrieved in a different manner.
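Using the index as the id can be done with Python's enumerate(), for example:

    for obj_id, result in enumerate(results_np):
        ...  # obj_id counts 0, 1, 2, ... and serves as the hypothesis id for this run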
Task 6-2-4: Create a new ObjectHypothesis with the following data:

- The class fields source and id as explained above
- The class field name should contain the textual class name
- Pass the bounding box to the roi.roi object in the ObjectHypothesis
- Add the Classification object to the ObjectHypothesis.annotations list
Tip

The result's bounding box is currently in the xyxy form, but the roi.roi only has x, y, width and height.
Important

The bounding box coordinates are of type float; the roi.roi however only accepts values of type int.
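Putting both notes together, the conversion could look like this sketch (bbox in xyxy format, as read out earlier):

    object_hypothesis.roi.roi.pos.x = int(bbox[0])
    object_hypothesis.roi.roi.pos.y = int(bbox[1])
    object_hypothesis.roi.roi.width = int(bbox[2] - bbox[0])
    object_hypothesis.roi.roi.height = int(bbox[3] - bbox[1])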
Now that we have a complete ObjectHypothesis, we have to add it to the CAS so that other annotators can access these annotations too. To do so, you can add the annotation to the self.get_cas().annotations list.
Task 6-2-5: Add the newly created ObjectHypothesis to the CAS.

Tip

To make this a bit more efficient, you can store the ObjectHypothesis objects in a small list during the loop and add them all at once to the CAS by using the extend function offered by lists.
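A sketch of that pattern:

    object_hypotheses = []  # created once before the loop
    # ... inside the loop:
    object_hypotheses.append(object_hypothesis)
    # ... once after the loop:
    self.get_cas().annotations.extend(object_hypotheses)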
Task 6-2-6: Send a new yolo-detect query to RoboKudo and confirm through the details tree in the CAS output visualization that all the values you added to the ObjectHypothesis are correct. This should include:

- The object's source annotator
- The object's position, which is derived from roi.roi
- The object's id
- A Classification annotation with:
  - The source annotator
  - The class name and class id
  - The confidence
Task 6-2-7: Implement the add_to_image function with the following criteria:

- The bounding box should be added there instead of directly in the loop
- Add a text to the bounding box containing both the class name and confidence
Task 6-2-8: Send a new yolo-detect query to RoboKudo and confirm that the bounding boxes and text are correctly added to the output image.
The final output of the annotator could look something like this:
The final code for the SimpleYoloAnnotator could look something like this:
# ...
class SimpleYoloAnnotator(robokudo.annotators.core.BaseAnnotator):
    def __init__(self, name="SimpleYoloAnnotator"):
        super().__init__(name=name)
        self.model = YOLO("best.pt")
        self.id2name = self.model.names

    def update(self):
        visualization_img = copy.deepcopy(self.get_cas().get(CASViews.COLOR_IMAGE))

        with torch.no_grad():
            result_tensor = self.model(visualization_img, conf=0.9, show=False)[0]
        result_np = result_tensor.boxes.cpu().numpy()

        object_hypotheses = []
        for obj_id, result in enumerate(result_np):
            bbox = result.xyxy[0]
            cls = int(result.cls[0])
            name = self.id2name[cls]

            # Classification annotation holding class name, confidence and source
            classification = robokudo.types.annotation.Classification()
            classification.source = self.name
            classification.classname = name
            classification.confidence = float(result.conf[0])

            # ObjectHypothesis holding the region of interest and the classification
            object_hypothesis = robokudo.types.scene.ObjectHypothesis()
            object_hypothesis.source = self.name
            object_hypothesis.type = cls
            object_hypothesis.id = obj_id
            object_hypothesis.roi.roi.pos.x = int(bbox[0])
            object_hypothesis.roi.roi.pos.y = int(bbox[1])
            object_hypothesis.roi.roi.width = int(bbox[2] - bbox[0])
            object_hypothesis.roi.roi.height = int(bbox[3] - bbox[1])
            object_hypothesis.annotations.append(classification)

            visualization_img = self.add_to_image(object_hypothesis, visualization_img)
            object_hypotheses.append(object_hypothesis)

        self.get_cas().annotations.extend(object_hypotheses)
        self.get_annotator_output_struct().set_image(visualization_img)
        return py_trees.Status.SUCCESS

    @staticmethod
    def add_to_image(obj, image):
        x1, y1, x2, y2 = (obj.roi.roi.pos.x, obj.roi.roi.pos.y,
                          obj.roi.roi.pos.x + obj.roi.roi.width,
                          obj.roi.roi.pos.y + obj.roi.roi.height)
        vis_text = f"{obj.annotations[0].classname}, {obj.annotations[0].confidence:.2f}"
        font = cv2.FONT_HERSHEY_COMPLEX
        image = cv2.putText(image, vis_text, (x1, y1 - 5), font, 0.5,
                            (0, 0, 255), 1, 2)
        image = cv2.rectangle(image, (x1, y1), (x2, y2), (255, 255, 255), 2)
        return image
# ...
Section 3 - Searching for objects
Detecting objects is already a good start. However, it is even more useful to be able to search for objects as well. As already done in the previous tutorials, we will now implement another small annotator called ClassFilterAnnotator. As the name suggests, it is again used to filter objects depending on the query that was sent to RoboKudo. This time however, we will use the obj.type field of the query to search for the different classes. You can find the template annotator in class_filter.py.
- Implement the annotator's update function so that the objects are added to the output image with the add_to_image function only if obj.type is equal to the object's classification.
- Add the annotator to the sequence after the SimpleYoloAnnotator and try to identify the milk by sending a yolo-detect query while specifying a type.
The final output should look something like this:
The final code for class_filter.py could look like this:
class ClassFilterAnnotator(robokudo.annotators.core.BaseAnnotator):
    # ...
    def update(self) -> py_trees.common.Status:
        # Read the current query and all ObjectHypothesis annotations from the CAS
        query = self.get_cas().get(CASViews.QUERY)
        annotations = self.get_cas().filter_annotations_by_type(robokudo.types.scene.ObjectHypothesis)
        visualization_img = copy.deepcopy(self.get_cas().get(CASViews.COLOR_IMAGE))

        for annotation in annotations:
            for oh in annotation.annotations:
                # Only draw hypotheses whose classification matches the queried type
                if (isinstance(oh, robokudo.types.annotation.Classification)
                        and query.obj.type.lower() == oh.classname.lower()):
                    visualization_img = self.add_to_image(annotation, visualization_img)

        self.get_annotator_output_struct().set_image(visualization_img)
        return py_trees.Status.SUCCESS
    # ...