# Tutorial 6 - Creating a more complex Annotator

## Section 1 - Loading and using Machine Learning Models

In this tutorial you will create a more complex annotator that can detect objects and humans by using a pretrained Machine Learning Model of the YOLO family. YOLO stands for "You Only Look Once" and refers to a family of Machine Learning Models that process an image only once for object detection, which makes these models relatively fast. The `ultralytics` library for Python provides the functionality to load a pre-trained model from a file.

We will create a new annotator called `SimpleYoloAnnotator`, which you can find in `src/robokudo/annotators/simple_yolo_annotator.py`. As in the previous tutorials, the `update()` and `add_to_image()` functions have to be implemented. All necessary libraries are already imported for you.

The most important new type is the `YOLO` object. It takes a single parameter: the name of the model file. In this tutorial we will use the file `"best.pt"`. The `YOLO` object automatically loads the model on initialization.

- **Task 6-1-1:** Create a new class attribute `self.model` in the `__init__()` function. This should be a new `YOLO` object.
- **Task 6-1-2:** In the `update()` function, load the `COLOR_IMAGE` from the CAS into a variable by creating a deepcopy of it.

The color image can be used to run "inference", meaning the model receives the image data and returns its predictions to us. To do so efficiently we will use a context manager provided by another machine learning library called `torch`. `torch` is a lower-level library that is also used internally by the `ultralytics` library. Add this code in the `update()` function after loading the `COLOR_IMAGE`:

```python
with torch.no_grad():
```

This context manager disables gradient tracking, which is not needed for inference, and thereby improves performance.

The object in `self.model` is callable. This means that to run inference with it, we just have to call the model with a few parameters. For this tutorial, pass your image as a positional argument and the keyword argument `conf=0.9`. The `conf` value is the minimum confidence needed for a result to be accepted, meaning a prediction will only be returned if the model is at least 90% sure that the prediction is correct.

- **Task 6-1-3:** Call the `YOLO` object inside of the context manager and write the results into a new variable, for example `results`.

The new variable will now contain a list of `Results` objects, one object for every image we passed to the model. This means our list will only contain a single instance, as only a single image was passed to the model. The `Results` object itself has an attribute called `boxes` containing a `Boxes` object, which in turn contains the bounding boxes of the detected objects. However, they are still stored as `torch.Tensor` objects, which have to be converted into something we can access more easily. The `Boxes` class offers a few functions for this:

```python
Boxes.cpu()
```

This will move the tensors to CPU memory, for example in case they are stored in GPU memory. We are not using a GPU, but we will call this anyway to be safe.

```python
Boxes.numpy()
```

This will turn all tensors in the `Boxes` object into numpy arrays.
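Taken together, these steps inside `update()` could look like the following minimal sketch (variable names such as `color_image` and `results_np` are only suggestions, not fixed API):

```python
with torch.no_grad():
    # Inference returns a list with one Results object per input image
    results = self.model(color_image, conf=0.9)

# We passed a single image, so results[0] is the only Results object.
# Move its boxes to CPU memory and convert the tensors to numpy arrays.
results_np = results[0].boxes.cpu().numpy()
```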
- **Task 6-1-4:** Load the `Boxes` object into CPU memory and convert its tensors to numpy arrays. Store the result in a variable like `results_np`.
- **Task 6-1-5:** Create a loop over `results_np` and for every item do the following:
  - Save the bounding box in a variable like `bbox` in the `xyxy` format by accessing `item.xyxy[0]`
  - Add a white rectangle to the `COLOR_IMAGE` you loaded in the previous tasks using the data points in `bbox` (Note: You don't have to implement `add_to_image` yet)
  - Write the new `COLOR_IMAGE` to the annotator outputs as done in the previous tutorials
- **Task 6-1-6:** Create a new Sequence containing the `SimpleYoloAnnotator`. Make it run on the query type `yolo-detect` instead of the sequence currently assigned to that query type.

:::{admonition} The code for the `SimpleYoloAnnotator` could look something like this:
:class: dropdown hint

```python
# ...
class SimpleYoloAnnotator(robokudo.annotators.core.BaseAnnotator):
    def __init__(self, name="SimpleYoloAnnotator"):
        super().__init__(name=name)
        self.model = YOLO("best.pt")

    def update(self):
        # Read out the input color image and create a copy
        visualization_img = copy.deepcopy(self.get_cas().get(CASViews.COLOR_IMAGE))

        # Use the context manager to improve performance
        with torch.no_grad():
            # Run inference on the color image from the CAS
            result_tensor = self.model(visualization_img, conf=0.9, show=False)[0]

        # Move the boxes to cpu memory and convert the tensors to numpy arrays
        result_np = result_tensor.boxes.cpu().numpy()

        # Iterate over all results
        for result in result_np:
            # Read out the bounding box in xyxy format
            bbox = result.xyxy[0]

            # Add the bounding box to the visualization_img.
            # OpenCV expects integer pixel coordinates, so cast the floats to int.
            visualization_img = cv2.rectangle(visualization_img,
                                              (int(bbox[0]), int(bbox[1])),
                                              (int(bbox[2]), int(bbox[3])),
                                              (255, 255, 255), 2)

        # Write the final image containing the newly created rectangles to the
        # annotator output
        self.get_annotator_output_struct().set_image(visualization_img)

        return py_trees.Status.SUCCESS
# ...
```
:::

## Section 2 - Classifying Objects and creating Annotations

Now that the object detection is working, we will implement object classification. After loading, the `best.pt` model exposes its available classes in the form of a dictionary with numbers as keys. These numbers are ids for the classes. Since the inference returns these ids, the dictionary makes it possible to look up the actual class name. In our case the dictionary can be accessed through the `YOLO` object we created in Section 1 by using `self.model.names`.

- **Task 6-2-1:** Create a new class attribute `self.id2name` in the `__init__()` function containing the id-to-class pairs.

For each of the results we got out of the inference we now want to create annotations. Since we are detecting objects, the `ObjectHypothesis` will be used. This annotation however cannot store classification data, so we will also attach a `robokudo.types.annotation.Classification` to the `ObjectHypothesis` objects. The class of a result can be read out through `result.cls[0]`, which returns the id that then has to be looked up in the newly created `self.id2name` to obtain the class name in textual form. We can also read out how much confidence the model has in its prediction by reading out `result.conf[0]`, as shown in the sketch below.

Every annotation in RoboKudo can also store its source annotator in the class field `source`. This is usually set to `self.name` from within the annotator and is useful in scenarios where you need to find out which annotator added the annotation.
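Reading these values out inside the results loop could look like this sketch (variable names are suggestions):

```python
cls_id = int(result.cls[0])      # numeric class id predicted by the model
cls_name = self.id2name[cls_id]  # textual class name from the id-to-name dictionary
confidence = result.conf[0]      # the model's confidence for this prediction
```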
- **Task 6-2-2:** Inside of the results loop, store the class id and class name in two separate variables.
- **Task 6-2-3:** Create a new `Classification` object for each result and store the class name in its class field `classname` and the confidence in its class field `confidence`. Also store the annotator's source in the class field `source`.

There is a lot of information that we can now put into the `ObjectHypothesis`. We already have the bounding box for each object and we also know its type. Because `ObjectHypothesis` inherits from `IdentifiableAnnotation`, we can also store an object `id` that can be used to uniquely identify an annotation. For now we will just use the index of the results list for this value. When there are multiple sources of `ObjectHypothesis` however, the id has to be retrieved in a different manner.

- **Task 6-2-4:** Create a new `ObjectHypothesis` with the following data:
  - The class fields `source` and `id` as explained above
  - The class field `type` should contain the class id
  - Pass the bounding box to the `roi.roi` object in the `ObjectHypothesis`
  - Add the `Classification` object to the `ObjectHypothesis.annotations` list

:::{tip}
The results bounding box is currently in the `xyxy` form, but the `roi.roi` only has `x`, `y`, `width` and `height`.
:::

:::{important}
The bounding box coordinates are of type `float`; the `roi.roi`, however, only accepts the type `int`.
:::
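Converting the `xyxy` bounding box into the fields expected by `roi.roi` could therefore look like this (matching the field layout used in the solution below):

```python
# xyxy stores the top-left and bottom-right corners as floats;
# roi.roi stores an integer top-left position plus width and height.
object_hypothesis.roi.roi.pos.x = int(bbox[0])
object_hypothesis.roi.roi.pos.y = int(bbox[1])
object_hypothesis.roi.roi.width = int(bbox[2] - bbox[0])
object_hypothesis.roi.roi.height = int(bbox[3] - bbox[1])
```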
Now that we have a complete `ObjectHypothesis`, we have to add it to the `CAS` so that other annotators can access these annotations too. To do so, you can add the annotation to the `self.get_cas().annotations` list.

- **Task 6-2-5:** Add the newly created `ObjectHypothesis` to the `CAS`.

:::{tip}
To make this a bit more efficient, you can store the `ObjectHypothesis` objects in a small list during the loop and add them all at once to the `CAS` by using the `extend` function offered by lists.
:::

- **Task 6-2-6:** Send a new `yolo-detect` query to RoboKudo and confirm through the details tree in the CAS output visualization that all the values you added to the `ObjectHypothesis` are correct. This should include:
  - The object's source annotator
  - The object's position, which is derived from `roi.roi`
  - The object's id
  - A `Classification` annotation with:
    - The source annotator
    - The class name and class id
    - The confidence
- **Task 6-2-7:** Implement the `add_to_image` function with the following criteria:
  - The bounding box should be added there instead of directly in the loop
  - Add a text to the bounding box containing both the class name and the confidence
- **Task 6-2-8:** Send a new `yolo-detect` query to RoboKudo and confirm that the bounding boxes and text are correctly added to the output image.

The final output of the annotator could look something like this:

![](../img/06-yolo-output.png)

:::{admonition} The final code for the `SimpleYoloAnnotator` could look something like this:
:class: dropdown hint

```python
# ...
class SimpleYoloAnnotator(robokudo.annotators.core.BaseAnnotator):
    def __init__(self, name="SimpleYoloAnnotator"):
        super().__init__(name=name)
        self.model = YOLO("best.pt")
        self.id2name = self.model.names

    def update(self):
        visualization_img = copy.deepcopy(self.get_cas().get(CASViews.COLOR_IMAGE))

        with torch.no_grad():
            result_tensor = self.model(visualization_img, conf=0.9, show=False)[0]

        result_np = result_tensor.boxes.cpu().numpy()

        object_hypotheses = []
        for obj_id, result in enumerate(result_np):
            bbox = result.xyxy[0]
            # Cast the class id to int so it can be used as a dictionary key
            cls = int(result.cls[0])
            name = self.id2name[cls]

            classification = robokudo.types.annotation.Classification()
            classification.source = self.name
            classification.classname = name
            classification.confidence = result.conf[0]

            object_hypothesis = robokudo.types.scene.ObjectHypothesis()
            object_hypothesis.source = self.name
            object_hypothesis.type = cls
            object_hypothesis.id = obj_id
            object_hypothesis.roi.roi.pos.x = int(bbox[0])
            object_hypothesis.roi.roi.pos.y = int(bbox[1])
            object_hypothesis.roi.roi.width = int(bbox[2] - bbox[0])
            object_hypothesis.roi.roi.height = int(bbox[3] - bbox[1])
            object_hypothesis.annotations.append(classification)

            visualization_img = self.add_to_image(object_hypothesis, visualization_img)
            object_hypotheses.append(object_hypothesis)

        self.get_cas().annotations.extend(object_hypotheses)
        self.get_annotator_output_struct().set_image(visualization_img)

        return py_trees.Status.SUCCESS

    @staticmethod
    def add_to_image(obj, image):
        x1, y1, x2, y2 = (obj.roi.roi.pos.x,
                          obj.roi.roi.pos.y,
                          obj.roi.roi.pos.x + obj.roi.roi.width,
                          obj.roi.roi.pos.y + obj.roi.roi.height)
        vis_text = f"{obj.annotations[0].classname}, {obj.annotations[0].confidence:.2f}"
        font = cv2.FONT_HERSHEY_COMPLEX
        image = cv2.putText(image, vis_text, (x1, y1 - 5), font, 0.5, (0, 0, 255), 1,
                            cv2.LINE_AA)
        image = cv2.rectangle(image, (x1, y1), (x2, y2), (255, 255, 255), 2)
        return image
# ...
```
:::

## Section 3 - Searching for objects

Detecting objects is already a good start. However, it is even more useful to be able to search for objects as well. As in the previous tutorials, we will now implement another small annotator called `ClassFilterAnnotator`. As the name suggests, it is again used to filter objects depending on the query that was sent to RoboKudo. This time, however, we will use the query's `obj.type` field to search for the different classes. You can find the template annotator in `class_filter.py`.

- Implement the annotator's `update` function so that an object is only added to the output image with the `add_to_image` function if the query's `obj.type` matches the object's classification.
- Add the annotator to the sequence **after** the `SimpleYoloAnnotator` and try to identify the milk by sending a `yolo-detect` query while specifying a type.

The final output should look something like this:

![](../img/06-yolo-output-filtered.png)
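The core of the filter is the comparison between the type requested in the query and the `Classification` attached to each `ObjectHypothesis`. A minimal sketch of that check (the complete annotator follows in the dropdown below):

```python
for annotation in object_hypothesis.annotations:
    # Compare case-insensitively so that e.g. "Milk" in the query also matches "milk"
    if (isinstance(annotation, robokudo.types.annotation.Classification)
            and query.obj.type.lower() == annotation.classname.lower()):
        visualization_img = self.add_to_image(object_hypothesis, visualization_img)
```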
:::{admonition} The final code for the `class_filter.py` could look like this:
:class: dropdown hint

```py
class ClassFilterAnnotator(robokudo.annotators.core.BaseAnnotator):
    # ...
    def update(self) -> py_trees.common.Status:
        query = self.get_cas().get(CASViews.QUERY)
        annotations = self.get_cas().filter_annotations_by_type(robokudo.types.scene.ObjectHypothesis)
        visualization_img = copy.deepcopy(self.get_cas().get(CASViews.COLOR_IMAGE))

        # Only draw object hypotheses whose classification matches the queried type
        for object_hypothesis in annotations:
            for annotation in object_hypothesis.annotations:
                if (isinstance(annotation, robokudo.types.annotation.Classification)
                        and query.obj.type.lower() == annotation.classname.lower()):
                    visualization_img = self.add_to_image(object_hypothesis, visualization_img)

        self.get_annotator_output_struct().set_image(visualization_img)

        return py_trees.Status.SUCCESS
    # ...
```
:::