
[BUG] YOLOv5 export flattens multi-polygon & hole-containing Polylines and ignores masks in Detection → malformed labels #6421

@vittorio-prodomo

Describe the problem

As far as I understand from using FiftyOne for my instance segmentation datasets, a label field can only hold one type of object at a time. For example, when we load a COCO-formatted dataset into FiftyOne, we get either a label field called "segmentations" that contains Detection objects (with dense masks) or, if we pass use_polylines=True to the importer, one called "polylines" that contains Polyline objects representing polygons.

Since I work a lot with both the COCO format and the Ultralytics YOLO format (for segmentation) and continuously go back and forth between the two, I tend to check how the tools I use convert between them, especially for edge cases.

I just noticed that FiftyOne’s YOLOv5 exporter has multiple flaws in how it handles fol.Polyline and fol.Detection(mask=…) objects.

Scenario description

Let me describe the test scenario. I created two fictional classes called 'Mask' and 'Polygon', drawn in CVAT as dense masks and polygons, respectively. For those who don't know, when CVAT exports in COCO format it leaves dense masks in RLE format in the JSON file (no contouring is done on its side). To test edge cases, I created: simple, single polygons; disconnected polygons referring to the same object; a single mask with a single hole inside; and a mask with several disconnected shapes and multiple holes inside.

When importing this dataset into FiftyOne, we have to choose (as mentioned earlier) between Detections and Polylines. We either end up with all labels converted to dense masks (FiftyOne Detection objects) or, if we pass use_polylines=True to the importer, with all labels forcibly converted to polylines.


[Figure: COCO import with use_polylines=False]

[Figure: COCO import with use_polylines=True]

Current exporting logic

The current implementation for writing labels to disk in YOLOv5 format is the following:

def _make_yolo_row(label, target, confidence=None):
    if isinstance(label, fol.Polyline):
        points = itertools.chain.from_iterable(label.points)
        row = "%d " % target
        return row + " ".join("%f %f" % tuple(p) for p in points)

    xtl, ytl, w, h = label.bounding_box
    xc = xtl + 0.5 * w
    yc = ytl + 0.5 * h
    row = "%d %f %f %f %f" % (target, xc, yc, w, h)

    if confidence is not None:
        row += " %f" % confidence

    return row
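
To make the Polyline branch above concrete, here is a minimal standalone sketch that reproduces only its flattening step, without importing FiftyOne. The helper name make_polyline_row and the two squares are hypothetical test data, not part of the exporter.

```python
import itertools


def make_polyline_row(points, target):
    """Mimic the Polyline branch: chain every ring into one flat vertex list."""
    flat = itertools.chain.from_iterable(points)
    return "%d " % target + " ".join("%f %f" % tuple(p) for p in flat)


# Two disconnected squares that belong to the same object (hypothetical data)
square_a = [(0.1, 0.1), (0.3, 0.1), (0.3, 0.3), (0.1, 0.3)]
square_b = [(0.6, 0.6), (0.8, 0.6), (0.8, 0.8), (0.6, 0.8)]

row = make_polyline_row([square_a, square_b], target=0)
print(row)
# The row holds one class id followed by 16 coordinates, i.e. a single
# 8-vertex ring. Nothing marks where square_a ends and square_b begins.
```

A reader of this row sees one polygon, so it implicitly gains an edge from (0.1, 0.3) to (0.6, 0.6) and a closing edge from (0.6, 0.8) back to (0.1, 0.1), neither of which exists in the original annotation.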

The internal FiftyOne data representation affects the YOLOv5 export in two ways. Let's start with the simpler one: how Detections are handled.

Detection masks are ignored

Simply put, if our dataset has Detections and a Detection contains a mask (fairly normal for instance segmentation), the YOLO exporter ignores the mask and writes only the bounding box (xc, yc, w, h). This skips the contouring step used elsewhere (including in this repo) to convert dense masks to polygons, and silently turns our instance segmentation problem into an object detection one. Here's an example:


[Figure: a Detection with its mask in FiftyOne, and the same object after YOLOv5 export, with only a bounding box and no polygons/masks]
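
For reference, here is a rough sketch of the coordinate mapping a mask-aware export would need. It assumes a contour has already been extracted from the instance mask (for example with a routine such as OpenCV's cv2.findContours) and that, as in FiftyOne, the mask spans the Detection's relative bounding box. The function name and the pixel-edge convention are my own assumptions, not the exporter's behavior.

```python
def mask_contour_to_yolo_row(contour, mask_shape, bounding_box, target):
    """Map a contour in mask pixel coordinates to a YOLO segmentation row.

    contour: list of (col, row) pixel coordinates inside the mask
    mask_shape: (height, width) of the instance mask
    bounding_box: [xtl, ytl, w, h] relative to the image, in [0, 1]
    """
    mask_h, mask_w = mask_shape
    xtl, ytl, w, h = bounding_box
    coords = []
    for px, py in contour:
        # Assumption: coordinates are pixel edges, so px == mask_w is the
        # right border of the bounding box
        x = xtl + (px / mask_w) * w
        y = ytl + (py / mask_h) * h
        coords.append("%f %f" % (x, y))
    return "%d " % target + " ".join(coords)


# Hypothetical contour covering the whole 20x10 (height x width) mask
contour = [(0, 0), (10, 0), (10, 20), (0, 20)]
mask_row = mask_contour_to_yolo_row(contour, (20, 10), [0.2, 0.4, 0.1, 0.2], 1)
print(mask_row)
```

A contour that covers the whole mask maps back to the corners of the bounding box, which is a quick sanity check for the mapping.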

Polyline problems: ambiguous hole vs. exterior logic

What happens when our FiftyOne dataset uses polylines instead? In FiftyOne, a Polyline’s .points attribute is always a list, and if it contains more than one element, each element is a separate closed polygon of the same object. I noticed the following: when I load a COCO-formatted dataset containing an RLE label that is a dense mask with a hole inside (with use_polylines=True), that mask is exported correctly to COCO. I didn't dig, but this is probably because the "filling" logic that creates dense masks from polygons already handles polygons nested inside other polygons, treating them as "negative space". In the FiftyOne App, however, it appears as two polygons, one inside the other. That is why I think this bug was deemed to be related only to the App rendering logic.

But my guess is that it is not a rendering problem; it comes from how polylines are represented internally in FiftyOne. I interactively explored each sample and its Polyline objects, and I saw that nothing in the .points attribute tells us which polygons are intended as holes (i.e. interior rings of other polygons) and which are not.

And now we come to the original problem. The current YOLOv5 exporter logic makes no such distinction either; everything is flattened with itertools.chain.from_iterable(). Chaining may “accidentally” produce acceptable output for a single outer ring plus a single hole (if the hole ring has the reverse vertex order), because traversing the two rings in sequence gives a closed polygon that "mimics" a dense mask with a hole inside, like this:


[Figure: single shape with a single hole, exported as YOLOv5]
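
The "reverse order" condition can be checked with the shoelace formula, whose sign gives a ring's winding direction. This is only an illustrative sketch with hypothetical rings. As far as I can tell, nothing in FiftyOne guarantees that holes are stored with the opposite winding, so this is a heuristic rather than a reliable hole detector.

```python
def signed_area(ring):
    """Shoelace formula: the sign of the result encodes the winding direction."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return total / 2.0


# Hypothetical outer ring and hole, wound in opposite directions
outer = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
hole = [(0.25, 0.25), (0.25, 0.75), (0.75, 0.75), (0.75, 0.25)]

outer_area = signed_area(outer)
hole_area = signed_area(hole)

# Opposite signs mean the two rings are traversed in opposite directions
opposite_winding = (outer_area > 0) != (hole_area > 0)
print(outer_area, hole_area, opposite_winding)
```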

But this approach fails for more complex arrangements (multiple disconnected regions, multiple holes), as shown below:


[Figure: COCO import with use_polylines=True]

[Figure: exported as YOLOv5; polygons are (badly) chained together]

Taking advantage of Shapely

At first I looked into the FiftyOne docs and saw that every Polyline offers a to_shapely() method that converts it into a Shapely Polygon or MultiPolygon. Since Shapely supports holes, this looked like a good basis for a fix. Unfortunately, digging deeper, I discovered that FiftyOne's to_shapely() builds a MultiPolygon indiscriminately, without attempting to detect which rings are interior (holes) and which are exterior, so it doesn’t reliably capture the intended geometry. This is the snippet I am talking about:

if len(points) == 1:
    if _filled:
        return sg.Polygon(points[0])

    return sg.LineString(points[0])

if _filled:
    return sg.MultiPolygon(list(zip(points, itertools.repeat(None))))
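
One possible direction, sketched here in pure Python, is to group rings into (shell, holes) pairs by nesting depth before building any geometry. Pairs in that shape are what Shapely's MultiPolygon constructor accepts, in place of the (ring, None) pairs built above. The helper names are hypothetical, and the sketch assumes rings never cross or touch each other, so it is not a drop-in fix.

```python
def point_in_ring(point, ring):
    """Even-odd ray casting test; ring is a list of (x, y) vertices."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        if (y0 > y) != (y1 > y):
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


def group_rings(rings):
    """Group rings into (shell, holes) pairs by nesting depth.

    Assumes rings neither cross nor touch, so testing one vertex of a
    ring is enough to decide whether another ring contains it.
    """
    containers = [
        [j for j, other in enumerate(rings)
         if j != i and point_in_ring(ring[0], other)]
        for i, ring in enumerate(rings)
    ]
    # Even nesting depth -> exterior shell, odd depth -> hole
    groups = {i: [] for i, c in enumerate(containers) if len(c) % 2 == 0}
    for i, c in enumerate(containers):
        if len(c) % 2 == 1:
            # The direct parent is the container nested exactly one level less
            parent = next(j for j in c if len(containers[j]) == len(c) - 1)
            groups[parent].append(rings[i])
    return [(rings[i], holes) for i, holes in groups.items()]


# Hypothetical object: one square with a hole, plus a disconnected square
shell_a = [(0.0, 0.0), (0.5, 0.0), (0.5, 0.5), (0.0, 0.5)]
hole_a = [(0.1, 0.1), (0.4, 0.1), (0.4, 0.4), (0.1, 0.4)]
shell_b = [(0.6, 0.6), (0.9, 0.6), (0.9, 0.9), (0.6, 0.9)]

grouped = group_rings([shell_a, hole_a, shell_b])
print(grouped)
```

Here hole_a is attached to shell_a and shell_b stays a separate part with no holes, which is the distinction that the flat .points list alone does not carry.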

So I started looking into one bug and ended up noticing several more, so I am not even sure this should be limited to a single issue. But here I am. Let me know what you think.

System information

  • OS Platform and Distribution: Linux Ubuntu 22.04
  • Python version: 3.10.12
  • FiftyOne version: 1.8.1
  • FiftyOne installed from: pip

Willingness to contribute

  • Yes. I can contribute a fix for this bug independently
  • Yes. I would be willing to contribute a fix for this bug with guidance
    from the FiftyOne community
  • No. I cannot contribute a bug fix at this time
