
I'm working on implementing image segmentation using my own custom TFLite model, following the code example from MediaPipe. Here's my code:

import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision

# base_options points at the custom segmentation model
base_options = python.BaseOptions(model_asset_path=model_path)

options = vision.ImageSegmenterOptions(
    base_options=base_options,
    running_mode=mp.tasks.vision.RunningMode.IMAGE,
    output_confidence_masks=True,
    output_category_mask=False
)

mp_image = mp.Image.create_from_file(image_path)
with vision.ImageSegmenter.create_from_options(options) as segmenter:
    segmentation_result = segmenter.segment(mp_image)
    output_mask = segmentation_result.confidence_masks[0]

I've encountered two issues with the above code:

  1. The model has two outputs:

    Output 0: Name = Identity0, Shape = [1, 1], Type = numpy.float32

    Output 1: Name = Identity1, Shape = [1, x, y, z], Type = numpy.float32 (where x * y * z == image_width * image_height * image_channels, with image_channels = 1)

    How can I retrieve both outputs instead of just one?

  2. The confidence_masks values are nearly constant (min/max = 0.0701157/0.070115715), which seems wrong. The original image contains a person, and the output is correct when I run the same custom TFLite model directly with tf.lite.Interpreter and read the result via get_tensor().

I know that many frameworks support models with multiple inputs and outputs, so I'm confused about what I might be missing. Here are my specific questions:

  1. Do I need to add special metadata to the TFLite model file?
  2. How should I modify the original MediaPipe code to handle multiple outputs?
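For reference, this is how I verify the model outside MediaPipe. The sketch below uses a tiny stand-in model built in memory (hypothetical H = W = 8; the real model's shapes are as described above) so it runs without the actual .tflite file; the pattern for reading every output is the same:

```python
import numpy as np
import tensorflow as tf

# Build a tiny stand-in model with the same two-output structure described
# above: one [1, 1] scalar output and one [1, H, W, 1] mask output.
inp = tf.keras.Input(shape=(8, 8, 1))
score = tf.keras.layers.GlobalAveragePooling2D()(inp)            # -> [1, 1]
mask = tf.keras.layers.Conv2D(1, 1, activation="sigmoid")(inp)   # -> [1, 8, 8, 1]
model = tf.keras.Model(inp, [score, mask])

# Convert in memory and run it with the plain TFLite interpreter.
tflite_bytes = tf.lite.TFLiteConverter.from_keras_model(model).convert()
interpreter = tf.lite.Interpreter(model_content=tflite_bytes)
interpreter.allocate_tensors()

interpreter.set_tensor(interpreter.get_input_details()[0]["index"],
                       np.zeros((1, 8, 8, 1), dtype=np.float32))
interpreter.invoke()

# get_output_details() lists every output; get_tensor() reads each one.
outputs = [interpreter.get_tensor(d["index"])
           for d in interpreter.get_output_details()]
print([o.shape for o in outputs])
```

With the raw interpreter both tensors are retrievable, which is why I expected MediaPipe to expose both as well.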

asked by lcljesse, edited 14 hours ago

1 Answer 1


Why do you set output_category_mask=False if you are expecting 2 outputs? You are specifically asking the task to return only one output.

Please check the documentation and source code.

output_confidence_masks: Whether to output confidence masks.

output_category_mask: Whether to output a category mask.
