Running tflm on bare metal and experiencing an output tensor issue

I have a custom-trained TFLite model that I converted to a C++ source file. It is meant to run on a Zynq-7000 as a bare-metal implementation using TensorFlow Lite for Microcontrollers (TFLM). I have been able to run the model successfully: the detection scores match the output I get when I test the model in Python. However, the bounding boxes are slightly off.
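
For context, the TFLM side is set up roughly like the sketch below. This is not the project's exact code: the arena size, the op registrations, and the model_data symbol name are placeholders.

#include <cstddef>
#include <cstdint>
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/schema/schema_generated.h"

// The C array produced from the .tflite file (symbol name assumed).
extern const unsigned char model_data[];

// Arena that TFLM carves every tensor buffer out of (size is a placeholder).
constexpr size_t kArenaSize = 300 * 1024;
static uint8_t tensor_arena[kArenaSize];

void RunModel() {
  const tflite::Model* model = tflite::GetModel(model_data);

  // Register whichever ops the model actually uses, e.g.:
  static tflite::MicroMutableOpResolver<8> resolver;
  // resolver.AddConv2D(); resolver.AddDetectionPostprocess(); ...

  static tflite::MicroInterpreter interpreter(model, resolver, tensor_arena, kArenaSize);
  interpreter.AllocateTensors();   // output buffers are laid out in the arena here

  // ... copy the input image into interpreter.input(0)->data.f, then:
  interpreter.Invoke();
}

Depending on the TFLM version, the MicroInterpreter constructor may also take an error reporter argument.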

Board results:

DEBUG: Detection 0 raw: score=0.8013, class=0.2714, box=[0.1784, 0.1757, 0.0000, 0.0000]
DEBUG: Detection 1 raw: score=0.5003, class=0.2192, box=[0.0000, 0.0000, 0.9921, 0.4017]
DEBUG: Detection 2 raw: score=0.3077, class=0.1971, box=[0.0039, 0.0133, 0.9910, 0.1075]
DEBUG: Detection 3 raw: score=0.2803, class=0.1810, box=[0.9604, 0.0426, 0.9986, 0.0785]
DEBUG: Detection 4 raw: score=0.2714, class=0.1784, box=[0.9536, 0.0270, 0.9939, 0.0553]
DEBUG: Detection 5 raw: score=0.2192, class=0.1757, box=[0.0408, 0.0082, 1.0023, 0.2196]
DEBUG: Detection 6 raw: score=0.1971, class=0.0000, box=[0.7465, 0.1062, 1.0030, 0.4046]
DEBUG: Detection 7 raw: score=0.1810, class=0.0000, box=[0.5294, 0.0006, 0.6696, 0.0322]
DEBUG: Detection 8 raw: score=0.1784, class=0.0000, box=[0.1593, 0.4119, 0.5908, 0.5678]
DEBUG: Detection 9 raw: score=0.1757, class=0.0000, box=[0.4822, 0.0018, 0.6665, 0.0422]

Python results (sorry for the different format):

arr0_(scores) = [[0.8013255  0.5003249  0.30769435 0.28034142 0.27137667 0.219202
  0.19714184 0.1809863  0.17840956 0.17565072]]
arr1_(boxes) = [[
  [0.1093 0.2193 1.0176 0.7862]
  [0.7530 0.2164 0.9921 0.4017]
  [0.0039 0.0133 0.9910 0.1075]
  [0.9604 0.0426 0.9986 0.0785]
  [0.9536 0.0270 0.9939 0.0553]
  [0.0408 0.0082 1.0023 0.2196]
  [0.7465 0.1062 1.0030 0.4046]
  [0.5294 0.0006 0.6696 0.0322]
  [0.1593 0.4119 0.5908 0.5678]
  [0.4822 0.0018 0.6665 0.0422]
]]
arr2_num_detections = [10.]
arr3_class = [[0. 3. 0. 0. 0. 0. 0. 0. 0. 0.]]

As you can see, the scores are identical, but the first two bounding boxes (the first six coordinates) differ; the rest are correct.

I tracked it down to the output tensors apparently not being given large enough buffers to write into:

tensor_score->data.f: 0x2654670
tensor_boxes->data.f: 0x2654690
tensor_count->data.f: 0x2654660
tensor_class->data.f: 0x2654680
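
For reference, these pointers can be printed roughly as in the sketch below (assuming the interpreter object from the setup sketch above; the output indices and variable names are assumptions and must match the model's own output ordering):

#include <cstdio>
#include "tensorflow/lite/micro/micro_interpreter.h"

// Sketch: print where TFLM placed each output buffer.
void PrintOutputPointers(tflite::MicroInterpreter& interpreter) {
  TfLiteTensor* tensor_score = interpreter.output(0);
  TfLiteTensor* tensor_boxes = interpreter.output(1);
  TfLiteTensor* tensor_count = interpreter.output(2);
  TfLiteTensor* tensor_class = interpreter.output(3);
  printf("tensor_score->data.f: %p\n", (void*)tensor_score->data.f);
  printf("tensor_boxes->data.f: %p\n", (void*)tensor_boxes->data.f);
  printf("tensor_count->data.f: %p\n", (void*)tensor_count->data.f);
  printf("tensor_class->data.f: %p\n", (void*)tensor_class->data.f);
}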

I am basing that assumption on the output details I got from the model when running it in Python:

import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path=TFLITE_MODEL_PATH)
interpreter.allocate_tensors()

output_details = interpreter.get_output_details()

for i, detail in enumerate(output_details):
    print(f"Output {i}: Name={detail['name']}, Shape={detail['shape']}, Type={detail['dtype']}")

Which will output the following:

Output 0: Name=StatefulPartitionedCall:1, Shape=[ 1 10], Type=<class 'numpy.float32'>
Output 1: Name=StatefulPartitionedCall:3, Shape=[ 1 10  4], Type=<class 'numpy.float32'>
Output 2: Name=StatefulPartitionedCall:0, Shape=[1], Type=<class 'numpy.float32'>
Output 3: Name=StatefulPartitionedCall:2, Shape=[ 1 10], Type=<class 'numpy.float32'>

These shapes give element counts of 10, 40, 1, and 10 respectively. With float32 elements (4 bytes each), the tensors need at least 40, 160, 4, and 40 bytes, yet the pointers above are only 0x10 (16) bytes apart, so the buffers overlap. The board results show the overlap: the first six class values match the last six score values, and the first two bounding-box coordinates match the last two score values.
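
To make the mismatch explicit, here is a quick check that could be run on the board (a sketch, assuming the same interpreter object as above and the same output ordering as the Python output_details):

#include <cstdio>
#include "tensorflow/lite/micro/micro_interpreter.h"

// Sketch: compare the bytes TFLM reserved for each output with the bytes the
// Python shapes require (scores [1,10], boxes [1,10,4], count [1], classes [1,10],
// all float32).
void CheckOutputSizes(tflite::MicroInterpreter& interpreter) {
  const unsigned expected_bytes[4] = {10 * 4, 40 * 4, 1 * 4, 10 * 4};
  for (int i = 0; i < 4; ++i) {
    TfLiteTensor* t = interpreter.output(i);
    printf("output %d: TFLM reserved %u bytes, shape requires %u bytes\n",
           i, (unsigned)t->bytes, expected_bytes[i]);
  }
}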

Has anyone encountered anything like this while trying to work with tflm?

I have tried hard coding values just to test, but I am struggling to figure out where to go.

asked Apr 1 at 2:06 by caleb losch; edited Apr 3 at 0:28 by old_timer
  • Only the first two bounding boxes differ, if you match the formatting. I edited your post for you. We don't have enough information about the model itself to confirm your suspicion that "the outputs do not seem to be getting enough of a buffer to output into". 0x10 bytes (16) is enough to fit four floats, which is exactly one box... – Botje Commented Apr 1 at 7:36
  • I added some more information about the characteristics of the model, like the shape of the output tensors, and just some further background for my assumptions. – caleb losch Commented Apr 1 at 12:29
  • Okay, that seems to confirm your assessment .. Now add the tflite code that loads and uses the model? – Botje Commented Apr 1 at 12:35
  • I have since figured out the issue. It seems like tflite likes to leave the shape of the output tensors as empty/uninitialized and will dynamically resize them when the model is invoked. For tflm implementations, it won't do this and will statically allocate based on whatever the model's metadata specified. I just manually altered the model's metadata using the flatbuffer compiler and then recompiled back into a .tflite format. It worked from there. – caleb losch Commented Apr 2 at 1:46
  • Good find! Suggest you write that as a self-answer for others to find. – Botje Commented Apr 2 at 6:37

1 Answer


The issue was due to a difference between the standard TFLite runtime and TFLM. TFLite does not need the output tensor dimensions to be specified ahead of time; it dynamically resizes the output tensors when the model is invoked. TFLM does not support dynamic allocation and instead statically allocates buffers based on the dimensions recorded in the .tflite model's metadata. I used netron.app to confirm that this shape information was missing from my model. I then used the FlatBuffers compiler to convert the .tflite file to JSON so I could see and edit the metadata:

.\flatc.exe -t --strict-json --defaults-json -o . schema.fbs -- model2.tflite
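
For illustration, the edit in model2.json amounts to filling in the shape field of each output tensor. The entry below is hypothetical (the field names come from the TFLite flatbuffer schema, but the exact contents depend on the model); for the boxes output it would change from something like

{ "shape": [ ], "type": "FLOAT32", "name": "StatefulPartitionedCall:3", ... }

to

{ "shape": [ 1, 10, 4 ], "type": "FLOAT32", "name": "StatefulPartitionedCall:3", ... }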

I added the missing dimensions to the output tensors in the JSON and then recompiled it back into a .tflite file:

flatc -b --defaults-json -o new_model schema.fbs model2.json

Make sure all the file paths are correct; I kept everything in the same folder.
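
As a way to catch this on the device in the future, the output tensors can be checked right after allocation (a sketch, assuming a tflite::MicroInterpreter named interpreter):

#include <cstdio>
#include "tensorflow/lite/micro/micro_interpreter.h"

// Sketch: after AllocateTensors(), flag any output tensor that carries no
// static shape information, since TFLM cannot size its buffer from it.
void CheckOutputShapes(tflite::MicroInterpreter& interpreter) {
  for (size_t i = 0; i < interpreter.outputs_size(); ++i) {
    TfLiteTensor* t = interpreter.output(i);
    if (t->dims == nullptr || t->dims->size == 0) {
      printf("output %u has no static shape (bytes reserved: %u)\n",
             (unsigned)i, (unsigned)t->bytes);
    }
  }
}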
