admin管理员组

文章数量:1123669

I’m working on extracting the coordinates of specific icons, folders, or elements from a desktop image using the Anthropic Vision API, but I’ve hit a roadblock. Here's what I'm doing:

I have a desktop image (original size: 1920x1080) with multiple icons and folders.

I’m asking the Vision model (via the Sonnet 3.5 model) to find the coordinates of a specific icon, like "Chrome."

Based on the Anthropic Vision documentation, I resized the image to 1366x768 before sending it to the API, as instructed.

Despite following the guidelines, the coordinates returned by the API don’t match the actual location of the icon in the original image. Interestingly, the computer-use model works correctly in its environment, but in this case, I just want to send the image to the Vision model and get the accurate coordinates of a specific element.

How to resolve the issue?

本文标签: pythonGetting Accurate Icon Coordinates Using Anthropic Vision APIStack Overflow