How to read coordinates in object recognition

I want to filter some objects based on coordinates. However, I’m confused about how to read the coordinates. For example, this is the message I’m getting from DeepStack:

{"success":true,"predictions":[{"confidence":0.7907504,"label":"car","y_min":135,"x_min":1,"y_max":208,"x_max":115}],"duration":0}

The object (car) is in the upper-left corner, with the front of the car touching the edge of the image. In that regard x_min = 1 makes sense.
x_max = 115, and I can see that I could fit about 8 cars of the same size across the picture frame.

8 × 115 = 920 (very rough approximation)

Is the real scale of the x-axis 1000, 1024, or something else?

The y-axis (vertical) is much more confusing. The only way to make sense of it is if the origin is in the top-left corner (not the bottom left, as usual). Here again, using my approximation method, I conclude that the y-axis scale should be somewhere between 1,000 and 1,500.

What is true? How does DeepStack calculate coordinates?
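
For reference, this is roughly the kind of filtering I have in mind once the coordinate system is clear (just a sketch; the endpoint URL and the region-of-interest values are placeholders):

import requests

# Placeholder endpoint; adjust host/port to your DeepStack instance.
DEEPSTACK_URL = "http://localhost:5000/v1/vision/detection"

# Placeholder region of interest, in whatever units DeepStack uses.
ROI = {"x_min": 0, "x_max": 500, "y_min": 0, "y_max": 300}

# Send one frame to DeepStack's object detection endpoint.
with open("frame.jpg", "rb") as f:
    response = requests.post(DEEPSTACK_URL, files={"image": f}).json()

# Keep only cars whose bounding box lies entirely inside the region.
cars_in_roi = [
    p for p in response["predictions"]
    if p["label"] == "car"
    and p["x_min"] >= ROI["x_min"] and p["x_max"] <= ROI["x_max"]
    and p["y_min"] >= ROI["y_min"] and p["y_max"] <= ROI["y_max"]
]
print(cars_in_roi)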

P.S. The camera that provides the images is a 5K camera with 5120 × 2880 resolution, which is why I exclude the possibility that the coordinates are tied to pixels.

No, no. The coordinates are tied to pixels, just with a certain coefficient. The resolution DeepStack works at internally depends on various factors, such as the hardware it is running on and the recognition quality level. My DeepStack runs at 640x640; in your case that means 5120x2880 will shrink to 640x640. Then we find the compression ratio horizontally and vertically: 5120/640 = 8, 2880/640 = 4.5.
And now when DeepStack tells you “y_min”: 135, you must apply the factor: 135 × 4.5. And for “x_min”: 1, 1 × 8 = 8. Recalculate all the coordinates in exactly the same way, and you will get values tied to the real resolution.
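
A minimal sketch of that recalculation, assuming the internal resolution really is 640x640 (check what your own instance uses; the box values below are just the example from your message):

# Map DeepStack's box back to the source image, assuming (not verified)
# that the returned coordinates are in the model's internal 640x640 space.
SOURCE_W, SOURCE_H = 5120, 2880   # camera resolution
MODEL_W, MODEL_H = 640, 640       # assumed internal resolution

scale_x = SOURCE_W / MODEL_W      # 5120 / 640 = 8.0
scale_y = SOURCE_H / MODEL_H      # 2880 / 640 = 4.5

prediction = {"y_min": 135, "x_min": 1, "y_max": 208, "x_max": 115}

rescaled = {
    "x_min": prediction["x_min"] * scale_x,  # 1 * 8 = 8
    "x_max": prediction["x_max"] * scale_x,  # 115 * 8 = 920
    "y_min": prediction["y_min"] * scale_y,  # 135 * 4.5 = 607.5
    "y_max": prediction["y_max"] * scale_y,  # 208 * 4.5 = 936.0
}
print(rescaled)

Note that 115 × 8 = 920 lines up with your own rough estimate of about 8 car widths across the frame.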
But I believe DeepStack itself should handle this. It knows what resolution it compresses the input to, so it should tie the coordinates to the input resolution, not the internal one.