How to read coordinates in object recognition

I want to filter some objects based on their coordinates. However, I’m confused about how to read the coordinates. For example, I’m getting this message from DeepStack:


The object (a car) is in the upper left corner, with the front of the car touching the edge of the image. In this regard, x_min = 1 makes sense.
x_max = 115, and I can see that about 8 cars of the same size would fit across the picture frame.

8 × 115 = 920 (a very rough approximation)

Is the real scale of the x axis 1000, 1024, or something else?
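To make the estimate reproducible, here is a minimal sketch of the same arithmetic. The list of candidate widths is my own guess, not something DeepStack reports, and counting the box width as x_max - x_min + 1 is an assumption.

```python
def estimate_axis_extent(box_min, box_max, objects_per_axis):
    """Rough axis extent: inclusive box size times how many such boxes fit."""
    box_size = box_max - box_min + 1  # assumption: both ends are inclusive
    return box_size * objects_per_axis


def closest_candidate(estimate, candidates):
    """Return the candidate extent nearest to the rough estimate."""
    return min(candidates, key=lambda c: abs(c - estimate))


# Hypothetical candidate widths to compare against (guesses only).
CANDIDATE_WIDTHS = [1000, 1024, 1280, 1920, 5120]

# x_min = 1 and x_max = 115 come from the detection; 8 is my eyeballed count.
estimate = estimate_axis_extent(1, 115, 8)
guess = closest_candidate(estimate, CANDIDATE_WIDTHS)
print(estimate, guess)
```

With an eyeballed count of 8 the estimate is far too coarse to tell 1000 from 1024, so this only narrows down the order of magnitude.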

The Y axis (vertical) is much more confusing. The only way to make sense of it is if the origin is at the top left (not the bottom left, as usual). Here again, using my approximation method, I conclude that the scale of the Y coordinates should be between 1,000 and 1,500.
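If the origin really is at the top left, the filtering I have in mind could look roughly like the sketch below. The y values, the region, and the image height of 1000 are made-up placeholders; only x_min = 1 and x_max = 115 come from my detection.

```python
def to_bottom_left(y_min, y_max, image_height):
    """Convert a top-left-origin vertical span to a bottom-left-origin span."""
    return image_height - y_max, image_height - y_min


def in_region(pred, region):
    """True if the box center lies inside region (x_min, y_min, x_max, y_max).

    Assumes all coordinates use a top-left origin with y growing downward.
    """
    cx = (pred["x_min"] + pred["x_max"]) / 2
    cy = (pred["y_min"] + pred["y_max"]) / 2
    rx0, ry0, rx1, ry1 = region
    return rx0 <= cx <= rx1 and ry0 <= cy <= ry1


# Placeholder detection: the y values are invented for illustration.
car = {"label": "car", "x_min": 1, "x_max": 115, "y_min": 10, "y_max": 90}

# Hypothetical "upper left corner" region in the same coordinate system.
upper_left = (0, 0, 200, 200)

print(in_region(car, upper_left))
print(to_bottom_left(car["y_min"], car["y_max"], image_height=1000))
```

Under a top-left origin, an object near the top of the frame has small y values, which is what I seem to be seeing.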

What is true? How does DeepStack calculate coordinates?

P.S. The camera that provides the images is a 5K camera with 5120 × 2880 resolution, which is why I rule out the possibility that the coordinates are pixel positions.
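One thing I have not ruled out is that the image actually analyzed is a downscaled snapshot rather than the full 5K frame. In that case the coordinates could still be pixels, just on the smaller image. A sketch of mapping a box back to the native resolution under that assumption follows; the analyzed size of 1024 × 576 is purely a placeholder, and the y values are invented.

```python
def scale_box(box, analyzed_size, native_size):
    """Map box coordinates from the analyzed image to the native frame.

    Both sizes are (width, height) tuples; coordinates are assumed to be
    pixel positions on the analyzed image.
    """
    analyzed_w, analyzed_h = analyzed_size
    native_w, native_h = native_size
    sx = native_w / analyzed_w  # horizontal scale factor
    sy = native_h / analyzed_h  # vertical scale factor
    return {
        "x_min": round(box["x_min"] * sx),
        "x_max": round(box["x_max"] * sx),
        "y_min": round(box["y_min"] * sy),
        "y_max": round(box["y_max"] * sy),
    }


# Placeholder box: only x_min and x_max come from my detection.
box = {"x_min": 1, "x_max": 115, "y_min": 10, "y_max": 90}

# Hypothetical analyzed size vs. the camera's native 5120 x 2880 frame.
native_box = scale_box(box, analyzed_size=(1024, 576), native_size=(5120, 2880))
print(native_box)
```

Checking the dimensions of the image that is actually submitted for detection would confirm or rule out this explanation.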