What is the minimum image size in pixels?

I know the distance to the place where the object of observation will appear, and I am trying to design a detection and recognition system correctly. I need to choose the optics (focal length) and the sensor resolution.
I need to understand how many pixels DeepStack needs for successful:

  • Face Detection,
  • Face Recognition,
  • Object Detection (Human, Dog, Car)

at a confidence over 95%.
I’m not interested in the overall resolution of the frame that will be sent to DeepStack for analysis.
I need to know what size an object must be, in pixels, for successful detection and recognition.
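
For the optics side of the question, the usual pinhole-camera approximation relates distance, focal length, and pixel pitch to the size of the object in pixels. The sketch below is only that approximation. The function names and the example numbers (2.9 µm pixel pitch, a 0.16 m wide face at 20 m) are assumptions for illustration, not DeepStack requirements.

```python
def pixels_on_target(object_size_m, distance_m, focal_length_mm, pixel_pitch_um):
    """Approximate size of an object on the sensor, in pixels (pinhole model)."""
    # Size of the projected image on the sensor, in millimetres.
    image_size_mm = focal_length_mm * object_size_m / distance_m
    # Convert pixel pitch from micrometres to millimetres, then count pixels.
    return image_size_mm / (pixel_pitch_um / 1000.0)


def focal_length_for(target_px, object_size_m, distance_m, pixel_pitch_um):
    """Focal length (mm) needed to put target_px pixels across the object."""
    return target_px * (pixel_pitch_um / 1000.0) * distance_m / object_size_m


# Hypothetical example: 0.16 m wide face, 20 m away, 6 mm lens, 2.9 um pixels.
face_px = pixels_on_target(0.16, 20.0, 6.0, 2.9)
# Focal length that would put 100 pixels across the same face.
needed_mm = focal_length_for(100, 0.16, 20.0, 2.9)
print(f"{face_px:.1f} px across the face; {needed_mm:.2f} mm for 100 px")
```

Once a target pixel count is chosen, `focal_length_for` gives the lens to look for at a known distance; lens distortion and focus quality are ignored here.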

Just a guess, but I would say large enough that a human would be able to recognize the object, plus a small safety margin (for reliability under changing conditions such as light or distance).

If DeepStack uses something like a CNN, it needs to sample many ‘windows’ of different sizes. So a 10x10-pixel object might be the bare minimum (it can at least sample 1x1, 2x2, 3x3 windows across the image), but 100x100 would be much, much better (allowing many different window sizes to be sampled). 1000x1000 and higher may bring diminishing returns, depending on how much detail of the object you require. Or is it just a rough shape (dog, generic face/human) you are looking for?

Consider that most people use 1080p images. If the object takes up a fair portion of the frame, you will probably get better results than if it takes up only a very small portion. This is partly due to optics/focus, camera sensor noise, etc., but also partly due to how much information (pixels) the machine learning algorithm can read and draw inferences from.

So I would say the minimum is somewhere between 10x10 and 100x100 pixels for the object, and where exactly depends on the quality of your image, the complexity of the target object, and the size (in pixels) of the objects in the training dataset.


Your image will be shrunk down to the input size of the model, typically 416x416. How many of these pixels are needed to identify an object varies with the difficulty of the object, but typically you need at least 10x10.
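
To see what that shrinking does to an object, here is a rough sketch that maps an object's size in the camera frame to its size at the model input. It assumes an aspect-preserving resize so that the longer side fits into 416 pixels; the exact preprocessing DeepStack applies is not stated in this thread, so treat the scale factor as an assumption. The function name and the example numbers are hypothetical.

```python
def object_size_at_model_input(obj_w, obj_h, frame_w, frame_h, model_size=416):
    """Scale an object's pixel size from the camera frame to the model input.

    Assumes the whole frame is resized, preserving aspect ratio, so that it
    fits inside a model_size x model_size square.
    """
    scale = min(model_size / frame_w, model_size / frame_h)
    return obj_w * scale, obj_h * scale


# Hypothetical example: a 60x120 px person in a 1920x1080 frame.
w, h = object_size_at_model_input(60, 120, 1920, 1080)
print(f"{w:.1f} x {h:.1f} px at the model input")

# Going the other way: source pixels needed to keep 10 px after resizing.
min_source_px = 10 / min(416 / 1920, 416 / 1080)
print(f"about {min_source_px:.0f} px needed in the 1920x1080 frame")
```

Under these assumptions, a 10x10 minimum at the model input corresponds to roughly 46 pixels across the object in a 1080p frame.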

You can increase the input size of your model, which gives it more pixels. I run something like 1392x688. YOLO models can typically be reshaped without retraining. Accuracy is improved both by the size of the model and by the number of layers; the small and medium models have reduced depth. If you double the detection size of your model from 416x416 to 832x832, you will have four times the false positives, so you must then raise your confidence threshold.

Imagine trying to find golf balls from satellite imagery of a golf course. A simple model could identify a single white pixel as a golf ball, but you will have many false positives. So you need more pixels, not to figure out if it’s a golf ball, but to rule out if it’s something else.
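
One way to act on this is to filter detections by both confidence and box size after the fact. The sketch below assumes each prediction is a dict with `label`, `confidence`, `x_min`, `y_min`, `x_max`, and `y_max` keys, which is my understanding of the `predictions` list returned by DeepStack's detection endpoint; check it against your version. The function name, the thresholds, and the sample values are made up for illustration.

```python
def filter_predictions(predictions, min_confidence=0.95, min_side_px=10):
    """Keep detections that are both confident and large enough in pixels."""
    kept = []
    for p in predictions:
        width = p["x_max"] - p["x_min"]
        height = p["y_max"] - p["y_min"]
        # Reject boxes whose shorter side is below the pixel minimum.
        if p["confidence"] >= min_confidence and min(width, height) >= min_side_px:
            kept.append(p)
    return kept


# Made-up sample data in the assumed response shape.
sample = [
    {"label": "person", "confidence": 0.97,
     "x_min": 100, "y_min": 50, "x_max": 160, "y_max": 200},
    {"label": "dog", "confidence": 0.96,
     "x_min": 300, "y_min": 300, "x_max": 306, "y_max": 305},
    {"label": "car", "confidence": 0.60,
     "x_min": 400, "y_min": 100, "x_max": 700, "y_max": 300},
]

for p in filter_predictions(sample):
    print(p["label"], p["confidence"])
```

Here only the person survives: the dog box is confident but only 6x5 pixels, and the car is large but below the confidence threshold.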

I have 8 MP 2.8 mm and 4 MP 6.0 mm cameras. The 6.0 mm cameras perform far better. Both might see a deer, but the 6.0 mm camera puts many more pixels on the same deer and can reach high confidence.
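
A back-of-the-envelope comparison shows why the longer lens can win despite the lower megapixel count. The sketch assumes both cameras have the same sensor width and view the object from the same distance, and it assumes horizontal resolutions of 3840 px for the 8 MP camera and 2560 px for the 4 MP camera; none of these values are given in the thread.

```python
def linear_pixel_ratio(focal_a_mm, h_pixels_a, focal_b_mm, h_pixels_b):
    """Ratio of pixels across the same object for camera A versus camera B.

    Assumes equal sensor width and equal distance, so pixels across the
    object scale with focal length times horizontal pixel count.
    """
    return (focal_a_mm * h_pixels_a) / (focal_b_mm * h_pixels_b)


# Assumed resolutions: 4 MP -> 2560 px wide, 8 MP -> 3840 px wide.
ratio = linear_pixel_ratio(6.0, 2560, 2.8, 3840)
print(f"linear: {ratio:.2f}x, area: {ratio ** 2:.2f}x")
```

Under these assumptions the 4 MP 6.0 mm camera puts about 1.43 times as many pixels across the deer, or about twice as many pixels over its area.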


Does this mean that whatever resolution I set, DeepStack will downscale the picture? I have a Jetson Nano 2GB. Down to what resolution does the DeepStack Docker image downscale the picture on a Jetson Nano? 400x400? I noticed that if you submit an 8 MP picture, processing takes ~0.7 s. If you submit 480x800, it takes ~0.15 s. Does this mean that resizing the image takes ~0.55 s? If you give DeepStack an image that is already the required size, will it skip resizing, and will that give maximum performance?
Do I understand correctly that there is no point in buying an expensive high-resolution camera for DeepStack, that an HD camera will do, and that about 10 fps is enough?
P.S. When I asked about the minimum number of pixels an object needs for recognition, I hoped to get a clear figure, like the one given for license plate recognition, where it is stated that recognition will succeed 100% of the time if there are 10x7 pixels per digit.

Also see https://github.com/robmarkcole/HASS-Deepstack-object/discussions/241
