Just a guess, but I would say: large enough that a human can easily recognize it, plus a small safety buffer (for reliability in a changing environment, e.g. lighting or distance).
If DeepStack is using something like a CNN, it needs many 'windows' to sample across the image, at many different sizes. So a 10x10-pixel object might be the bare minimum (the network can at least sample 1x1, 2x2, and 3x3 windows across it), while 100x100 would be much, much better (allowing many different window sizes to be sampled). At 1000x1000 and higher you may hit diminishing returns, depending on how much detail you need from the object. Or is it just a rough shape (dog, generic face/human) you are looking for?
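To put a number on that intuition, here is a toy sketch (a simplification: modern CNNs use strided convolutions rather than literal sliding windows, but the scaling effect is the same) counting how many places small windows can land inside an object of a given size:

```python
def window_positions(object_px, window_px):
    """Number of places a square window can sit inside a square object
    (stride 1, no padding)."""
    fits = object_px - window_px + 1
    return fits * fits if fits > 0 else 0

# A 100x100 object gives the network vastly more samples than a 10x10 one
for size in (10, 100):
    total = sum(window_positions(size, w) for w in (1, 2, 3))
    print(f"{size}x{size} object: {total} window placements")
```

So going from 10x10 to 100x100 pixels gives the network roughly a hundred times more positions to sample from, which is part of why bigger objects detect so much more reliably.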
Think about it this way: most people use 1080p images, and if the object takes up a fair portion of the frame you will probably get better results than if it only takes up a very small portion. That is partly down to optics/focus, camera sensor noise, etc., but also down to how much information (pixels) the machine-learning algorithm can read and infer from.
So I would say the minimum is somewhere between 10x10 and 100x100 pixels for the object; exactly where depends on the quality of your image, the complexity of the target object, and what size (in pixels) the objects in the training dataset were.
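If you want to sanity-check your camera placement against that rule of thumb, here is a rough pinhole-camera estimate of how many pixels wide an object will appear. The function name and example numbers (object width, distance, field of view) are just illustrative; real lenses add distortion, so treat the result as a ballpark figure:

```python
import math

def object_pixel_size(object_width_m, distance_m, hfov_deg, image_width_px):
    """Rough pinhole-camera estimate of an object's apparent width in pixels."""
    # Width of the scene the camera covers at that distance
    scene_width_m = 2 * distance_m * math.tan(math.radians(hfov_deg) / 2)
    return image_width_px * object_width_m / scene_width_m

# Example: a 0.5 m-wide dog, 10 m away, on a 1080p camera with a 90-degree HFOV
px = object_pixel_size(0.5, 10.0, 90.0, 1920)
print(round(px))  # → 48, comfortably above the ~10 px bare minimum
```

If the number comes out near the 10-pixel floor, move the camera closer, zoom in, or bump the resolution before blaming the detector.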