Simple Idea to Improve AI

This is a simple idea that would take care of things like chickens and small dogs being mistaken for a person at 80+% confidence. Chickens in my backyard are constantly identified as people at 80+% confidence; my Chihuahua was mistaken today at 82%. There is an easy fix: because these animals are much shorter than people, the AI algorithm could take the height of the object into consideration.

Nearly 100% of cameras have a perspective where things at the bottom of the image are closer than things at the top. Either through some user input or AI, DeepStack could build an understanding of the distance to objects as they go from the bottom of the frame to the top. User input could work by the user tagging spots on the image and typing in the distance. The AI could figure it out by taking educated guesses: as it identifies objects of known size at very high confidence levels (i.e. 97+%), it estimates the distance as it discovers those objects at different places in the image. Three known distances in an image would probably be enough to make calculations that can tell the difference between 5 feet and 12 inches. Even though people vary in size, the AI could use 5' to get close enough to tell the difference between a person and an animal that is 12 inches tall. If the AI suspects it has identified a person, it would not take long to estimate the object's height based on where it is in the picture. This would cut down on crazy mistakes like a 10-inch-high dog being mistaken for a person.
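To illustrate the idea above, here is a minimal sketch, assuming a simple calibration where the user tags a few image rows with the real-world distance at that row. Everything here (function names, the assumed vertical field of view, the linear interpolation between tagged rows) is illustrative, not a DeepStack API:

```python
import math

def interpolate_distance(row, calibration):
    """Estimate distance (feet) at a given image row.

    calibration: list of (row, distance_ft) pairs, e.g. from the user
    tagging spots on the image and typing in the distance.
    """
    points = sorted(calibration)
    if row <= points[0][0]:
        return points[0][1]
    if row >= points[-1][0]:
        return points[-1][1]
    for (r0, d0), (r1, d1) in zip(points, points[1:]):
        if r0 <= row <= r1:
            t = (row - r0) / (r1 - r0)
            return d0 + t * (d1 - d0)

def estimate_height_ft(box_top, box_bottom, image_height, calibration,
                       vertical_fov_deg=50.0):
    """Rough real-world height of a detected object.

    Uses the pinhole approximation (apparent size is proportional to real
    size divided by distance). vertical_fov_deg is an assumed camera
    field of view, which a real system would need to calibrate.
    """
    distance = interpolate_distance(box_bottom, calibration)
    # focal length in pixels, derived from the assumed vertical field of view
    focal_px = (image_height / 2) / math.tan(math.radians(vertical_fov_deg) / 2)
    pixel_height = box_bottom - box_top
    return pixel_height * distance / focal_px
```

With three or more tagged rows, the interpolation is piecewise linear between them, which matches the "three known distances would probably be enough" intuition for rough person-vs-animal discrimination.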

The idea is great. More than 5 years ago I attended a Bosch CCTV seminar. Bosch IP cameras have an accelerometer, so they know at what angle to the horizon they are shooting, and they have a calibration for the size of objects in the frame. Thus, after calibration, the camera can fairly accurately determine the size of objects in the frame. It is even possible to set an alert if a person of a certain height appears in the frame, or a truck is longer than the allowed parking space, etc. It would be great to have such functionality in DeepStack.
Once we know the geometry in the frame, we will be able to determine objects by size, and the distance to objects, their speed, and much more can be calculated with simple mathematics.
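The "simple mathematics" above is essentially the pinhole camera model. A hedged sketch, assuming the focal length in pixels is already known from calibration (all names here are illustrative):

```python
def distance_from_height(real_height_m, pixel_height, focal_px):
    """Distance to an object of known real height (e.g. an average person ~1.7 m)."""
    return real_height_m * focal_px / pixel_height

def size_from_distance(distance_m, pixel_size, focal_px):
    """Real size of an object once its distance is known (the inverse relation)."""
    return distance_m * pixel_size / focal_px

def speed_mps(pos_a_m, pos_b_m, dt_s):
    """Ground speed from two estimated positions taken dt_s seconds apart."""
    return abs(pos_b_m - pos_a_m) / dt_s
```

For example, a person appearing 170 px tall with a 1000 px focal length would be about 10 m away; track that estimated position across two frames and you get speed for free.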
It would also be useful to detect the color of objects, at least during the day with good lighting. Then we could easily find in the video the same red car, the man in green, or the white dog rather than the black one.
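As a toy sketch of that color idea: average the pixels inside a detection's bounding box and match against a few named reference colors. The reference values are made up for illustration, and as noted above this only works reliably with decent lighting:

```python
# Illustrative reference colors (RGB); a real system would use a
# perceptual color space rather than raw RGB distance.
NAMED_COLORS = {
    "red": (200, 40, 40),
    "green": (40, 160, 60),
    "white": (235, 235, 235),
    "black": (25, 25, 25),
}

def average_rgb(pixels):
    """Mean color of a list of (r, g, b) pixel tuples from the crop."""
    n = len(pixels)
    return tuple(sum(p[i] for p in pixels) / n for i in range(3))

def nearest_color(rgb):
    """Name of the reference color closest to rgb (squared Euclidean distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(rgb, c))
    return min(NAMED_COLORS, key=lambda name: dist(NAMED_COLORS[name]))
```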

Hello. I keep reading suggestions that have nothing to do with DeepStack itself. What the OP describes can be achieved via AI Tool - Vorlon Fork, etc. The objects detected by DeepStack come with a bounding box that has a height and width. So if, for every camera, you can set or calibrate a minimum height and width that a car, person, truck, etc. can have in specific areas of the image, the problem is solved. Just request this feature from the corresponding software's maintainer.
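A minimal sketch of that per-camera filter. The detection fields mirror DeepStack's response keys (`label`, `y_min`, `y_max`, `x_min`, `x_max`); the minimum-size table is a hypothetical per-camera config that the surrounding software would own, not DeepStack:

```python
# Hypothetical config: per camera, per label -> (min_height_px, min_width_px)
MIN_BOX = {
    "backyard": {"person": (120, 40), "car": (80, 120)},
}

def keep_detection(camera, det):
    """Drop detections whose bounding box is too small for the claimed label."""
    h = det["y_max"] - det["y_min"]
    w = det["x_max"] - det["x_min"]
    min_h, min_w = MIN_BOX.get(camera, {}).get(det["label"], (0, 0))
    return h >= min_h and w >= min_w
```

A chicken-sized "person" box would fail the 120 px minimum and be discarded before any alert fires.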


DeepStack is the AI that identifies objects. When it identifies chickens as people, that has 100% to do with how the AI in DeepStack is interpreting the object. If the AI knew even a tiny bit about the geometry of the image, it could use some extremely lightweight algorithms to determine all kinds of things (including height in feet).

If DeepStack knows the height and width of an object, why can't it distinguish a one-foot-high chicken from a person? If it knew the height of the object, it would only take an incredibly simple check to say, "hmmm, is the height less than two feet? If so, it must not be a person." I can't tell you the number of times that DeepStack analyzed the object (a "chicken") in the bounding box and came back with "yeah, I'm damn near positive (85%+) that's a person." If it knew the height in feet, it would not make this mistake. I suspect it only knows the height in pixels, which, by itself, says nothing about the object's real-life height.
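The "less than two feet" check described above really is a one-liner. Here `estimated_height_ft` is assumed to come from a geometry step like the one the OP proposes; it is not a field DeepStack provides today:

```python
MIN_PERSON_HEIGHT_FT = 2.0  # anything shorter is assumed not to be a person

def plausible_person(label, estimated_height_ft):
    """Veto a 'person' label when the estimated real-world height is implausible."""
    if label == "person" and estimated_height_ft < MIN_PERSON_HEIGHT_FT:
        return False  # a 1-foot-tall "person" is almost certainly an animal
    return True
```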

Also, by knowing the height (in feet or meters), you could make the AI more efficient: if the object is only one foot high, there's no need to run a bunch of complicated algorithms to determine whether it's a person or a car.

Or, a better option: train your own custom dataset that includes your actual camera's view as well as the objects/animals/people in it, and I'm sure that problem will go away.