Looking for an approach to detecting new objects over a period of time

I am trying to use Deepstack to alert me to new objects within a camera's field of view. One case is detecting cars in my driveway and on the street. Currently I take a picture once a minute, send it to Deepstack, and compare how many cars are in the new picture versus the previous one.

I’ve run into a few problems with this. The first is that fairly often, maybe due to light changes, a car will be detected (and I get an alert), then won’t be detected for one or more images, then will be detected again, and I get a new alert even though the car was there the whole time. I’m also assuming that even if a car doesn’t move between two images, its detected position might shift by a few pixels.

The second is that counting misses swaps. If one image has a car on the far left side, and before the next picture that car drives off and a new car parks on the right side, I won’t notice, because the number of cars in the images stays the same.

I’m looking for a better approach. My idea is to keep track of where a car recently was, even if it’s not detected for some amount of time, and not consider the car new if it is detected there again within that time period. I was considering using the center of the bounding box as the location, but I know I have to allow some fuzziness when matching that location. One approach would be to divide the image into a coarse grid (maybe 10x10 squares) and treat any point within the same square as the same location, but that runs the risk of a detected point near the edge of one square shifting just enough to land in an adjacent square. Taking the x,y coordinate of a detected car in a new image and calculating the distance to the x,y of each previously detected car just sounds expensive to me, even though the number of cars would hardly ever be more than 5, so I’m thinking some kind of hashed location approach would be better.
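To make the idea concrete, here is a rough sketch of the "remember where a car recently was" approach using the plain distance comparison. Everything in it is an assumption for illustration: the `CarTracker` class, the `box_center` helper, the 40-pixel tolerance, and the 10-minute grace period are made-up names and values, and I'm assuming each Deepstack prediction carries `x_min`, `y_min`, `x_max`, and `y_max` fields. With about 5 cars, this is only a handful of distance calculations per image.

```python
import math
import time

MATCH_RADIUS = 40.0    # pixels; assumed tolerance for detection jitter
GRACE_SECONDS = 600.0  # assumed: how long an undetected car is still remembered


def box_center(pred):
    """Center of a bounding box; assumes x_min/y_min/x_max/y_max keys."""
    return ((pred["x_min"] + pred["x_max"]) / 2.0,
            (pred["y_min"] + pred["y_max"]) / 2.0)


class CarTracker:
    """Remembers where cars were recently seen (hypothetical helper)."""

    def __init__(self, match_radius=MATCH_RADIUS, grace_seconds=GRACE_SECONDS):
        self.match_radius = match_radius
        self.grace_seconds = grace_seconds
        self.tracks = []  # each track: {"x": ..., "y": ..., "last_seen": ...}

    def update(self, centers, now=None):
        """Feed the centers from one image; return the ones that look new."""
        now = time.time() if now is None else now
        # Forget cars that have been missing longer than the grace period.
        self.tracks = [t for t in self.tracks
                       if now - t["last_seen"] <= self.grace_seconds]
        unmatched = list(self.tracks)  # tracks not yet claimed in this image
        new_cars = []
        for (x, y) in centers:
            # Nearest remembered car that no other detection has claimed yet.
            best = min(unmatched,
                       key=lambda t: math.hypot(t["x"] - x, t["y"] - y),
                       default=None)
            if (best is not None and
                    math.hypot(best["x"] - x, best["y"] - y) <= self.match_radius):
                # Same car as before: refresh its position and timestamp.
                best.update(x=x, y=y, last_seen=now)
                unmatched = [t for t in unmatched if t is not best]
            else:
                # Nothing remembered nearby: treat it as a new car.
                self.tracks.append({"x": x, "y": y, "last_seen": now})
                new_cars.append((x, y))
        return new_cars
```

Used once per picture, something like `tracker.update([box_center(p) for p in predictions])` would return only the centers that warrant an alert. A car that flickers out for a few images is matched to its old track when it reappears, and a swap (left car leaves, right car arrives) shows up as a new car even though the count is unchanged.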

I’ve been studying the Redis server recently, and its geospatial features sound kind of like what I’m thinking of, but they of course use longitude/latitude, and I’m wondering if there is something similar that works with just the x,y coordinates of an image.
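The closest thing I can picture for plain image coordinates is an in-memory grid hash where a lookup also checks the eight neighbouring cells, which would avoid the edge-of-square problem. This is only a sketch under my own assumptions: `GridIndex`, `cell_of`, and the 64-pixel cell size are hypothetical names and values, not anything from Redis or Deepstack, and it only works when the search radius is no larger than the cell size.

```python
from collections import defaultdict

CELL = 64  # assumed cell size in pixels; must be >= the search radius


def cell_of(x, y, cell=CELL):
    """Grid cell (column, row) that contains the point (x, y)."""
    return (int(x // cell), int(y // cell))


class GridIndex:
    """Buckets points by grid cell so nearby points can be found quickly."""

    def __init__(self, cell=CELL):
        self.cell = cell
        self.cells = defaultdict(list)  # (col, row) -> list of (x, y) points

    def add(self, x, y):
        self.cells[cell_of(x, y, self.cell)].append((x, y))

    def near(self, x, y, radius):
        """Stored points within `radius` of (x, y); needs radius <= cell size."""
        cx, cy = cell_of(x, y, self.cell)
        found = []
        # Check the point's own cell plus its 8 neighbours, so a point just
        # across a cell border is still found.
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for (px, py) in self.cells.get((cx + dx, cy + dy), ()):
                    if (px - x) ** 2 + (py - y) ** 2 <= radius ** 2:
                        found.append((px, py))
        return found
```

For example, a car stored at (63, 10) and re-detected at (65, 12) falls in a different cell, but `near(65, 12, 10)` still finds it because the neighbouring cell is searched too.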

Does anyone have suggestions?