I never updated this thread, but I did write a program that scraped the log output of BI, which does indeed contain the coordinates of where a ‘person’ is found. You can then use those coordinates to move the camera.
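For anyone curious, the scraping part is just pattern matching on the log lines. Here is a minimal sketch in Python, assuming a made-up log line format (the real BI output differs between versions, so you would adapt the regex to whatever your install actually writes):

```python
import re

# NOTE: this log line format is invented for illustration only -- the
# actual BI log/alert text differs, so adjust the pattern to match
# whatever your version emits.
# Assumed example line:  "person (350,220,80,160) 87%"
PATTERN = re.compile(r"person\s+\((\d+),(\d+),(\d+),(\d+)\)")

def parse_detection(line):
    """Return the four logged numbers as a tuple, or None if no match."""
    m = PATTERN.search(line)
    return tuple(int(v) for v in m.groups()) if m else None
```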
Remember, this approach involved no zooming.
My approach was simple: create 9 presets, and when motion is detected in any of the 9 zones, center the camera on that zone. This would, of course, only be effective on slow-moving objects.
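To make the zone idea concrete, the mapping looks roughly like this. It is a sketch under my own assumptions: a 1920x1080 frame divided into a 3x3 grid, with preset numbers matching zone numbers.

```python
def zone_for_point(cx, cy, frame_w=1920, frame_h=1080):
    """Map a point in the frame to one of 9 zones (0-8), laid out as a
    3x3 grid, left-to-right, top-to-bottom. Each zone corresponds to a
    PTZ preset with the same number (that numbering is my assumption)."""
    col = min(int(cx * 3 // frame_w), 2)
    row = min(int(cy * 3 // frame_h), 2)
    return row * 3 + col
```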
However, I ran into a math problem: the coordinates contain the size of the object as well, and reverse-engineering that into usable info exceeded my limit of interest. For the panning effect to have any value, the size of the object (‘person’) found matters. Math problems make me sleepy.
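For what it's worth, if those extra numbers turn out to be a bounding box given as top-left corner plus width/height (an assumption on my part, not something I verified), then the usable part is just the center of the box:

```python
def box_center(x, y, w, h):
    """Center of a bounding box given as top-left corner plus size.
    Assumes the logged numbers really are (x, y, width, height);
    verify against your own log output first."""
    return x + w / 2, y + h / 2

# e.g. a box at (350, 220) that is 80x160 has its center at (390, 300),
# which lands in the top-left zone (zone 0) of a 1920x1080 frame:
zone = zone_for_point(*box_center(350, 220, 80, 160))
```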
It was a fun adventure, but I am too lazy to take it to a useful level.
The approach in post #2 above requires multiple cameras, which is exactly what I was trying to avoid, but it seems like a viable option.
Note: your idea of simply panning the camera a “small increment” is good, but realize that by the time the AI coordinates hit the log file and you parse them, too much time has passed for that info to be actionable. Such an approach would need a more native solution, possibly in the camera’s software itself (which is how those auto-tracking PTZs are sold, right?).
This is why I was proposing to break the view down into 9 zones. A slow-moving object takes a little while to cross from one zone to another, and that slack is critical, because you have to get the coordinates, move the camera, read the scene again, and then keep repeating; none of that is instantaneous. Since my camera is an overview camera covering a large area, I felt it met the basic requirements. Roughly, the whole loop looks like the sketch below.
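Pulling the pieces above together, the detect-move-re-read loop is roughly this. The `goto_preset` callback is a placeholder for however you actually command the camera (HTTP CGI call, ONVIF, BI's own PTZ control), not a real API, and the settle delay is there precisely because the move-and-re-read cycle is not instantaneous:

```python
import time

def track_loop(log_path, goto_preset, settle_secs=3.0):
    """Detect -> move camera -> re-read the scene, forever.
    `goto_preset(zone)` is a stand-in for whatever moves your camera."""
    last_zone = None
    with open(log_path, "r") as log:
        log.seek(0, 2)  # jump to end of file; only react to new lines
        while True:
            line = log.readline()
            if not line:          # nothing new in the log yet
                time.sleep(0.2)
                continue
            det = parse_detection(line)
            if det is None:
                continue
            zone = zone_for_point(*box_center(*det))
            if zone != last_zone:
                goto_preset(zone)
                last_zone = zone
                time.sleep(settle_secs)  # let the move finish and the scene settle
```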