Evaluating Azure Video Indexer
In this article, I tried out Azure Video Indexer on a video that is similar to what farm video might contain. In an ideal scenario, Azure would identify key moments that correspond to states and actions that predict future rewards. For instance, in this case the state could be thick grass on a terrace, the action is plowing, and the new state might be 80% less green. Chain enough actions and states together and eventually we get coffee, rice, or vegetables that can be sold for a reward. Reinforcement learning (RL) can then assign a share of that reward to each action in the sequence.
If different actions are taken in different places, the system can learn which actions correlate with rewards. From there, randomized experiments could be run to identify causation. If a coffee franchise were created that included a technology package, every action and state on a farm could be used to learn how actions affect the outcomes we value.
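To make the reward chain concrete, here is a minimal sketch of how a terminal reward (the crop sale) could be discounted back through a recorded chain of states and actions. Everything here, from the `FarmStep` type to the episode contents, is a hypothetical illustration rather than anything Azure produces.

```python
from dataclasses import dataclass

@dataclass
class FarmStep:
    state: str   # observed field condition before the action
    action: str  # what was done (e.g., "plow")

def discounted_returns(steps, terminal_reward, gamma=0.95):
    """Assign credit to each action in the episode by discounting
    the terminal reward back through the chain of steps."""
    returns = []
    g = terminal_reward
    for _ in reversed(steps):
        returns.append(g)
        g *= gamma
    return list(reversed(returns))

# Hypothetical episode: states and actions inferred from video key moments.
episode = [
    FarmStep("thick grass on terrace", "plow"),
    FarmStep("80% less green", "plant rice"),
    FarmStep("seedlings established", "harvest"),
]

# Selling the harvest yields the only observed reward; earlier
# actions receive progressively discounted credit.
for step, g in zip(episode, discounted_returns(episode, terminal_reward=100.0)):
    print(f"{step.action:12s} in state '{step.state}' -> credit {g:.1f}")
```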
If the Azure video stops working, here is the Google version.
Ideal images to produce
0:00: The before state of the field
0:12: The action of plowing
0:24: The first pass completed
0:29: The carabao eating
1:29: The carabao starting to plow
1:53: The carabao eating and plowing
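As a sanity check, one could compare these ideal timestamps against the keyframes Video Indexer actually emits. Below is a rough sketch that assumes the exported insights JSON keeps keyframe instances under videos → insights → shots → keyFrames; the file name `carabao_insights.json` is made up, and the JSON layout should be verified against a real export.

```python
import json

# Ideal moments from the list above, in seconds.
IDEAL_MOMENTS = [0, 12, 24, 29, 89, 113]

def parse_ts(ts: str) -> float:
    """Parse an 'H:MM:SS.ff' timestamp string into seconds."""
    h, m, s = ts.split(":")
    return int(h) * 3600 + int(m) * 60 + float(s)

def keyframe_times(insights_path: str) -> list[float]:
    """Collect keyframe start times from an exported Video Indexer
    insights JSON (structure assumed, verify against a real export)."""
    with open(insights_path) as f:
        data = json.load(f)
    times = []
    for video in data.get("videos", []):
        for shot in video.get("insights", {}).get("shots", []):
            for kf in shot.get("keyFrames", []):
                for inst in kf.get("instances", []):
                    times.append(parse_ts(inst["start"]))
    return sorted(times)

def nearest_gap(moment: float, times: list[float]) -> float:
    """Distance from an ideal moment to the closest emitted keyframe."""
    return min(abs(moment - t) for t in times) if times else float("inf")

times = keyframe_times("carabao_insights.json")
for m in IDEAL_MOMENTS:
    print(f"ideal {m:>5.0f}s -> nearest keyframe within {nearest_gap(m, times):.1f}s")
```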
Scene 1
0:17: The carabao passes the camera
0:38: The carabao passes Dhodie on the left
1:09: A different carabao appears
Scene 2
Covers the camera transitioning between locations.
Scene 3
1:19: Waiting while eating
1:46: Transition from the side of Dhodie to the back
2:16: The camera zooms in
Summary
Without domain-specific guidance, a reasonable way to define key moments is to flag points where predictions of what is in an image change quickly. Transitioning from the side of a person to the back might mean different datasets and models become activated.
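Here is a minimal sketch of that idea, under the assumption that a cheap per-frame signature (a color histogram via OpenCV) can stand in for a model's predictions: sample frames, and flag a key moment whenever the signature jumps by more than a threshold. The file name and threshold are hypothetical; a real pipeline would substitute embeddings or classifier outputs for the histogram.

```python
import cv2
import numpy as np

def frame_signature(frame) -> np.ndarray:
    """Cheap stand-in for a model's prediction vector:
    a normalized 8x8x8 color histogram of the frame."""
    hist = cv2.calcHist([frame], [0, 1, 2], None, [8, 8, 8],
                        [0, 256, 0, 256, 0, 256]).flatten()
    return hist / (hist.sum() + 1e-9)

def key_moments(video_path: str, threshold: float = 0.25, step: int = 15):
    """Return timestamps (seconds) where the signature changes quickly
    between sampled frames (one sample every `step` frames)."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    moments, prev, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            sig = frame_signature(frame)
            # L1 distance between consecutive signatures; a large jump
            # suggests the scene (and likely the prediction) changed.
            if prev is not None and np.abs(sig - prev).sum() > threshold:
                moments.append(idx / fps)
            prev = sig
        idx += 1
    cap.release()
    return moments

print(key_moments("carabao_plowing.mp4"))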
Off the shelf, Azure Video Indexer doesn't capture farm operations, but it could be helpful for organizing training materials or for understanding subjects such as construction.
Open-source security software such as iSpy seems better geared toward what I am trying to do.