The example videos show Obsbot doing a good job. Instead of relying on a spherical camera or wide angle lens, it manages to track using only what's in its narrow field of view. This requires it to move very fast, resulting in jerky panning. In indoor videos, the fast movement causes a lot of motion blur that a human operator wouldn't produce.
It isolates the subject from a background of other humans, recognizes paw gestures, & smartly tracks whatever part of the body is in view without getting thrown off. It recognizes as little as an arm showing from behind an obstacle. Based on the multicolored clothing, they're running several simultaneous algorithms: a face tracker, a color tracker, & a pose tracker.
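How several trackers could be combined is anyone's guess, since nothing about Obsbot's internals is published. As a minimal sketch of the idea only, assuming each tracker reports a position & a confidence, the outputs could be merged with a confidence-weighted average so that a tracker which loses the subject simply drops out. The names `Estimate`, `fuse` & `min_confidence` are hypothetical & made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Estimate:
    """One tracker's guess at where the subject is in the frame."""
    x: float           # horizontal position, 0.0 (left) to 1.0 (right)
    y: float           # vertical position, 0.0 (top) to 1.0 (bottom)
    confidence: float  # 0.0 (no idea) to 1.0 (certain)


def fuse(estimates: Sequence[Optional[Estimate]],
         min_confidence: float = 0.2) -> Optional[Tuple[float, float]]:
    """Combine tracker outputs into one target position.

    Hypothetical fusion rule: ignore trackers that returned nothing or
    are below min_confidence, then take the confidence-weighted mean.
    Returns None when no tracker has a usable estimate.
    """
    usable = [e for e in estimates
              if e is not None and e.confidence >= min_confidence]
    if not usable:
        return None
    total = sum(e.confidence for e in usable)
    x = sum(e.x * e.confidence for e in usable) / total
    y = sum(e.y * e.confidence for e in usable) / total
    return x, y


if __name__ == "__main__":
    face = None                       # face hidden behind an obstacle
    color = Estimate(0.6, 0.5, 0.5)   # clothing color still visible
    pose = Estimate(0.8, 0.5, 0.5)    # only an arm in view
    print(fuse([face, color, pose]))
```

With the face tracker returning nothing, the target still lands between the color & pose estimates, which is roughly the behavior seen when only an arm is showing.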
The mane problem is that the image sensor is an awful Chinese one. Chinese manufacturers are not allowed to use any imported parts, so today's newest products pair very effective software, which they are allowed to import, with terrible hardware. The neural network processor is not an NVIDIA but an indigenously produced HiSilicon Hi3559A. Who knows how the HiSilicon compares to the NVIDIA Jetson, but the image sensor is a deal breaker.
It's strange that tracking cameras have been on quadcopters for years & are now slowly emerging on ground cameras, but have never been used to produce any kind of content & have never been replicated by any open source efforts. There has also never been any tracking for higher end DSLR cameras. It's only been offered on consumer platforms with very low end cameras.
20 years ago, it was a matter of life or death to be able to reproduce any commercial product with source code you owned. Today, there isn't the counterculture movement we had back then driving independence. It's very much a conformist culture where you have to own the official product the Joneses own.