The chair video feels like an SCP. I love that they included some failed generations for some nightmare fuel.
This is such a great example of the impressiveness and flaws of this tech.
Everyone's so busy looking at the weird non-corporeal chair they might forget the people interacting with it are AI-generated too.
Imagine if the AI could add sound to the video. It would be fucking nuts.
It wouldn’t be too hard to train. There are plenty of audio models and computer vision models that could be trained in parallel on video clips with recorded sound, to learn which sound profiles are associated with which events in the frame.
The really fun one would be figuring out how to train an AI to understand sounds originating from out of frame.
It won’t be long before we have something like Oobabooga or Stable Diffusion but for artificial video with matching audio. I feel so sorry for future historians trying to determine whether some video recovered from a trashed hard drive, of Joe Biden tap dancing on an F-16 while throwing a hadouken, is authentic or not.