AI model trained on video, images, and audio, enabling seamless reasoning across modalities.
For example, Google Gemini can analyze a video of a person speaking, identify the speaker's emotion from the visual and audio cues, and use that context to produce a more accurate transcription of their words.