The page introduces Moondream, a 1.6B-parameter vision language model built from the SigLIP vision encoder and the Phi-1.5 language backbone, trained on the LLaVA dataset. Because the LLaVA training data was used, the model weights are licensed under CC-BY-SA. A live demo is available on Hugging Face Spaces.

Vikhyat
March 3, 2024
Moondream: Tiny Vision Language Model on GitHub
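Since the model is distributed through Hugging Face, a minimal sketch of querying it with the transformers library follows. The hub id `vikhyatk/moondream1` and the `encode_image`/`answer_question` helpers (exposed through the model's custom code via `trust_remote_code`) are assumptions based on how such models are typically packaged, not verified against the repository.

```python
# Minimal sketch: querying Moondream through Hugging Face transformers.
# Assumptions (not verified against the repo): the hub id
# "vikhyatk/moondream1" and the encode_image()/answer_question()
# helpers exposed by the model's custom code via trust_remote_code.
from transformers import AutoModelForCausalLM, AutoTokenizer
from PIL import Image

model_id = "vikhyatk/moondream1"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Encode the image once with the SigLIP vision encoder, then ask a question
# answered by the Phi-1.5 text backbone.
image = Image.open("example.jpg")
image_embeds = model.encode_image(image)
answer = model.answer_question(image_embeds, "Describe this image.", tokenizer)
print(answer)
```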