
Meta: Llama 3.2 11B Vision Instruct

Meta

Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed for tasks that combine visual and textual data. It is suited to image captioning and visual question answering, which require both language generation and visual reasoning. Pre-trained on a large dataset of image-text pairs, it targets image analysis tasks that demand high accuracy. Because it integrates visual understanding with language processing, it fits applications such as content creation, AI-driven customer service, and research. See the [original model card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD_VISION.md) for details. Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).

Parameters: 11B
Context: 131,072 tokens
Modality: text+image->text
License: open_weights
Open source: Yes


Other versions in the Llama family

Llama Guard 3 8B (8B)
Meta: Llama 3 70B Instruct (70B)
Meta: Llama 3 8B Instruct (8B)
Meta: Llama 3.1 70B Instruct (70B)
Meta: Llama 3.1 8B Instruct (8B)