AIniverse Get the App
Qwen: Qwen3 VL 32B Instruct logo

Qwen: Qwen3 VL 32B Instruct

Qwen

Qwen3-VL-32B-Instruct is a large-scale multimodal vision-language model designed for high-precision understanding and reasoning across text, images, and video. With 32 billion parameters, it combines deep visual perception with advanced text comprehension, enabling fine-grained spatial reasoning, document and scene analysis, and long-horizon video understanding.Robust OCR in 32 languages, and enhanced multimodal fusion through Interleaved-MRoPE and DeepStack architectures. Optimized for agentic interaction and visual tool use, Qwen3-VL-32B delivers state-of-the-art performance for complex real-world multimodal tasks.

Parameters
32B
Context
131072
Modality
text+image->text
License
open_weights
Open source
Yes

Open Qwen: Qwen3 VL 32B Instruct in AIniverse

Compare versions, read real ratings, save to your stack.

Open in App

Other versions in the Qwen family

Qwen/QwQ-32B-AWQ
32B
Qwen/Qwen-7B-Chat
7B
Qwen/Qwen1.5-7B
7B
Qwen/Qwen1.5-MoE-A2.7B
2.7B
Qwen/Qwen2-0.5B
0.5B