Sber is opening the weights of two new flagship MoE models in the GigaChat series — Ultra Preview and Lightning — trained from scratch for Russian-language tasks, along with the next generation of open speech recognition models, GigaAM-v3, which now support punctuation and text normalization.
Furthermore, all image and video generation models from the latest Kandinsky 5.0 family — Video Pro, Video Lite and Image Lite — are now publicly available. These advanced models provide native Russian prompt understanding, incorporate specific knowledge for Russian cultural context, and robustly generate Cyrillic text in both images and videos.
Additionally, the K-VAE 1.0 models for encoding and decoding visual content — critical for training visual generation neural networks and among the best open-source models globally — have been released. The code and weights for all these models are distributed under the MIT license, enabling commercial usage.
Andrey Belevtsev, Senior Vice President, Head of Technology & AI, Sberbank:
“We believe creating world-class artificial intelligence requires two things: massive resources and world-class R&D teams. Sber has both. But what matters most is sharing — not locking down technology. Our strategy is to become an open foundation for innovation nationwide. That’s why we’re releasing model weights. This is a pivotal moment.
Any company in Russia, whether a bank or startup, can install these models within their closed systems, fine-tune them on sensitive internal datasets, and retain complete control over their confidential information.
This approach reflects true technological sovereignty: AI belongs to the entire nation, driving business transformations and economic growth. I would also like to note that Ultra will be soon available for corporate clients, with optimized cost of ownership for internal corporate deployments.”
GigaChat Ultra and GigaChat Lightning
The GigaChat family expands with the addition of GigaChat Ultra Preview and GigaChat Lightning. GigaChat Ultra Preview is the largest and most powerful model in the line-up.
The first model of this scale in Russia, it is still being trained, yet it already surpasses international models such as DeepSeek V3.1 in overall quality on Russian-language tasks, ranking first on the MERA benchmark. Despite its size, it maintains impressive speed and currently runs faster than GigaChat 2 Max, the previous flagship model.
Because GigaChat Ultra Preview is freely available, developers can fine-tune the model offline, for example inside secure corporate environments where strict data-privacy controls and data quality are critical.
Its sibling, GigaChat Lightning, offers the opposite balance: a compact, fast MoE model optimized for local execution on laptops while supporting rapid product iteration.
Quality-wise, GigaChat Lightning competes with the open-source leaders in its class: it outperforms Qwen3-4B on Russian-language tasks and matches it in dialogue, document analysis, and business applications.
As with GigaChat Ultra, we are publishing not just the model weights but also our inference acceleration techniques. GigaChat Lightning outpaces competitors in its class: it runs nearly as fast as Qwen3-1.7B despite being six times larger.
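The "six times larger yet nearly as fast" claim follows from how Mixture-of-Experts models work: per token, only a few experts are activated, so decoding cost tracks *active* parameters rather than total parameters. The sketch below illustrates this with made-up parameter counts; they are not GigaChat's actual configuration.

```python
# Illustrative only: why a MoE model can have far more total parameters than
# a dense model yet run at a similar speed. Per token, only the top-k experts
# fire, so compute scales with *active* parameters, not total parameters.
# All numbers below are hypothetical, not GigaChat's real architecture.

def moe_param_counts(shared: float, expert: float, n_experts: int, top_k: int):
    """Return (total, active) parameter counts in billions."""
    total = shared + expert * n_experts
    active = shared + expert * top_k
    return total, active

# A hypothetical dense 1.7B model: every parameter is active on every token.
dense_total, dense_active = moe_param_counts(shared=1.7, expert=0.0,
                                             n_experts=0, top_k=0)

# A hypothetical ~10B-total MoE: 32 small experts, 4 active per token.
moe_total, moe_active = moe_param_counts(shared=0.6, expert=0.3,
                                         n_experts=32, top_k=4)

print(f"dense: total={dense_total:.1f}B  active={dense_active:.1f}B")
print(f"moe:   total={moe_total:.1f}B  active={moe_active:.1f}B")
# The MoE carries ~6x the total parameters of the dense model, but activates
# a comparable amount of compute per token, keeping decoding speed close.
```

This is why a MoE model can hold far more knowledge than a dense model of the same inference cost.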
Both models integrate external tools effectively, with two core features standing out: code and memory.
• Code is a tool for executing, analyzing, and visualizing programmatic operations: running code snippets, plotting graphs, performing calculations, and testing hypotheses in real time.
• Memory is a system for personalized communication that retains important details such as objectives, preferences, and conversation history. The models use it to give personalized advice and to keep information current across dialogues. Outdated or sensitive data is deleted automatically, and users can edit the model’s memories manually.
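In general terms, a code tool follows a simple agent pattern: the model emits a snippet, the tool executes it, and the captured output is fed back into the dialogue. The sketch below shows that pattern in its minimal form; it is not GigaChat's implementation, and a production tool would run the snippet in a sandbox.

```python
# A minimal sketch of the code-tool pattern described above: execute a
# model-generated snippet and return whatever it printed. This is NOT
# GigaChat's implementation, just the generic agent-tool loop.
import contextlib
import io

def run_code(snippet: str) -> str:
    """Execute a Python snippet and return its captured stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(snippet, {})  # a real tool would sandbox and time-limit this
    return buffer.getvalue()

# E.g. the model checks a quick calculation before answering:
print(run_code("print(sum(i * i for i in range(10)))"))  # → 285
```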
GigaAM-v3
GigaAM-v3 is a family of five new open-source Automatic Speech Recognition (ASR) models built for industrial-grade, commercial Russian speech processing. GigaAM-v3 supports voice assistants, contact centers, call analytics, voice-message aggregators, and multimodal agents. In this generation of the GigaAM acoustic models, pre-training scales from 50,000 to 700,000 hours of audio.
The addition of punctuation and text normalization support lets the model match OpenAI Whisper in functionality while significantly surpassing it in recognition quality.
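To make the feature concrete: raw ASR output is typically an unpunctuated, lowercase stream with numbers spelled out; punctuation restoration and inverse text normalization turn it into readable text. The toy post-processor below illustrates the idea with a naive word table in English; GigaAM-v3 does this with trained models for Russian, not lookup rules.

```python
# Toy illustration of the post-processing layer described above: punctuation
# restoration plus inverse text normalization (spelled-out numbers back to
# digits). A simplified stand-in for exposition, not GigaAM's actual method.

NUMBER_WORDS = {"twenty five": "25", "ten": "10", "three": "3"}

def normalize(raw: str) -> str:
    """Turn raw lowercase ASR output into readable, punctuated text."""
    text = raw.lower().strip()
    # Naive substring replacement; a real normalizer uses trained models
    # or weighted grammars to avoid false matches.
    for words, digits in NUMBER_WORDS.items():
        text = text.replace(words, digits)
    # Capitalize the sentence and add terminal punctuation.
    return text[0].upper() + text[1:] + "."

print(normalize("the meeting starts in twenty five minutes"))
# → The meeting starts in 25 minutes.
```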
The foundational GigaAM-v3 model can underpin a wide range of speech technologies: at Sber it already powers speech recognition and speech synthesis, and enables GigaChat to process video and audio.
Kandinsky 5.0
Kandinsky 5.0 is a versatile family of visual generative models: Image Lite generates high-quality images from text prompts and supports image editing, while Video Lite and the more advanced Video Pro generate video from text prompts or animate still images.
The Image Lite model generates highly detailed images in HD resolution, demonstrates deep understanding of Russian cultural context, natively supports both Russian and English prompts, and can render Latin and Cyrillic text. The Video Pro model produces HD video up to 10 seconds long at 24 fps and currently leads the global open-source field, surpassing Wan-2.2-A14B and achieving visual quality comparable to Veo 3, one of the strongest proprietary models worldwide. For seamless integration into applied projects, the Video Lite version was released, optimized to run on consumer-level GPUs with at least 12 GB of VRAM.
Development of the Kandinsky 5.0 family required training on one billion images and 300 million videos, supplemented by over one million additional multimedia materials to ensure strong alignment with local cultural context. Processing datasets of this scale demanded cutting-edge methodologies, including several techniques developed specifically for the project. The final training stage used a high-quality dataset prepared by professional designers and artists to ensure perfect composition, style, and overall visual quality.
Kandinsky 5.0 unlocks new opportunities for consumer and enterprise applications. Developers and organizations can leverage these open-access models to build tools for personalized video greetings, photo animation, and rich visual storytelling. Creative professionals including directors, designers, marketers, and animation artists can rely on Kandinsky to streamline the creation of promotional materials, digital content, and commercial visual projects. The release of Kandinsky 5.0 marks a significant milestone in the growth of an open ecosystem centered around modern Russian generative technologies, empowering users and businesses with accessible, high-quality AI-driven creative tools.
Read more in this report.
K-VAE 1.0
Generative models like Kandinsky 5.0 create media content in latent spaces, compressed representations invisible to the human eye. Working in these hidden representations makes training and deployment faster, lighter, and more scalable. Sber is now introducing its proprietary, trained-from-scratch autoencoder models K-VAE 1.0 for images (2D) and videos (3D), which transform visual data into latent representations and reconstruct them back with exceptional fidelity. The K-VAE 1.0 models are the world’s best among open-source equivalents. Their public availability will raise generative AI technologies to a new level of quality.
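Some quick arithmetic shows why latent-space generation is so much cheaper. The downsampling factors below (8× spatial, 4× temporal, 16 latent channels) are typical of open-source video autoencoders and are used purely for illustration; they are not K-VAE 1.0's published specification.

```python
# Why a VAE makes video generation tractable: the diffusion model operates
# on the encoder's compressed latents instead of raw pixels. Factors below
# (4x temporal, 8x spatial, 16 latent channels) are illustrative defaults
# common in open-source video VAEs, not K-VAE 1.0's actual spec.

def latent_shape(frames, height, width, t_down=4, s_down=8, z_channels=16):
    """Shape of the latent tensor a video VAE encoder would produce."""
    return (frames // t_down, z_channels, height // s_down, width // s_down)

def numel(shape):
    n = 1
    for d in shape:
        n *= d
    return n

# A 10-second 720p clip at 24 fps (240 frames), as Video Pro generates:
pixels = (240, 3, 720, 1280)
latents = latent_shape(240, 720, 1280)

print("pixel tensor: ", numel(pixels))   # 663,552,000 values
print("latent tensor:", numel(latents))  # 13,824,000 values
print("compression: ~%dx" % (numel(pixels) // numel(latents)))  # → ~48x
```

Under these assumed factors, the generator works on roughly 48 times fewer values per clip, which is what makes training and inference "faster, more lightweight, and highly scalable."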