GLM

Creator: Seonglae Cho
Created: 2026 Jan 10 16:24
Edited: 2026 Jan 14 18:37

General Language Model

2021
4.7

Image generation

A hybrid image generation model that pairs a discrete autoregressive (AR) generator with a diffusion decoder. The AR part (9B) is initialized from GLM-4-9B-0414 and jointly trained on text-to-image and image-to-image tasks, using semantic-VQ tokens (from the X-Omni tokenizer family) to improve controllability and semantic alignment. Quality at high resolutions is improved through progressive generation (a low-resolution 256-token layout, then high-resolution tokens) and weight adjustments.
The diffusion decoder (7B) uses a CogView4-style single-stream DiT with flow matching, taking the semantic-VQ tokens generated by the AR model as conditions to restore and refine high-frequency details; Glyph-ByT5 is additionally used to strengthen text rendering. For editing, VAE latents of the reference image are also provided as conditions, with block-causal attention reducing computational cost. Post-training optimizes the AR model and the decoder separately, using GRPO and flow-GRPO respectively: the AR stage targets semantic alignment and aesthetics (OCR, VLM, HPSv3, etc.), while the decoder targets detail quality (LPIPS, OCR, hand scoring, etc.).
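The two-stage flow above (AR layout pass → AR high-resolution pass → diffusion decoding) can be sketched as a minimal pipeline. This is a hypothetical illustration with stubbed models: the function names, token counts beyond the stated 256-token layout, and codebook size are assumptions, not GLM's actual API.

```python
# Hypothetical sketch of the hybrid AR -> diffusion-decoder pipeline.
# Model internals are stubbed; only the data flow mirrors the description.
import random

LOW_RES_TOKENS = 256       # coarse layout pass (stated in the text)
HIGH_RES_TOKENS = 1024     # assumed size of the high-resolution pass
VOCAB_SIZE = 16384         # assumed semantic-VQ codebook size

def ar_generate(prompt: str, n_tokens: int, condition=None) -> list[int]:
    """Stub for the 9B AR generator: emits semantic-VQ token ids.

    `condition` stands in for the low-resolution layout tokens that the
    high-resolution pass attends to during progressive generation.
    """
    seed = hash((prompt, n_tokens, tuple(condition or ()))) & 0xFFFF
    rng = random.Random(seed)
    return [rng.randrange(VOCAB_SIZE) for _ in range(n_tokens)]

def diffusion_decode(vq_tokens: list[int]) -> str:
    """Stub for the 7B flow-matching DiT decoder: tokens -> pixels.

    In the real model this conditions on the semantic-VQ tokens to
    restore high-frequency detail; here it returns a placeholder.
    """
    return f"image<{len(vq_tokens)} tokens>"

def generate_image(prompt: str) -> str:
    # Stage 1: progressive AR generation -- low-resolution layout first,
    # then high-resolution tokens conditioned on that layout.
    layout = ar_generate(prompt, LOW_RES_TOKENS)
    tokens = ar_generate(prompt, HIGH_RES_TOKENS, condition=layout)
    # Stage 2: the diffusion decoder maps semantic-VQ tokens to pixels.
    return diffusion_decode(tokens)
```

For editing tasks, the sketch would additionally pass reference-image VAE latents into `diffusion_decode` as a second condition; that path is omitted here for brevity.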
 
 

Backlinks

MoE Model
