Frontier Open-Source LLM Model Weights: Storage Sizes
Date: 2026-03-16
Purpose: Comprehensive reference for download/storage sizes of major open-source model families
Summary
The largest open-source models now exceed 800 GB for full-precision weights. MoE (Mixture-of-Experts) architectures dominate the frontier, with total parameter counts far exceeding active parameters per token. BF16 (bfloat16, 2 bytes per parameter) is the standard full-precision format on HuggingFace.
Quick formula: BF16 size in GB ~= parameters in billions x 2. The same rule applies to MoE models using total (not active) parameters, since all expert weights are stored even though only a subset activates per token.
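This rule of thumb can be sketched as a small helper (decimal GB; ignores the small embedding/tokenizer overhead noted under Methodology Notes):

```python
def bf16_size_gb(total_params_billions: float) -> float:
    """Approximate BF16 checkpoint size: 2 bytes per parameter.

    For MoE models, pass the TOTAL parameter count (all experts are
    stored on disk), not the active-per-token count.
    """
    return total_params_billions * 2.0  # 1e9 params x 2 bytes = 2 GB

# Dense: Llama 3.1-70B -> ~140 GB
assert bf16_size_gb(70) == 140.0
# MoE: Qwen3-235B-A22B -> size follows the 235B total, not the 22B active
assert bf16_size_gb(235) == 470.0
```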
Master Table
| Model | Params (Total) | Active Params | Architecture | Download Size (GB) | Notes |
|---|---|---|---|---|---|
| GLM-4 Family (THUDM/Zhipu) | |||||
| GLM-4-9B | 9.4B | 9.4B (dense) | Dense | ~18 GB | Base + Chat variants |
| GLM-4-32B-0414 | 32B | 32B (dense) | Dense | ~65 GB | Released Apr 2025 |
| GLM-4.5 | 355B | 32B | MoE | ~710 GB | Released Jul 2025 under the official Z.ai (Zhipu) zai-org HuggingFace org |
| MiniMax | |||||
| MiniMax-Text-01 | 456B | 45.9B | MoE (32 experts, top-2) | ~495 GB | 413 safetensor shards; Lightning Attention hybrid |
| MiniMax-VL-01 | 456B | 45.9B | MoE + Vision | ~500 GB | Vision-language variant |
| MiniMax-M2.5 | 230B | 10B | MoE | ~457 GB | Latest (mid-2025); very efficient active/total ratio |
| Qwen (Alibaba) | |||||
| Qwen2.5-0.5B | 0.5B | 0.5B | Dense | ~1 GB | |
| Qwen2.5-1.5B | 1.5B | 1.5B | Dense | ~3 GB | |
| Qwen2.5-3B | 3B | 3B | Dense | ~6 GB | |
| Qwen2.5-7B | 7.6B | 7.6B | Dense | ~15 GB | |
| Qwen2.5-14B | 14.7B | 14.7B | Dense | ~29 GB | |
| Qwen2.5-32B | 32.5B | 32.5B | Dense | ~65 GB | |
| Qwen2.5-72B | 72.7B | 72.7B | Dense | ~145 GB | |
| Qwen2.5-Coder (all) | 0.5B-32B | Same | Dense | ~1-65 GB | Same sizes as base Qwen2.5; code-specialized |
| QwQ-32B | 32.5B | 32.5B | Dense | ~65 GB | Reasoning model |
| Qwen3-0.6B | 0.6B | 0.6B | Dense | ~1.2 GB | |
| Qwen3-1.7B | 1.7B | 1.7B | Dense | ~3.4 GB | |
| Qwen3-4B | 4B | 4B | Dense | ~8 GB | |
| Qwen3-8B | 8B | 8B | Dense | ~16 GB | |
| Qwen3-14B | 14B | 14B | Dense | ~28 GB | |
| Qwen3-32B | 32B | 32B | Dense | ~65 GB | |
| Qwen3-30B-A3B | 30B | 3B | MoE | ~60 GB | Small MoE |
| Qwen3-235B-A22B | 235B | 22B | MoE | ~470 GB | 118 safetensor shards; flagship Qwen3 |
| Qwen3.5-0.8B | 0.8B | 0.8B | Dense | ~1.6 GB | Latest generation (2025) |
| Qwen3.5-2B | 2B | 2B | Dense | ~4 GB | |
| Qwen3.5-4B | 4B | 4B | Dense | ~8 GB | |
| Qwen3.5-9B | 9B | 9B | Dense | ~18 GB | |
| Qwen3.5-27B | 27B | 27B | Dense | ~54 GB | |
| Qwen3.5-35B-A3B | 35B | 3B | MoE | ~70 GB | |
| Qwen3.5-122B-A10B | 122B | 10B | MoE | ~244 GB | |
| Qwen3.5-397B-A17B | 397B | 17B | MoE | ~807 GB | Largest Qwen; flagship |
| DeepSeek | |||||
| DeepSeek-V3 | 685B (671B + 14B MTP) | 37B | MoE (256 routed + 1 shared expert) | ~685 GB | Weights released natively in FP8 (1 byte/param); BF16 would be ~1.3 TB. Main model 671B + MTP module 14B |
| DeepSeek-V3.1 | 671B | 37B | MoE | ~685 GB | FP8 release; added thinking/non-thinking modes (Aug 2025) |
| DeepSeek-V3.2 | 671B | 37B | MoE | ~685 GB | FP8 release; released Jan 2026; same architecture |
| DeepSeek-R1 | 671B | 37B | MoE | ~685 GB | FP8 release; reasoning model; same param count as V3 |
| DeepSeek-R1-Distill-Qwen-1.5B | 1.5B | 1.5B | Dense | ~3 GB | Distilled from R1 |
| DeepSeek-R1-Distill-Qwen-7B | 7B | 7B | Dense | ~14 GB | |
| DeepSeek-R1-Distill-Llama-8B | 8B | 8B | Dense | ~16 GB | |
| DeepSeek-R1-Distill-Qwen-14B | 14B | 14B | Dense | ~28 GB | |
| DeepSeek-R1-Distill-Qwen-32B | 32B | 32B | Dense | ~65 GB | |
| DeepSeek-R1-Distill-Llama-70B | 70B | 70B | Dense | ~140 GB | |
| Meta Llama | |||||
| Llama 3.1-8B | 8B | 8B | Dense | ~16 GB | |
| Llama 3.1-70B | 70B | 70B | Dense | ~140 GB | |
| Llama 3.1-405B | 405B | 405B | Dense | ~750 GB | 191 safetensor shards; largest dense open model |
| Llama 3.3-70B | 70B | 70B | Dense | ~140 GB | Text-only; matches 405B quality |
| Llama 4 Scout (17B-16E) | 109B | 17B | MoE (16 experts) | ~216 GB | 10M context window; multimodal |
| Llama 4 Maverick (17B-128E) | 400B | 17B | MoE (128 experts) | ~800 GB | 1M context window; multimodal |
| Mistral | |||||
| Mixtral 8x7B | 46.7B | 12.9B | MoE (8 experts) | ~93 GB | 19 safetensor shards |
| Mixtral 8x22B | 141B | 39B | MoE (8 experts) | ~282 GB | |
| Mistral Large 2 (123B) | 123B | 123B | Dense | ~246 GB | Released Jul 2024 |
| Mistral Large 3 (675B) | 675B | 41B | MoE | ~675 GB | Released Dec 2025; multimodal |
| Other Notable Models | |||||
| Grok-1 (xAI) | 314B | ~86B | MoE (8 experts, top-2) | ~300 GB | Open-sourced Mar 2024; checkpoint released in int8, not BF16 |
| Grok-2.5 (xAI) | ~300B+ | Unknown | Unknown | ~500 GB | 42 files total |
| Grok-3 (xAI) | Unknown | Unknown | Unknown | Not yet released | Promised open-source within 6 months of Aug 2025 |
Key Observations
The Biggest Downloads (Top 10 by Storage)
- Qwen3.5-397B-A17B -- ~807 GB
- Llama 4 Maverick -- ~800 GB
- Llama 3.1-405B -- ~750 GB (largest dense model)
- DeepSeek V3/V3.1/V3.2/R1 -- ~685 GB each (native FP8 release)
- Mistral Large 3 -- ~675 GB
- Grok-2.5 -- ~500 GB
- MiniMax-Text-01 -- ~495 GB
- Qwen3-235B-A22B -- ~470 GB
- MiniMax-M2.5 -- ~457 GB
- Grok-1 -- ~300 GB
Dense vs. MoE Storage Implications
- Dense models: storage = inference memory. All weights are used for every token.
- MoE models: download size tracks total parameters, not active ones. All expert weights must be stored and (typically) loaded, but only a fraction participate per token, so per-token compute and bandwidth scale with the active count. Example: DeepSeek-R1 stores 671B params on disk, yet only ~37B are active per token.
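A quick illustration of that gap, using (total, active) parameter counts from the master table at 2 bytes/param (DeepSeek is omitted here because its download is FP8):

```python
# (total_params, active_params) from the master table above
models = {
    "Llama 3.1-405B (dense)": (405e9, 405e9),
    "Qwen3-235B-A22B (MoE)":  (235e9, 22e9),
    "Qwen3-30B-A3B (MoE)":    (30e9, 3e9),
}

for name, (total, active) in models.items():
    disk_gb = total * 2 / 1e9    # all weights are stored and loaded
    active_gb = active * 2 / 1e9 # weights actually touched per token
    print(f"{name}: {disk_gb:.0f} GB on disk, {active_gb:.0f} GB active "
          f"({active / total:.0%} of weights per token)")
```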
Practical Thresholds
- Single consumer GPU (24 GB): up to ~10-11B dense in BF16 (weights plus KV cache), or ~40B with 4-bit quantization
- Dual GPU (48 GB): up to ~22B dense in BF16, or ~70B quantized
- 8x A100 80GB (640 GB): fits Qwen3-235B-A22B (~470 GB) and most MoE models; DeepSeek-R1 (~685 GB) and Mistral Large 3 (~675 GB) exceed 640 GB and need quantization or a second node
- Multi-node required: Llama 3.1-405B (dense), Qwen3.5-397B, Llama 4 Maverick at full precision
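These thresholds reduce to a coarse fit check. A heuristic sketch (my own rule of thumb, not a deployment guide): weights should fit in roughly 90% of VRAM, leaving headroom for KV cache and activations; real limits depend on context length and batch size.

```python
def fits(weights_gb: float, vram_gb: float, headroom: float = 0.9) -> bool:
    """True if the weights leave ~10% of VRAM free for KV cache/activations."""
    return weights_gb <= vram_gb * headroom

assert fits(470, 640)      # Qwen3-235B-A22B on 8x A100 80GB
assert not fits(685, 640)  # DeepSeek's ~685 GB download exceeds the node
```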
Models You Might Be Missing
- Grok-2.5 -- 500 GB, recently open-sourced by xAI
- MiniMax-M2.5 -- 457 GB, very recent (mid-2025), strong coding/agent performance
- Qwen3.5-397B -- 807 GB, newest and largest Qwen, released 2025
- DeepSeek-V3.2 -- Same architecture as V3 but significantly improved; released Jan 2026
- Devstral-2-123B -- Mistral's code-specialized 123B model (Dec 2025)
- Llama 4 Behemoth -- Previewed by Meta with 2T total parameters, not yet released
Quantization Size Estimates
For quick reference, approximate file sizes for common quantization formats (relative to BF16):
| Quantization | Relative Size | Example: 70B model |
|---|---|---|
| BF16/FP16 | 100% (baseline) | 140 GB |
| FP8 | ~50% | 70 GB |
| INT8 / Q8_0 | ~50% | 70 GB |
| Q6_K | ~37% | 52 GB |
| Q5_K_M | ~32% | 45 GB |
| Q4_K_M | ~27% | 38 GB |
| Q3_K_M | ~20% | 28 GB |
| Q2_K | ~14% | 20 GB |
| GPTQ-Int4 / AWQ | ~25% | 35 GB |
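The ratios above can be applied programmatically. A sketch using the table's rounded ratios; actual GGUF files vary slightly because llama.cpp mixes quantization types across tensors:

```python
# Relative sizes vs. BF16, taken from the table above (rounded heuristics)
QUANT_RATIO = {
    "BF16": 1.00, "FP8": 0.50, "Q8_0": 0.50, "Q6_K": 0.37,
    "Q5_K_M": 0.32, "Q4_K_M": 0.27, "Q3_K_M": 0.20, "Q2_K": 0.14,
}

def quant_size_gb(params_billions: float, fmt: str) -> float:
    """Estimated file size: BF16 baseline (2 bytes/param) x format ratio."""
    return params_billions * 2 * QUANT_RATIO[fmt]

# 70B at Q4_K_M: 140 GB x 0.27 ~= 38 GB, matching the table
assert round(quant_size_gb(70, "Q4_K_M")) == 38
```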
Sources
- HuggingFace THUDM GLM-4 Collection
- THUDM/GLM-4-32B-0414
- MiniMaxAI/MiniMax-Text-01
- MiniMaxAI/MiniMax-M2.5 -- 457 GB BF16
- MiniMax-M2.5 blog (HuggingFace)
- Qwen3 GitHub
- Qwen3-235B-A22B-Instruct-2507
- Qwen3.5-397B-A17B -- 807 GB BF16
- Qwen/QwQ-32B
- DeepSeek-V3 HuggingFace -- 685 GB total
- DeepSeek-R1 HuggingFace
- DeepSeek-V3.2 HuggingFace
- Complete Guide to DeepSeek Models (BentoML)
- Llama 3.1 HuggingFace Blog
- Meta Llama 3.1-405B-Instruct
- Llama 4 HuggingFace Blog
- Meta Llama-4-Maverick-17B-128E-Instruct
- Meta Llama-4-Scout-17B-16E
- Mixtral 8x7B HuggingFace
- Mistral-Large-3-675B-Instruct-2512
- Mistral-Large-Instruct-2407 (123B)
- Grok-1 GitHub
- 15 Best Open Source LLMs (AceCloud)
- Qwen3.5 small models (Artificial Analysis)
Methodology Notes
- BF16 sizes are approximate: (total_params x 2 bytes) + overhead for embeddings/tokenizer. Actual download may vary by ~5%.
- MoE models store all expert weights. The "active params" column shows per-token activation.
- Sizes verified against HuggingFace model cards where available. Where not directly listed, calculated from parameter count x 2 bytes.
- "413 safetensor shards" for MiniMax-Text-01 and "118 shards" for Qwen3-235B confirmed from HuggingFace file listings.
- Some models (GLM-4.5, Grok-3) have limited public info on exact sizes.