Frontier Open-Source LLM Model Weights: Storage Sizes


Date: 2026-03-16
Purpose: Comprehensive reference for download/storage sizes of major open-source model families

Summary

The largest open-source models now exceed 800 GB for full-precision weights. MoE (Mixture-of-Experts) architectures dominate the frontier, with total parameter counts far exceeding active parameters per token. BF16 (bfloat16, 2 bytes per parameter) is the standard full-precision format on HuggingFace.

Quick formula: BF16 size in GB ~= parameters in billions x 2. The formula uses total parameters: for MoE models, all expert weights are stored on disk even though only a subset activates per token, so download size tracks total (not active) parameter count.
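As a rough illustration, the estimate can be scripted. This is a minimal sketch; the bytes-per-parameter table and example models are illustrative assumptions, and overhead for tokenizer files and shard metadata is ignored.

```python
# Minimal sketch: estimate on-disk weight size from parameter count.
# Uses decimal GB (1e9 bytes), matching how sizes are quoted in this document.

BYTES_PER_PARAM = {
    "bf16": 2.0,   # bfloat16 / float16
    "fp8": 1.0,    # 8-bit float (e.g. DeepSeek-V3's native release format)
    "int4": 0.5,   # 4-bit quantization, ignoring scales/zero-points
}

def estimate_size_gb(total_params_billions: float, dtype: str = "bf16") -> float:
    """Approximate weight-file size in GB.

    Pass *total* parameters: for MoE models every expert's weights are stored,
    regardless of how many are active per token.
    """
    return total_params_billions * BYTES_PER_PARAM[dtype]

print(estimate_size_gb(70))          # dense 70B, BF16        -> 140.0 GB
print(estimate_size_gb(235))         # Qwen3-235B-A22B (MoE)  -> 470.0 GB
print(estimate_size_gb(671, "fp8"))  # DeepSeek-V3/R1, FP8    -> 671.0 GB
```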


Master Table

Model | Total Params | Active Params | Architecture | Size (GB, BF16 unless noted) | Notes
GLM-4 Family (THUDM/Zhipu)
GLM-4-9B | 9.4B | 9.4B | Dense | ~18 GB | Base + Chat variants
GLM-4-32B-0414 | 32B | 32B | Dense | ~65 GB | Released Apr 2025
GLM-4.5 (zai-org) | Unknown | Unknown | Unknown | Unknown | Community upload; limited public info
MiniMax
MiniMax-Text-01 | 456B | 45.9B | MoE (32 experts, top-2) | ~495 GB | 413 safetensor shards; Lightning Attention hybrid
MiniMax-VL-01 | 456B | 45.9B | MoE + vision | ~500 GB | Vision-language variant
MiniMax-M2.5 | 230B | 10B | MoE | ~457 GB | Latest (mid-2025); very efficient active/total ratio
Qwen (Alibaba)
Qwen2.5-0.5B | 0.5B | 0.5B | Dense | ~1 GB
Qwen2.5-1.5B | 1.5B | 1.5B | Dense | ~3 GB
Qwen2.5-3B | 3B | 3B | Dense | ~6 GB
Qwen2.5-7B | 7.6B | 7.6B | Dense | ~15 GB
Qwen2.5-14B | 14.7B | 14.7B | Dense | ~29 GB
Qwen2.5-32B | 32.5B | 32.5B | Dense | ~65 GB
Qwen2.5-72B | 72.7B | 72.7B | Dense | ~145 GB
Qwen2.5-Coder (all) | 0.5B-32B | Same as total | Dense | ~1-65 GB | Same sizes as base Qwen2.5; code-specialized
QwQ-32B | 32.5B | 32.5B | Dense | ~65 GB | Reasoning model
Qwen3-0.6B | 0.6B | 0.6B | Dense | ~1.2 GB
Qwen3-1.7B | 1.7B | 1.7B | Dense | ~3.4 GB
Qwen3-4B | 4B | 4B | Dense | ~8 GB
Qwen3-8B | 8B | 8B | Dense | ~16 GB
Qwen3-14B | 14B | 14B | Dense | ~28 GB
Qwen3-32B | 32B | 32B | Dense | ~65 GB
Qwen3-30B-A3B | 30B | 3B | MoE | ~60 GB | Small MoE
Qwen3-235B-A22B | 235B | 22B | MoE | ~470 GB | 118 safetensor shards; flagship Qwen3
Qwen3.5-0.8B | 0.8B | 0.8B | Dense | ~1.6 GB | Latest generation (2025)
Qwen3.5-2B | 2B | 2B | Dense | ~4 GB
Qwen3.5-4B | 4B | 4B | Dense | ~8 GB
Qwen3.5-9B | 9B | 9B | Dense | ~18 GB
Qwen3.5-27B | 27B | 27B | Dense | ~54 GB
Qwen3.5-35B-A3B | 35B | 3B | MoE | ~70 GB
Qwen3.5-122B-A10B | 122B | 10B | MoE | ~244 GB
Qwen3.5-397B-A17B | 397B | 17B | MoE | ~807 GB | Largest Qwen; flagship
DeepSeek
DeepSeek-V3 | 685B (671B + 14B MTP) | 37B | MoE (256 routed + 1 shared expert) | ~685 GB | Native FP8 release (BF16 would be ~1.3 TB); main model 671B + MTP module 14B
DeepSeek-V3.1 | 671B | 37B | MoE | ~685 GB | Same format and footprint as V3; added thinking/non-thinking modes (Aug 2025)
DeepSeek-V3.2 | 671B | 37B | MoE | ~685 GB | Released Jan 2026; same architecture and footprint as V3
DeepSeek-R1 | 671B | 37B | MoE | ~685 GB | Reasoning model; native FP8 release, same param count as V3
DeepSeek-R1-Distill-Qwen-1.5B | 1.5B | 1.5B | Dense | ~3 GB | Distilled from R1
DeepSeek-R1-Distill-Qwen-7B | 7B | 7B | Dense | ~14 GB
DeepSeek-R1-Distill-Llama-8B | 8B | 8B | Dense | ~16 GB
DeepSeek-R1-Distill-Qwen-14B | 14B | 14B | Dense | ~28 GB
DeepSeek-R1-Distill-Qwen-32B | 32B | 32B | Dense | ~65 GB
DeepSeek-R1-Distill-Llama-70B | 70B | 70B | Dense | ~140 GB
Meta Llama
Llama 3.1-8B | 8B | 8B | Dense | ~16 GB
Llama 3.1-70B | 70B | 70B | Dense | ~140 GB
Llama 3.1-405B | 405B | 405B | Dense | ~810 GB | 191 safetensor shards; largest dense open model
Llama 3.3-70B | 70B | 70B | Dense | ~140 GB | Text-only; comparable to 405B quality on many benchmarks
Llama 4 Scout (17B-16E) | 109B | 17B | MoE (16 experts) | ~216 GB | 10M context window; multimodal
Llama 4 Maverick (17B-128E) | 400B | 17B | MoE (128 experts) | ~800 GB | 1M context window; multimodal
Mistral
Mixtral 8x7B | 46.7B | 12.9B | MoE (8 experts) | ~93 GB | 19 safetensor shards
Mixtral 8x22B | 141B | 39B | MoE (8 experts) | ~282 GB
Mistral Large 2 (123B) | 123B | 123B | Dense | ~246 GB | Released Jul 2024
Mistral Large 3 (675B) | 675B | 41B | MoE | ~675 GB | Released Dec 2025; multimodal
Other Notable Models
Grok-1 (xAI) | 314B | ~86B | MoE (8 experts) | ~300 GB | Open-sourced Mar 2024; released as an 8-bit checkpoint (BF16 would be ~630 GB)
Grok-2.5 (xAI) | ~300B+ | Unknown | Unknown | ~500 GB | 42 files total
Grok-3 (xAI) | Unknown | Unknown | Unknown | Not yet released | Open-source release promised within 6 months of Aug 2025

Key Observations

The Biggest Downloads (Top 10 by Storage)

  1. Llama 3.1-405B -- ~810 GB (largest dense model)
  2. Qwen3.5-397B-A17B -- ~807 GB
  3. Llama 4 Maverick -- ~800 GB
  4. DeepSeek V3/V3.1/V3.2/R1 -- ~685 GB each
  5. Mistral Large 3 -- ~675 GB
  6. Grok-2.5 -- ~500 GB
  7. MiniMax-Text-01 -- ~495 GB
  8. Qwen3-235B-A22B -- ~470 GB
  9. MiniMax-M2.5 -- ~457 GB
  10. Grok-1 -- ~300 GB

Dense vs. MoE Storage Implications

  • Dense models: storage = inference memory. Every weight is used for every token.
  • MoE models: download size tracks total parameters, but per-token compute only touches a fraction of them. All expert weights still normally have to be resident in memory (or offloaded), because the router may select any expert. Example: DeepSeek-R1 is ~685 GB on disk, yet only ~37B parameters are active for any given token (see the sketch below).
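A tiny sketch of the distinction, using the Qwen3-235B-A22B figures from the table above (2 bytes per parameter assumed for BF16):

```python
# Sketch: MoE download/resident weight size vs. weights actually touched per token.

def moe_weight_footprints_gb(total_b: float, active_b: float,
                             bytes_per_param: float = 2.0) -> tuple[float, float]:
    """Return (on-disk / resident weight size, per-token active weight size) in GB."""
    return total_b * bytes_per_param, active_b * bytes_per_param

resident, per_token = moe_weight_footprints_gb(235, 22)   # Qwen3-235B-A22B
print(f"download & resident: ~{resident:.0f} GB, touched per token: ~{per_token:.0f} GB")
# -> download & resident: ~470 GB, touched per token: ~44 GB
```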

Practical Thresholds

  • Single consumer GPU (24 GB): ~10-12B dense in BF16 (leaving headroom for KV cache), or roughly 30-40B with 4-bit quantization (a rough fit-check sketch follows this list)
  • Dual GPU (48 GB): ~20-24B dense in BF16, or ~70B with 4-bit quantization
  • 8x 80 GB GPUs (640 GB): fits most large MoE models in BF16 (Qwen3-235B-A22B, MiniMax-Text-01, Grok-2.5); DeepSeek-R1/V3 (~685 GB) and Mistral Large 3 (~675 GB) slightly exceed this and need quantization, offloading, or a second node
  • Multi-node required: Llama 3.1-405B (dense), Qwen3.5-397B, Llama 4 Maverick at full precision
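A rough fit-check sketch under simple assumptions: weights only, a fixed ~10% headroom for runtime overhead, and no KV cache or activations accounted for.

```python
# Sketch: largest model (billions of params) whose weights fit in a given
# amount of GPU memory, for an assumed effective bits-per-weight.

def max_params_billions(vram_gb: float, bits_per_weight: float,
                        headroom: float = 0.9) -> float:
    usable_bytes = vram_gb * headroom * 1e9          # leave ~10% for overhead
    return usable_bytes / (bits_per_weight / 8) / 1e9

print(max_params_billions(24, 16))     # 24 GB GPU, BF16     -> ~10.8B
print(max_params_billions(24, 4.85))   # 24 GB GPU, Q4_K_M   -> ~35.6B
print(max_params_billions(640, 16))    # 8x80 GB node, BF16  -> ~288B
```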

Models You Might Be Missing

  • Grok-2.5 -- 500 GB, recently open-sourced by xAI
  • MiniMax-M2.5 -- 457 GB, very recent (mid-2025), strong coding/agent performance
  • Qwen3.5-397B -- 807 GB, newest and largest Qwen, released 2025
  • DeepSeek-V3.2 -- Same architecture as V3 but significantly improved; released Jan 2026
  • Devstral-2-123B -- Mistral's code-specialized 123B model (Dec 2025)
  • Llama 4 Behemoth -- Previewed by Meta with 2T total parameters, not yet released

Quantization Size Estimates

For quick reference, approximate file sizes for common quantization formats, relative to BF16 (a small sizing sketch follows the table). GGUF K-quant ratios reflect effective bits per weight, including scales and mixed tensor types, so they run higher than the nominal bit count:

Quantization | Relative Size | Example: 70B model
BF16 / FP16 | 100% (baseline) | 140 GB
FP8 | ~50% | ~70 GB
INT8 / Q8_0 | ~50-53% | 70-74 GB
Q6_K | ~41% | ~57 GB
Q5_K_M | ~35% | ~50 GB
Q4_K_M | ~30% | ~42 GB
Q3_K_M | ~24% | ~34 GB
Q2_K | ~20% | ~28 GB
GPTQ-Int4 / AWQ | ~26% | ~36 GB
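The same ratios can be derived from effective bits per weight. A small sketch follows; the bits-per-weight values are approximate effective rates (Q2_K in particular varies between llama.cpp versions):

```python
# Sketch: quantized file size from approximate effective bits per weight.

EFFECTIVE_BPW = {
    "bf16": 16.0, "fp8": 8.0, "q8_0": 8.5,
    "q6_k": 6.56, "q5_k_m": 5.67, "q4_k_m": 4.85,
    "q3_k_m": 3.91, "q2_k": 3.2, "gptq_int4": 4.25,
}

def quantized_size_gb(params_billions: float, fmt: str) -> float:
    """File size in decimal GB for a model with params_billions parameters."""
    return params_billions * EFFECTIVE_BPW[fmt] / 8   # bits -> bytes per param

for fmt in ("bf16", "q4_k_m", "q2_k"):
    print(f"70B @ {fmt}: ~{quantized_size_gb(70, fmt):.0f} GB")
# -> 70B @ bf16: ~140 GB, q4_k_m: ~42 GB, q2_k: ~28 GB
```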


Methodology Notes

  • BF16 sizes are approximate: (total_params x 2 bytes) + overhead for embeddings/tokenizer. Actual download may vary by ~5%.
  • MoE models store all expert weights. The "active params" column shows per-token activation.
  • Sizes verified against HuggingFace model cards where available; otherwise calculated as parameter count x 2 bytes (a verification sketch using the Hub API follows these notes).
  • A few repositories ship in lower precision natively (DeepSeek-V3/R1 in FP8, Grok-1 as an 8-bit checkpoint); the table lists the released format's size and flags it in the Notes column.
  • "413 safetensor shards" for MiniMax-Text-01 and "118 shards" for Qwen3-235B confirmed from HuggingFace file listings.
  • Some models (GLM-4.5, Grok-3) have limited public info on exact sizes.
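For spot-checking a size directly, the shard sizes reported by the HuggingFace Hub API can be summed. A sketch, assuming the `huggingface_hub` package is installed and the repository's file metadata is publicly visible (gated repos need an access token):

```python
# Sketch: sum *.safetensors shard sizes for a model repo via the Hub API.

from huggingface_hub import HfApi

def repo_weight_size_gb(repo_id: str) -> float:
    """Total size of safetensors files in the repo, in decimal GB."""
    info = HfApi().model_info(repo_id, files_metadata=True)
    total_bytes = sum(
        (f.size or 0)
        for f in info.siblings
        if f.rfilename.endswith(".safetensors")
    )
    return total_bytes / 1e9

if __name__ == "__main__":
    # Any repo id from the table works here; this one is just an example.
    print(repo_weight_size_gb("Qwen/Qwen2.5-72B-Instruct"))
```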