Frontier Open-Source LLM Model Weights: Storage Sizes
Date: 2026-03-16
Purpose: Comprehensive reference for download/storage sizes of major open-source model families
Summary
The largest open-source models now exceed 800 GB for full-precision weights. MoE (Mixture-of-Experts) architectures dominate the frontier, with total parameter counts far exceeding active parameters per token. BF16 (bfloat16, 2 bytes per parameter) is the standard full-precision format on HuggingFace.
Quick formula: BF16 size in GB ~= parameters in billions x 2. The same rule applies to MoE models using total (not active) parameters, since all expert weights are stored even though only a subset activates per token.
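This rule of thumb can be sketched as a small helper (decimal GB; ignores the small embedding/tokenizer overhead noted under Methodology Notes):

```python
def bf16_size_gb(total_params_billions: float) -> float:
    """Approximate BF16 checkpoint size: 2 bytes per parameter.

    For MoE models, pass the TOTAL parameter count (all experts are
    stored on disk), not the active-per-token count.
    """
    return total_params_billions * 2.0  # 1e9 params x 2 bytes = 2 GB

# Dense: Llama 3.1-70B -> ~140 GB
assert bf16_size_gb(70) == 140.0
# MoE: Qwen3-235B-A22B -> size follows the 235B total, not the 22B active
assert bf16_size_gb(235) == 470.0
```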
Master Table
| Model | Params (Total) | Active Params | Architecture | Download Size (GB) | Notes |
|---|---|---|---|---|---|
| GLM-4 Family (THUDM/Zhipu) | |||||
| GLM-4-9B | 9.4B | 9.4B (dense) | Dense | ~18 GB | Base + Chat variants |
| GLM-4-32B-0414 | 32B | 32B (dense) | Dense | ~65 GB | Released Apr 2025 |
| GLM-4.5 | 355B | 32B | MoE | ~710 GB | Released Jul 2025 under the official Z.ai (Zhipu) zai-org HuggingFace org |
| MiniMax | |||||
| MiniMax-Text-01 | 456B | 45.9B | MoE (32 experts, top-2) | ~495 GB | 413 safetensor shards; Lightning Attention hybrid |
| MiniMax-VL-01 | 456B | 45.9B | MoE + Vision | ~500 GB | Vision-language variant |
| MiniMax-M2.5 | 230B | 10B | MoE | ~457 GB | Latest (mid-2025); very efficient active/total ratio |
| Qwen (Alibaba) | |||||
| Qwen2.5-0.5B | 0.5B | 0.5B | Dense | ~1 GB | |
| Qwen2.5-1.5B | 1.5B | 1.5B | Dense | ~3 GB | |
| Qwen2.5-3B | 3B | 3B | Dense | ~6 GB | |
| Qwen2.5-7B | 7.6B | 7.6B | Dense | ~15 GB | |
| Qwen2.5-14B | 14.7B | 14.7B | Dense | ~29 GB | |
| Qwen2.5-32B | 32.5B | 32.5B | Dense | ~65 GB | |
| Qwen2.5-72B | 72.7B | 72.7B | Dense | ~145 GB | |
| Qwen2.5-Coder (all) | 0.5B-32B | Same | Dense | ~1-65 GB | Same sizes as base Qwen2.5; code-specialized |
| QwQ-32B | 32.5B | 32.5B | Dense | ~65 GB | Reasoning model |
| Qwen3-0.6B | 0.6B | 0.6B | Dense | ~1.2 GB | |
| Qwen3-1.7B | 1.7B | 1.7B | Dense | ~3.4 GB | |
| Qwen3-4B | 4B | 4B | Dense | ~8 GB | |
| Qwen3-8B | 8B | 8B | Dense | ~16 GB | |
| Qwen3-14B | 14B | 14B | Dense | ~28 GB | |
| Qwen3-32B | 32B | 32B | Dense | ~65 GB | |
| Qwen3-30B-A3B | 30B | 3B | MoE | ~60 GB | Small MoE |
| Qwen3-235B-A22B | 235B | 22B | MoE | ~470 GB | 118 safetensor shards; flagship Qwen3 |
| Qwen3.5-0.8B | 0.8B | 0.8B | Dense | ~1.6 GB | Latest generation (2025) |
| Qwen3.5-2B | 2B | 2B | Dense | ~4 GB | |
| Qwen3.5-4B | 4B | 4B | Dense | ~8 GB | |
| Qwen3.5-9B | 9B | 9B | Dense | ~18 GB | |
| Qwen3.5-27B | 27B | 27B | Dense | ~54 GB | |
| Qwen3.5-35B-A3B | 35B | 3B | MoE | ~70 GB | |
| Qwen3.5-122B-A10B | 122B | 10B | MoE | ~244 GB | |
| Qwen3.5-397B-A17B | 397B | 17B | MoE | ~807 GB | Largest Qwen; flagship |
| DeepSeek | |||||
| DeepSeek-V3 | 685B (671B + 14B MTP) | 37B | MoE (256 routed + 1 shared expert) | ~685 GB | Weights released natively in FP8 (1 byte/param); BF16 would be ~1.3 TB. Main model 671B + MTP module 14B |
| DeepSeek-V3.1 | 671B | 37B | MoE | ~685 GB | FP8 release; added thinking/non-thinking modes (Aug 2025) |
| DeepSeek-V3.2 | 671B | 37B | MoE | ~685 GB | FP8 release; released Jan 2026; same architecture |
| DeepSeek-R1 | 671B | 37B | MoE | ~685 GB | FP8 release; reasoning model; same param count as V3 |
| DeepSeek-R1-Distill-Qwen-1.5B | 1.5B | 1.5B | Dense | ~3 GB | Distilled from R1 |
| DeepSeek-R1-Distill-Qwen-7B | 7B | 7B | Dense | ~14 GB | |
| DeepSeek-R1-Distill-Llama-8B | 8B | 8B | Dense | ~16 GB | |
| DeepSeek-R1-Distill-Qwen-14B | 14B | 14B | Dense | ~28 GB | |
| DeepSeek-R1-Distill-Qwen-32B | 32B | 32B | Dense | ~65 GB | |
| DeepSeek-R1-Distill-Llama-70B | 70B | 70B | Dense | ~140 GB | |
| Meta Llama | |||||
| Llama 3.1-8B | 8B | 8B | Dense | ~16 GB | |
| Llama 3.1-70B | 70B | 70B | Dense | ~140 GB | |
| Llama 3.1-405B | 405B | 405B | Dense | ~750 GB | 191 safetensor shards; largest dense open model |
| Llama 3.3-70B | 70B | 70B | Dense | ~140 GB | Text-only; matches 405B quality |
| Llama 4 Scout (17B-16E) | 109B | 17B | MoE (16 experts) | ~216 GB | 10M context window; multimodal |
| Llama 4 Maverick (17B-128E) | 400B | 17B | MoE (128 experts) | ~800 GB | 1M context window; multimodal |
| Mistral | |||||
| Mixtral 8x7B | 46.7B | 12.9B | MoE (8 experts) | ~93 GB | 19 safetensor shards |
| Mixtral 8x22B | 141B | 39B | MoE (8 experts) | ~282 GB | |
| Mistral Large 2 (123B) | 123B | 123B | Dense | ~246 GB | Released Jul 2024 |
| Mistral Large 3 (675B) | 675B | 41B | MoE | ~675 GB | Released Dec 2025; multimodal |
| Other Notable Models | |||||
| Grok-1 (xAI) | 314B | ~86B | MoE (8 experts, top-2) | ~300 GB | Open-sourced Mar 2024; checkpoint released in int8, not BF16 |
| Grok-2.5 (xAI) | ~300B+ | Unknown | Unknown | ~500 GB | 42 files total |
| Grok-3 (xAI) | Unknown | Unknown | Unknown | Not yet released | Promised open-source within 6 months of Aug 2025 |
Key Observations
The Biggest Downloads (Top 10 by Storage)
- Qwen3.5-397B-A17B -- ~807 GB
- Llama 4 Maverick -- ~800 GB
- Llama 3.1-405B -- ~750 GB (largest dense model)
- DeepSeek V3/V3.1/V3.2/R1 -- ~685 GB each (native FP8 release)
- Mistral Large 3 -- ~675 GB
- Grok-2.5 -- ~500 GB
- MiniMax-Text-01 -- ~495 GB
- Qwen3-235B-A22B -- ~470 GB
- MiniMax-M2.5 -- ~457 GB
- Grok-1 -- ~300 GB
Dense vs. MoE Storage Implications
- Dense models: storage = inference memory. All weights are used for every token.
- MoE models: download size tracks total parameters, not active ones. All expert weights must be stored and (typically) loaded, but only a fraction participate per token, so per-token compute and bandwidth scale with the active count. Example: DeepSeek-R1 stores 671B params on disk, yet only ~37B are active per token.
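A quick illustration of that gap, using (total, active) parameter counts from the master table at 2 bytes/param (DeepSeek is omitted here because its download is FP8):

```python
# (total_params, active_params) from the master table above
models = {
    "Llama 3.1-405B (dense)": (405e9, 405e9),
    "Qwen3-235B-A22B (MoE)":  (235e9, 22e9),
    "Qwen3-30B-A3B (MoE)":    (30e9, 3e9),
}

for name, (total, active) in models.items():
    disk_gb = total * 2 / 1e9    # all weights are stored and loaded
    active_gb = active * 2 / 1e9 # weights actually touched per token
    print(f"{name}: {disk_gb:.0f} GB on disk, {active_gb:.0f} GB active "
          f"({active / total:.0%} of weights per token)")
```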
Practical Thresholds
- Single consumer GPU (24 GB): up to ~10-11B dense in BF16 (weights plus KV cache), or ~40B with 4-bit quantization
- Dual GPU (48 GB): up to ~22B dense in BF16, or ~70B quantized
- 8x A100 80GB (640 GB): fits Qwen3-235B-A22B (~470 GB) and most MoE models; DeepSeek-R1 (~685 GB) and Mistral Large 3 (~675 GB) exceed 640 GB and need quantization or a second node
- Multi-node required: Llama 3.1-405B (dense), Qwen3.5-397B, Llama 4 Maverick at full precision
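These thresholds reduce to a coarse fit check. A heuristic sketch (my own rule of thumb, not a deployment guide): weights should fit in roughly 90% of VRAM, leaving headroom for KV cache and activations; real limits depend on context length and batch size.

```python
def fits(weights_gb: float, vram_gb: float, headroom: float = 0.9) -> bool:
    """True if the weights leave ~10% of VRAM free for KV cache/activations."""
    return weights_gb <= vram_gb * headroom

assert fits(470, 640)      # Qwen3-235B-A22B on 8x A100 80GB
assert not fits(685, 640)  # DeepSeek's ~685 GB download exceeds the node
```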
Models You Might Be Missing
- Grok-2.5 -- 500 GB, recently open-sourced by xAI
- MiniMax-M2.5 -- 457 GB, very recent (mid-2025), strong coding/agent performance
- Qwen3.5-397B -- 807 GB, newest and largest Qwen, released 2025
- DeepSeek-V3.2 -- Same architecture as V3 but significantly improved; released Jan 2026
- Devstral-2-123B -- Mistral's code-specialized 123B model (Dec 2025)
- Llama 4 Behemoth -- Previewed by Meta with 2T total parameters, not yet released
Quantization Size Estimates
For quick reference, approximate file sizes for common quantization formats (relative to BF16):
| Quantization | Relative Size | Example: 70B model |
|---|---|---|
| BF16/FP16 | 100% (baseline) | 140 GB |
| FP8 | ~50% | 70 GB |
| INT8 / Q8_0 | ~50% | 70 GB |
| Q6_K | ~37% | 52 GB |
| Q5_K_M | ~32% | 45 GB |
| Q4_K_M | ~27% | 38 GB |
| Q3_K_M | ~20% | 28 GB |
| Q2_K | ~14% | 20 GB |
| GPTQ-Int4 / AWQ | ~25% | 35 GB |
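The ratios above can be applied programmatically. A sketch using the table's rounded ratios; actual GGUF files vary slightly because llama.cpp mixes quantization types across tensors:

```python
# Relative sizes vs. BF16, taken from the table above (rounded heuristics)
QUANT_RATIO = {
    "BF16": 1.00, "FP8": 0.50, "Q8_0": 0.50, "Q6_K": 0.37,
    "Q5_K_M": 0.32, "Q4_K_M": 0.27, "Q3_K_M": 0.20, "Q2_K": 0.14,
}

def quant_size_gb(params_billions: float, fmt: str) -> float:
    """Estimated file size: BF16 baseline (2 bytes/param) x format ratio."""
    return params_billions * 2 * QUANT_RATIO[fmt]

# 70B at Q4_K_M: 140 GB x 0.27 ~= 38 GB, matching the table
assert round(quant_size_gb(70, "Q4_K_M")) == 38
```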
Sources
- HuggingFace THUDM GLM-4 Collection
- THUDM/GLM-4-32B-0414
- MiniMaxAI/MiniMax-Text-01
- MiniMaxAI/MiniMax-M2.5 -- 457 GB BF16
- MiniMax-M2.5 blog (HuggingFace)
- Qwen3 GitHub
- Qwen3-235B-A22B-Instruct-2507
- Qwen3.5-397B-A17B -- 807 GB BF16
- Qwen/QwQ-32B
- DeepSeek-V3 HuggingFace -- 685 GB total
- DeepSeek-R1 HuggingFace
- DeepSeek-V3.2 HuggingFace
- Complete Guide to DeepSeek Models (BentoML)
- Llama 3.1 HuggingFace Blog
- Meta Llama 3.1-405B-Instruct
- Llama 4 HuggingFace Blog
- Meta Llama-4-Maverick-17B-128E-Instruct
- Meta Llama-4-Scout-17B-16E
- Mixtral 8x7B HuggingFace
- Mistral-Large-3-675B-Instruct-2512
- Mistral-Large-Instruct-2407 (123B)
- Grok-1 GitHub
- 15 Best Open Source LLMs (AceCloud)
- Qwen3.5 small models (Artificial Analysis)
Methodology Notes
- BF16 sizes are approximate: (total_params x 2 bytes) + overhead for embeddings/tokenizer. Actual download may vary by ~5%.
- MoE models store all expert weights. The "active params" column shows per-token activation.
- Sizes verified against HuggingFace model cards where available. Where not directly listed, calculated from parameter count x 2 bytes.
- "413 safetensor shards" for MiniMax-Text-01 and "118 shards" for Qwen3-235B confirmed from HuggingFace file listings.
- Some models (GLM-4.5, Grok-3) have limited public info on exact sizes.