User Guide
Getting Started
- Go to the home page
- Drag and drop your .gguf file onto the drop zone, or click to browse
- Wait a moment while GGUF Inspector reads the file metadata
- Explore the four tabs: Overview, Tensors, Quantization, and Architecture
Understanding the Tabs
Overview
Shows a summary of your model:
- Model Name: The name stored in the file metadata
- Architecture: Model type (e.g., llama, mistral, gpt2)
- File Size: Total size of the GGUF file
- Tensor Count: Number of tensors (weight arrays)
- Context Length: Maximum input sequence length
- Parameters (est.): Approximate parameter count
Below the summary, you'll find a complete metadata table with all key-value pairs stored in the file.
Tensors
A detailed table of all tensors in the model:
- Name: Tensor identifier (e.g., model.layers.0.attention.wq)
- Shape: Dimensions of the tensor
- Type: Quantization type (F32, Q4_K, etc.)
- Size: Size in bytes
- % of Total: Percentage of total model size
Use the filter dropdown and search box to find specific tensors.
Quantization
Visualizations and statistics about quantization:
- Pie Chart: Distribution of tensor types (how many tensors use each quantization)
- Bar Chart: Size contribution by type (which types take up the most space)
- Table: Detailed breakdown with bits-per-weight
Architecture
A hierarchical tree view of tensor names. Click to expand/collapse sections and explore the model's structure.
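A tree like this can be built by splitting each tensor name on dots and nesting the parts. A minimal sketch (buildTree is a hypothetical helper for illustration, not GGUF Inspector's actual code):

```javascript
// Build a nested object tree from dot-separated tensor names.
// buildTree is a hypothetical helper, not the app's real implementation.
function buildTree(names) {
  const root = {};
  for (const name of names) {
    let node = root;
    for (const part of name.split(".")) {
      node = node[part] ??= {}; // descend, creating nodes as needed
    }
  }
  return root;
}

const tree = buildTree([
  "model.layers.0.attention.wq",
  "model.layers.0.attention.wk",
  "model.layers.1.attention.wq",
]);
// tree.model.layers now has one branch per layer index
```

Rendering then becomes a simple recursive walk over the nested object, with each key as an expandable node.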
What is GGUF?
GGUF (GPT-Generated Unified Format) is a file format for storing large language models. It's used by:
- llama.cpp — C++ implementation for running LLMs
- Ollama — Local LLM platform
- Many other local AI tools
GGUF files contain:
- Header: File version and tensor counts
- Metadata: Model name, architecture, hyperparameters
- Tensor Info: Name, shape, and type for each weight array
- Tensor Data: The actual model weights (not read by GGUF Inspector)
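As a rough illustration, the fixed-size header at the very start of a GGUF file can be decoded with a DataView. This is a sketch based on the public GGUF specification (little-endian fields, layout as of format version 3), not GGUF Inspector's own code:

```javascript
// Decode the fixed GGUF header: magic, version, tensor count, metadata KV count.
// Field layout per the public GGUF spec (little-endian); illustrative only.
function readGgufHeader(buffer) {
  const view = new DataView(buffer);
  const magic = view.getUint32(0, true); // bytes "GGUF" read as a LE u32
  if (magic !== 0x46554747) throw new Error("Not a GGUF file");
  return {
    version: view.getUint32(4, true),
    tensorCount: view.getBigUint64(8, true),
    metadataKvCount: view.getBigUint64(16, true),
  };
}

// Example: a minimal 24-byte header built by hand.
const buf = new ArrayBuffer(24);
const v = new DataView(buf);
v.setUint32(0, 0x46554747, true); // magic "GGUF"
v.setUint32(4, 3, true);          // version 3
v.setBigUint64(8, 291n, true);    // 291 tensors
v.setBigUint64(16, 24n, true);    // 24 metadata key-value pairs
const header = readGgufHeader(buf);
```

The metadata key-value pairs and per-tensor info records follow immediately after this header, which is why reading only the start of the file is enough.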
Understanding Quantization
Quantization reduces model size by storing weights in lower precision. Common types:
- F32: Full 32-bit floating point (largest, most accurate)
- F16: Half precision (16 bits)
- Q8_0: 8-bit quantization
- Q6_K, Q5_K, Q4_K: 6/5/4-bit K-quants (good balance)
- Q4_0, Q4_1: 4-bit legacy formats
- Q3_K, Q2_K: 3/2-bit (smallest, lower quality)
- IQ series: Advanced quantization methods
Lower bit counts mean smaller files and faster inference, but potentially lower output quality. Most users find Q4_K or Q5_K a good balance of size and quality.
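The size trade-off is simple arithmetic: bytes ≈ parameters × bits-per-weight ÷ 8. A hedged sketch (the bits-per-weight values are nominal; real K-quants add per-block scale overhead, so actual files are slightly larger):

```javascript
// Estimate model file size from parameter count and nominal bits per weight.
// Nominal values only; real quant formats carry per-block overhead.
function estimateGiB(params, bitsPerWeight) {
  return (params * bitsPerWeight) / 8 / 1024 ** 3;
}

const params = 7e9;                  // a 7B-parameter model
const f16 = estimateGiB(params, 16); // ~13.0 GiB at half precision
const q4 = estimateGiB(params, 4.5); // ~3.7 GiB at ~4.5 bpw (Q4_K-style)
```

This is why a 4-bit quant of the same model is roughly a quarter the size of its F16 version.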
FAQ
Is my file uploaded to a server?
No! GGUF Inspector is 100% client-side. All processing happens in your browser using the File API. Your files never leave your device.
Can I inspect large files (100GB+)?
Yes! GGUF Inspector only reads the header and metadata sections (first ~10-30 MB) using File.slice(). It never loads the full file into memory, so even massive models can be inspected instantly.
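The header-only read works roughly like this (a sketch; the byte count and helper name are illustrative, not the app's actual values):

```javascript
// Read only the first bytes of a (potentially huge) file.
// Blob.slice creates a lightweight view; only the sliced range is
// loaded into memory when arrayBuffer() resolves.
async function readHeaderBytes(file, maxBytes = 16 * 1024 * 1024) {
  const headerSlice = file.slice(0, maxBytes); // first ~16 MB only
  return headerSlice.arrayBuffer();
}

// In the browser, `file` would come from a drop event or <input type="file">.
// File inherits slice() from Blob, so a tiny in-memory Blob works the same:
const demo = new Blob([new Uint8Array([0x47, 0x47, 0x55, 0x46])]); // "GGUF"
readHeaderBytes(demo).then((buf) => {
  // buf.byteLength is 4: the slice never exceeds the file's actual size
});
```

Because slice() never touches the rest of the file, a 100 GB model costs no more to inspect than a 100 MB one.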
What browsers are supported?
Any modern browser with File API support: Chrome, Firefox, Safari, Edge. Mobile browsers work too!
Why doesn't my file load?
Make sure it's a valid .gguf file. Corrupted or incompatible files will show an error. If you believe it's a bug, please contact us.
Can I analyze multiple files at once?
Not yet, but this feature is planned! For now, analyze one file at a time and use the "Analyze Another File" button to switch.
Need Help?
If you have questions or encounter issues, reach out:
Email: nullkit.dev@outlook.com