User Guide
Getting Started
- Go to the home page
- Drag and drop your .gguf file onto the drop zone, or click to browse
- Wait a moment while GGUF Inspector reads the file metadata
- Explore the four tabs: Overview, Tensors, Quantization, and Architecture
Understanding the Tabs
Overview
Shows a summary of your model:
- Model Name: The name stored in the file metadata
- Architecture: Model type (e.g., llama, mistral, gpt2)
- File Size: Total size of the GGUF file
- Tensor Count: Number of tensors (weight arrays)
- Context Length: Maximum input sequence length
- Parameters (est.): Approximate parameter count
Below the summary, you'll find a complete metadata table with all key-value pairs stored in the file.
Tensors
A detailed table of all tensors in the model:
- Name: Tensor identifier (e.g., model.layers.0.attention.wq)
- Shape: Dimensions of the tensor
- Type: Quantization type (F32, Q4_K, etc.)
- Size: Size in bytes
- % of Total: Percentage of total model size
Use the filter dropdown and search box to find specific tensors.
Quantization
Visualizations and statistics about quantization:
- Pie Chart: Distribution of tensor types (how many tensors use each quantization)
- Bar Chart: Size contribution by type (which types take up the most space)
- Table: Detailed breakdown with bits-per-weight
Architecture
A hierarchical tree view of tensor names. Click to expand/collapse sections and explore the model's structure.
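A tree like this can be built by splitting each tensor name on dots and nesting the parts. A minimal sketch (buildTree is a hypothetical helper for illustration, not GGUF Inspector's actual code):

```javascript
// Build a nested object tree from dot-separated tensor names.
// buildTree is a hypothetical helper, not the app's real implementation.
function buildTree(names) {
  const root = {};
  for (const name of names) {
    let node = root;
    for (const part of name.split(".")) {
      node = node[part] ??= {}; // descend, creating nodes as needed
    }
  }
  return root;
}

const tree = buildTree([
  "model.layers.0.attention.wq",
  "model.layers.0.attention.wk",
  "model.layers.1.attention.wq",
]);
// tree.model.layers now has one branch per layer index
```

Rendering then becomes a simple recursive walk over the nested object, with each key as an expandable node.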
What is GGUF?
GGUF (GPT-Generated Unified Format) is a file format for storing large language models. It's used by:
- llama.cpp — C++ implementation for running LLMs
- Ollama — Local LLM platform
- Many other local AI tools
GGUF files contain:
- Header: File version and tensor counts
- Metadata: Model name, architecture, hyperparameters
- Tensor Info: Name, shape, and type for each weight array
- Tensor Data: The actual model weights (not read by GGUF Inspector)
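As a rough illustration, the fixed-size header at the very start of a GGUF file can be decoded with a DataView. This is a sketch based on the public GGUF specification (little-endian fields, layout as of format version 3), not GGUF Inspector's own code:

```javascript
// Decode the fixed GGUF header: magic, version, tensor count, metadata KV count.
// Field layout per the public GGUF spec (little-endian); illustrative only.
function readGgufHeader(buffer) {
  const view = new DataView(buffer);
  const magic = view.getUint32(0, true); // bytes "GGUF" read as a LE u32
  if (magic !== 0x46554747) throw new Error("Not a GGUF file");
  return {
    version: view.getUint32(4, true),
    tensorCount: view.getBigUint64(8, true),
    metadataKvCount: view.getBigUint64(16, true),
  };
}

// Example: a minimal 24-byte header built by hand.
const buf = new ArrayBuffer(24);
const v = new DataView(buf);
v.setUint32(0, 0x46554747, true); // magic "GGUF"
v.setUint32(4, 3, true);          // version 3
v.setBigUint64(8, 291n, true);    // 291 tensors
v.setBigUint64(16, 24n, true);    // 24 metadata key-value pairs
const header = readGgufHeader(buf);
```

The metadata key-value pairs and per-tensor info records follow immediately after this header, which is why reading only the start of the file is enough.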
Understanding Quantization
Quantization reduces model size by storing weights in lower precision. Common types:
- F32: Full 32-bit floating point (largest, most accurate)
- F16: Half precision (16 bits)
- Q8_0: 8-bit quantization
- Q6_K, Q5_K, Q4_K: 6/5/4-bit K-quants (good balance)
- Q4_0, Q4_1: 4-bit legacy formats
- Q3_K, Q2_K: 3/2-bit (smallest, lower quality)
- IQ series: Advanced quantization methods
Lower bit counts mean smaller files and faster inference, but potentially lower output quality. Most users find Q4_K or Q5_K a good balance of size and quality.
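The size trade-off is simple arithmetic: bytes ≈ parameters × bits-per-weight ÷ 8. A hedged sketch (the bits-per-weight values are nominal; real K-quants add per-block scale overhead, so actual files are slightly larger):

```javascript
// Estimate model file size from parameter count and nominal bits per weight.
// Nominal values only; real quant formats carry per-block overhead.
function estimateGiB(params, bitsPerWeight) {
  return (params * bitsPerWeight) / 8 / 1024 ** 3;
}

const params = 7e9;                  // a 7B-parameter model
const f16 = estimateGiB(params, 16); // ~13.0 GiB at half precision
const q4 = estimateGiB(params, 4.5); // ~3.7 GiB at ~4.5 bpw (Q4_K-style)
```

This is why a 4-bit quant of the same model is roughly a quarter the size of its F16 version.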
FAQ
Is my file uploaded to a server?
No! GGUF Inspector is 100% client-side. All processing happens in your browser using the File API. Your files never leave your device.
Can I inspect large files (100GB+)?
Yes! GGUF Inspector only reads the header and metadata sections (first ~10-30 MB) using File.slice(). It never loads the full file into memory, so even massive models can be inspected instantly.
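The header-only read works roughly like this (a sketch; the byte count and helper name are illustrative, not the app's actual values):

```javascript
// Read only the first bytes of a (potentially huge) file.
// Blob.slice creates a lightweight view; only the sliced range is
// loaded into memory when arrayBuffer() resolves.
async function readHeaderBytes(file, maxBytes = 16 * 1024 * 1024) {
  const headerSlice = file.slice(0, maxBytes); // first ~16 MB only
  return headerSlice.arrayBuffer();
}

// In the browser, `file` would come from a drop event or <input type="file">.
// File inherits slice() from Blob, so a tiny in-memory Blob works the same:
const demo = new Blob([new Uint8Array([0x47, 0x47, 0x55, 0x46])]); // "GGUF"
readHeaderBytes(demo).then((buf) => {
  // buf.byteLength is 4: the slice never exceeds the file's actual size
});
```

Because slice() never touches the rest of the file, a 100 GB model costs no more to inspect than a 100 MB one.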
What browsers are supported?
Any modern browser with File API support: Chrome, Firefox, Safari, Edge. Mobile browsers work too!
Why doesn't my file load?
Make sure it's a valid .gguf file. Corrupted or incompatible files will show an error. If you believe it's a bug, please contact us.
Can I analyze multiple files at once?
Not yet, but this feature is planned! For now, analyze one file at a time and use the "Analyze Another File" button to switch.
Need Help?
If you have questions or encounter issues, reach out:
Email: nullkit.dev@outlook.com