Ollama
Browse articles on Ollama — tutorials, guides, and in-depth comparisons.
Ollama is the fastest way to run large language models locally — one command to pull a model, one command to run it. No Python environment, no API keys, no cloud dependency.
What You Can Do with Ollama
- Run 100+ open-source LLMs — Llama 3.3, Mistral, DeepSeek R1, Qwen 2.5, Gemma, and more
- OpenAI-compatible REST API — drop-in replacement for api.openai.com in any app
- GPU acceleration — NVIDIA CUDA, AMD ROCm, and Apple Metal (M1/M2/M3) out of the box
- Modelfiles — customize system prompts, temperature, and context length per model
- Multimodal — vision models like LLaVA and BakLLaVA for image + text tasks
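Because the server speaks the OpenAI chat-completions schema, any plain HTTP client can drive it. A minimal sketch using only the Python standard library — it assumes a local Ollama server on the default port 11434 with `llama3.3` already pulled, and the helper names are illustrative, not part of any official client:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/v1/chat/completions"

def build_chat_request(model: str, messages: list) -> urllib.request.Request:
    """Build an OpenAI-style chat-completions request for a local Ollama server."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def chat(model: str, prompt: str) -> str:
    """Send one user message and return the assistant's reply text."""
    req = build_chat_request(model, [{"role": "user", "content": prompt}])
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    # The response shape follows the OpenAI spec: choices[0].message.content
    return data["choices"][0]["message"]["content"]

# Usage (needs a running server):
#   print(chat("llama3.3", "Hello"))
```

Because the request/response shapes match OpenAI's, swapping the base URL is usually the only change needed to point an existing app at a local model.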
Quick Start
```bash
# Install
curl -fsSL https://ollama.com/install.sh | sh

# Pull and run Llama 3.3 (ships as 70B; ~35GB RAM for the Q4 quant)
ollama pull llama3.3
ollama run llama3.3

# OpenAI-compatible API (port 11434)
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"llama3.3","messages":[{"role":"user","content":"Hello"}]}'
```
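Beyond the OpenAI-compatible route, Ollama's native `/api/generate` endpoint streams its reply as newline-delimited JSON — one object per token batch, each carrying a `response` fragment, with `"done": true` on the final line. A stdlib-only sketch of consuming that stream (assumes the default port and a pulled model; the function names are mine):

```python
import json
import urllib.request

def parse_stream(lines) -> str:
    """Join the 'response' fragments from Ollama's NDJSON stream into one string."""
    parts = []
    for raw in lines:
        chunk = json.loads(raw)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):  # final object signals end of generation
            break
    return "".join(parts)

def generate(model: str, prompt: str) -> str:
    """Call the native /api/generate endpoint and return the full reply."""
    body = json.dumps({"model": model, "prompt": prompt}).encode("utf-8")
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return parse_stream(resp)  # HTTP response objects iterate line by line

# Usage (needs a running server):
#   print(generate("llama3.3", "Why is the sky blue?"))
```

Streaming line by line is what lets a UI render tokens as they arrive instead of waiting for the whole completion.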
Learning Path
- Install Ollama and run your first model — setup on Mac, Linux, Windows
- Choose the right quantization — Q4_K_M for quality, Q3_K_S for low VRAM
- Create a Modelfile — custom system prompts, parameters, persistent config
- Connect to your app — Python requests, LangChain, LlamaIndex, or direct REST
- Scale up — GPU layer offloading, concurrent requests, load balancing
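The Modelfile step above is a small declarative file: `FROM` names a base model, `SYSTEM` bakes in a system prompt, and `PARAMETER` lines set options like temperature and context length. A minimal sketch — the model name, prompt, and values here are illustrative:

```
FROM llama3.3
SYSTEM "You are a concise assistant that answers in bullet points."
PARAMETER temperature 0.3
PARAMETER num_ctx 8192
```

Build it into a named model with `ollama create my-assistant -f Modelfile`, then start it with `ollama run my-assistant`; the settings persist across sessions without per-request flags.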
Model Selection Guide
| Model | Size | Best for | VRAM needed |
|---|---|---|---|
| Llama 3.1 8B | 4.7GB | General use, fast | 6GB |
| Llama 3.3 70B Q4 | 35GB | High quality | 16GB VRAM + system RAM (partial offload) |
| DeepSeek R1 7B | 4.7GB | Reasoning tasks | 6GB |
| Qwen 2.5-Coder 7B | 4.7GB | Code generation | 6GB |
| nomic-embed-text | 274MB | Embeddings / RAG | CPU OK |
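The last row points at local RAG: an embedding model like `nomic-embed-text` returns vectors you compare with cosine similarity. A stdlib-only sketch against the native `/api/embeddings` endpoint — the helper names are mine, and error handling is omitted:

```python
import json
import math
import urllib.request

def embed(text: str, model: str = "nomic-embed-text") -> list:
    """Fetch an embedding vector for `text` from a local Ollama server."""
    body = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    req = urllib.request.Request(
        "http://localhost:11434/api/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embedding"]

def cosine(a: list, b: list) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Usage (needs a running server): rank documents against a query.
#   query = embed("local language model inference")
#   for doc in ["Ollama runs LLMs locally", "Bananas are yellow"]:
#       print(round(cosine(query, embed(doc)), 3), doc)
```

Because this model runs comfortably on CPU, an embedding index for a few thousand documents needs no GPU at all.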
- Step-by-Step: Building Character AI with Ollama Modelfile Configuration
- Step-by-Step Multi-GPU Ollama Setup for Large Model Inference
- QWQ 32B Logic and Analysis: Advanced Problem-Solving Tutorial
- Qwen3 MoE Architecture: Complete Guide to Mixture of Experts Deployment
- Qwen3 Code Generation: Building AI Development Assistant with Ollama
- Ollama v0.9.2 Installation Guide: Windows, macOS, and Linux Setup 2025
- Ollama Temperature and Parameter Tuning: Complete Response Optimization Guide
- Ollama REST API Tutorial: Building AI Applications with HTTP Requests
- Ollama OpenAI Compatibility: Complete Drop-in Replacement Setup Guide 2025
- Ollama Offline Installation: Running AI Models Without Internet Connection
- Ollama Modelfile Tutorial: Creating Custom AI Models from Scratch
- Ollama Model Versioning: Managing Custom Model Updates Like a Pro
- Ollama GPU Memory Allocation: Fixing VRAM Insufficient Errors and Performance Issues
- Ollama Embedding Models Complete Guide: Text Similarity and Vector Search Setup
- Ollama Docker Setup: Complete Containerized AI Environment Guide
- Ollama ARM64 Installation: Apple Silicon M1/M2 Optimization Guide
- Ollama API Authentication: Security Best Practices and Setup
- Ollama AMD RX 7900 XTX Setup: Complete ROCm Installation and Configuration Guide
- Ollama AMD ROCm Setup: GPU Acceleration on AMD Graphics Cards
- NVIDIA GeForce RTX 4090 Ollama Setup: Maximum Performance Configuration Guide
- JavaScript Ollama Library: Build Powerful Node.js AI Applications in Minutes
- Intel Arc GPU Ollama Support: OpenVINO Integration Tutorial 2025
- How to Update Ollama Models: Version Management and Migration Tutorial
- How to Switch Between CPU and GPU Inference in Ollama: Complete Performance Guide
- How to Solve QWQ Model Loading Failed Error: Common Fixes 2025
- How to Solve Ollama 'Port 11434 Already in Use' Error: Complete Network Configuration Guide
- How to Share Custom Ollama Models: Distribution and Packaging Guide
- How to Optimize Qwen3 128K Context Length: Complete Memory Management Guide
- How to Monitor GPU Usage in Ollama: Performance Tracking Tools & Methods
- How to Import GGUF Models into Ollama: Complete Conversion Guide