# Ollama
Browse articles on Ollama — tutorials, guides, and in-depth comparisons.
Ollama is the fastest way to run large language models locally — one command to pull a model, one command to run it. No Python environment, no API keys, no cloud dependency.
## What You Can Do with Ollama
- Run 100+ open-source LLMs — Llama 3.3, Mistral, DeepSeek R1, Qwen 2.5, Gemma, and more
- OpenAI-compatible REST API — drop-in replacement for `api.openai.com` in any app
- GPU acceleration — NVIDIA CUDA, AMD ROCm, and Apple Metal (M1/M2/M3) out of the box
- Modelfiles — customize system prompts, temperature, and context length per model
- Multimodal — vision models like LLaVA and BakLLaVA for image + text tasks
## Quick Start
```sh
# Install
curl -fsSL https://ollama.com/install.sh | sh

# Pull and run Llama 3.3 (70B Q4 by default, needs ~35GB RAM; 8B-class models like llama3.1 fit in ~4GB)
ollama pull llama3.3
ollama run llama3.3

# OpenAI-compatible API (port 11434)
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"llama3.3","messages":[{"role":"user","content":"Hello"}]}'
```
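The same chat completion call works from Python. A minimal sketch, assuming a local Ollama server on the default port; `build_chat_payload` is an illustrative helper, not part of any library:

```python
import json

# Default Ollama endpoint (assumption: local install, default port 11434)
OLLAMA_URL = "http://localhost:11434/v1/chat/completions"

def build_chat_payload(model, user_message, temperature=0.7):
    """Build an OpenAI-style chat payload accepted by Ollama's /v1 endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "temperature": temperature,
    }

payload = build_chat_payload("llama3.3", "Hello")
print(json.dumps(payload))
# Send it with requests.post(OLLAMA_URL, json=payload) once the server is running
```

Because the endpoint mirrors the OpenAI schema, the same payload also works with the official OpenAI client by pointing its base URL at `http://localhost:11434/v1`.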
## Learning Path
- Install Ollama and run your first model — setup on Mac, Linux, Windows
- Choose the right quantization — Q4_K_M for quality, Q3_K_S for low VRAM
- Create a Modelfile — custom system prompts, parameters, persistent config
- Connect to your app — Python `requests`, LangChain, LlamaIndex, or direct REST
- Scale up — GPU layer offloading, concurrent requests, load balancing
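The Modelfile step above fits in a few lines. A minimal sketch, assuming `llama3.3` is already pulled; the prompt text and parameter values are illustrative, while `FROM`, `SYSTEM`, and `PARAMETER` are standard Modelfile directives:

```
FROM llama3.3
SYSTEM "You are a concise technical assistant."
PARAMETER temperature 0.3
PARAMETER num_ctx 8192
```

Save it as `Modelfile`, build with `ollama create my-assistant -f Modelfile` (the name `my-assistant` is arbitrary), then start it with `ollama run my-assistant` — the system prompt and parameters persist across sessions.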
## Model Selection Guide
| Model | Size | Best for | VRAM needed |
|---|---|---|---|
| Llama 3.1 8B | 4.7GB | General use, fast | 6GB |
| Llama 3.3 70B Q4 | 35GB | High quality | 16GB VRAM + system RAM offload |
| DeepSeek R1 7B | 4.7GB | Reasoning tasks | 6GB |
| Qwen 2.5-Coder 7B | 4.7GB | Code generation | 6GB |
| nomic-embed-text | 274MB | Embeddings / RAG | CPU OK |
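For the embeddings/RAG row, retrieval boils down to ranking stored documents by vector similarity against a query embedding. A minimal sketch with toy 2-D vectors standing in for real `nomic-embed-text` output; `cosine_similarity` is an illustrative helper:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Toy vectors standing in for embeddings returned by the API
query = [1.0, 0.0]
docs = {"doc_a": [0.9, 0.1], "doc_b": [0.0, 1.0]}

# Rank documents by similarity to the query embedding
best = max(docs, key=lambda name: cosine_similarity(query, docs[name]))
print(best)  # doc_a is closest to the query
```

In a real pipeline the vectors come from Ollama's embeddings endpoint, and at scale they live in a vector database rather than a dict.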
Showing 421–450 of 490 articles · Page 15 of 17
- How to Implement Streaming Responses with Ollama API: Complete Developer Guide
- How to Fix Ollama API Connection Refused Error: Complete Network Troubleshooting Guide
- How to Fix CUDA Out of Memory Error in Ollama: GPU Optimization Guide
- How to Fix 'Ollama Command Not Found' Error: Complete PATH Configuration Guide
- How to Fix 'Embedding Generation Failed' Error in Ollama RAG: Complete Troubleshooting Guide
- How to Create Domain-Specific AI Models with Ollama Modelfile: Complete Guide 2025
- How to Build PDF Chat with Ollama: Vector Database Integration Guide
- Fix Ollama Service Won't Start on Ubuntu 24.04: Complete Troubleshooting Guide
- Fix Ollama GPU Detection Issues: Complete Driver Configuration Guide
- Custom System Prompts in Ollama: Advanced Model Personalization
- Curl Commands for Ollama: Complete Command-Line API Testing Tutorial
- Apple Metal Performance Shaders: M1/M2 Ollama Optimization Guide
- Advanced Modelfile Techniques: Multi-Model Ensembles and Routing
- Troubleshooting Phi-4 Installation on Windows 11: Common Errors Fixed
- Troubleshooting Gemma 3 Slow Inference: Performance Tuning Guide 2025
- Troubleshooting DeepSeek-R1 Installation Errors on Ubuntu 24.04 and Ollama: Complete Fix Guide
- Step-by-Step: Running Gemma 3 on Single GPU with Ollama Optimization
- Step-by-Step: Running DeepSeek-R1 Distilled Models on Consumer GPUs 2025
- Step-by-Step: Integrating Phi-4 with Visual Studio Code and Ollama
- Step-by-Step: Configuring Llama 3.3 Tool Calling Features in Ollama
- QWQ 32B Reasoning Model: Complete Ollama Installation and Setup Guide
- Phi-4 Reasoning Capabilities: Advanced Problem-Solving Setup Guide
- Phi-4 API Integration: Building Chatbots with Python and Ollama
- Microsoft Phi-4 14B Installation: Complete Ollama Setup Guide 2025
- Llama 4 Vision Image Analysis: Complete Implementation Tutorial with Code
- Llama 3.3 Quantization Guide: Reducing Model Size with GGUF and Ollama
- Llama 3.3 70B Installation Guide: Ollama Setup with GPU Acceleration 2025
- Llama 3.2 Vision 11B and 90B: Complete Ollama Setup Tutorial for Multimodal AI
- Llama 3.2 1B and 3B Models: Lightweight AI Setup for Resource-Constrained Systems
- How to Solve Phi-4 Insufficient VRAM Error: 7 Memory Optimization Techniques That Work