Install and configure Open WebUI as your Ollama frontend. Docker setup, model management, RAG, tools, and multi-user auth on Linux and macOS. Tested on Docker 27.
Configure LM Studio multi-GPU to split Llama 3.3 70B, Mixtral, and DeepSeek across 2–4 GPUs. Layer-splitting, VRAM balancing, and GPU offload settings explained.
Together AI fast inference API for open-source LLMs: run Llama 3.3, Mistral, Qwen, and DeepSeek at scale with Python or TypeScript. Free tier, USD pricing included.
vLLM vs TGI compared on throughput, latency, model support, Docker self-hosting, and USD pricing. Choose the right LLM inference server for production.
Use Claude Computer Use API to automate desktop tasks with Python and Docker. Control browsers, GUIs, and files with AI vision on Ubuntu. Starts at $3/MTok.
Build a Claude Code custom agent with tool use in Python 3.12. Wire bash, file read/write, and web search tools into an autonomous agentic loop. Tested on macOS & Ubuntu.
Claude 4.5 JSON mode structured output patterns using Python 3.12 and the Anthropic SDK. Extract validated data, avoid parse errors, build production pipelines.
Automate pull request reviews using Claude Code and GitHub Actions. Add AI code review to any repo in 20 minutes with claude-code-action. Tested on Node 22 + Ubuntu.
Claude Sonnet 4.5 API function calling and streaming guide for developers. Ship tool use, real-time output, and production patterns with Python 3.12 + Node 22.
Connect Claude to Amazon Bedrock Knowledge Bases via MCP. Query private S3 docs, enable reranking, and wire IAM permissions for enterprise RAG on AWS. Tested on us-east-1.
Connect Claude and Cursor to your Notion workspace using MCP. Query pages, databases, and docs with natural language. No custom code. Setup in 15 min with Node 22.
Claude 4.5 vs GPT-4o coding benchmarks compared on HumanEval, SWE-bench, agentic tasks, latency, and API pricing in USD. For developers choosing an LLM in 2026.
Claude Code multi-file refactoring walkthrough: plan, execute, and verify large-scale codebase changes using agentic AI. Tested on Python 3.12 and Node 22.
Master .claude files in Claude Code for persistent project memory. Configure CLAUDE.md, slash commands, and settings to automate your dev workflow. Tested on Claude Code CLI.
Set up Windsurf Rules to give AI agents persistent project context, coding standards, and memory. Works with .windsurfrules and global rules. TypeScript & Python tested.
Continued pre-training vs fine-tuning compared for LLM customization. Learn which method fits your data, budget, and use case. Python 3.12 + Hugging Face.
Claude Haiku 4.5 for high-volume production workloads: batch API setup, cost optimization, and throughput tuning in Python 3.12 + Docker. Starts at $0.80/MTok.
Run Qwen2.5-VL 7B or 72B locally with Ollama or vLLM for image understanding, OCR, and visual reasoning. Tested on Python 3.12, CUDA 12, and Apple Silicon.
Run MMLU, MT-Bench, and custom eval suites on fine-tuned models using lm-evaluation-harness and FastChat. Tested on Python 3.12 + CUDA 12 + Hugging Face.
Fine-tune Llama 3.3 70B with Unsloth for 5x faster training and 60% less VRAM. Step-by-step guide using QLoRA, Python 3.12, CUDA 12, and Google Colab or local RTX GPU.
Fine-tune LlamaIndex embeddings on your own data to boost RAG retrieval accuracy. Covers synthetic dataset generation, training, and evaluation. Python 3.12 + CUDA 12.
Fine-tune LLMs to return reliable JSON output using Unsloth, Axolotl, and Pydantic schema enforcement. Tested on Python 3.12, CUDA 12, and 4-bit QLoRA.
Fine-tune Llama 3, Mistral, or Qwen2.5 on RunPod GPU cloud using Axolotl and QLoRA. Step-by-step setup for A100/H100 pods, pod config, and cost control. Tested on Python 3.12.
Fine-tune large language models using LISA layer-wise importance sampling. Cut GPU memory 60% vs LoRA with better convergence. Tested on Python 3.12 + CUDA 12.
Fine-tune Mistral 7B for SQL generation using QLoRA and Unsloth on 16GB VRAM. Covers dataset prep, training, evaluation, and deployment. Python 3.12 + CUDA 12.
Generate high-quality synthetic training datasets with GPT-4o, format them for fine-tuning, and train a smaller model. Python 3.12, OpenAI SDK, JSONL output.
ShareGPT vs Alpaca dataset formatting for LLM fine-tuning explained. Convert, validate, and pick the right format for Unsloth, Axolotl, and TRL. Python 3.12.
Master LM Studio model management: download GGUF models, organize your local library, and switch between models instantly. Tested on Windows 11 and macOS Sequoia.
Claude Code slash commands speed up your dev workflow with 15 shortcuts for memory, context, git, and custom commands. Tested on Claude Code CLI, Node 22, macOS & Ubuntu.
Fine-tune LLMs with ORPO to align model behavior without a reference model or a separate reward model. Tested on Python 3.12, TRL 0.8, and Llama 3 8B.
Use Windsurf Cascade Agent to autonomously refactor large codebases. Multi-file edits, terminal commands, and safe rollback on TypeScript and Python projects.
Install Qwen 2.5 72B locally with Ollama or LM Studio. Covers GGUF quantization, VRAM requirements, GPU offloading, and inference config on Linux and macOS.
Run Qwen2.5 7B or 14B GGUF quantized models on 8GB VRAM using llama.cpp or Ollama. Covers Q4_K_M vs Q5_K_M tradeoffs, GPU offload layers, and inference speed.
Spectrum fine-tuning targets only high signal-to-noise layers, cutting GPU memory up to 50% while matching full fine-tune quality. Tested on Python 3.12 + CUDA 12.
Configure MCP browser automation for Claude Desktop and Claude Code. Covers Puppeteer MCP setup, its deprecation notice, and migrating to Playwright MCP. Node.js 18+.
Configure MCP servers in Zed editor to supercharge AI-powered coding. Add filesystem, GitHub, and custom servers via settings.json or the Agent Panel UI.
Configure Windsurf IDE from scratch and master Cascade AI, custom rules, and multi-file workflows in your first week. Tested on macOS and Ubuntu with Node 22.
Configure Windsurf Memories to persist codebase context across sessions. Covers auto-generated vs manual memories, Rules files, and .windsurfrules setup. Tested on Windsurf 1.13.x.
Configure Windsurf remote development over SSH and Docker containers. Connect to cloud VMs, WSL, and dev containers with Cascade AI intact. Tested on Ubuntu 24.04.
GaLore cuts LLM training memory by 65% with full-parameter learning. Run LLaMA 3 on a single 24GB GPU using Python 3.12, PyTorch 2.3, and the galore-torch library.
Windsurf Flow explained: how the RAG-based context engine, Cascade, Memories, and .windsurfrules work together to keep you in a flow state. Tested on Windsurf 1.x.
TRL 0.12 ships PPO rename, unified ScriptArguments, WPO for DPO, pairwise judges for Online DPO, and a new trl env CLI. Tested on Python 3.12 + CUDA 12.
Windsurf Supercomplete explained: how it predicts multi-line intent, differs from Copilot, and how to configure it for TypeScript and Python. Tested on Windsurf 1.x.