Tools & Capabilities

Complete reference of all tools, commands, and capabilities available in the Vision system.


Core Tools

File Operations

Read - Read files with line numbers

  • Supports images, PDFs, Jupyter notebooks
  • Line offset and limit for large files
  • Always read before editing
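As a rough illustration of what "line numbers" plus "offset and limit" mean in practice, the sketch below reads a window of a text file and prefixes each line with its 1-based number. The function name `read_numbered` and its parameters are hypothetical and are not the Read tool's actual interface.

```python
from itertools import islice
from pathlib import Path


def read_numbered(path, offset=0, limit=None):
    """Return lines of `path`, each prefixed with its 1-based line number.

    `offset` is how many leading lines to skip and `limit` caps how many
    lines come back (None means no cap). Both names are assumptions made
    for this sketch.
    """
    stop = None if limit is None else offset + limit
    numbered = []
    with Path(path).open(encoding="utf-8") as handle:
        # islice avoids loading the whole file when only a window is needed.
        for number, line in enumerate(islice(handle, offset, stop), start=offset + 1):
            text = line.rstrip("\n")
            numbered.append(f"{number:>6}\t{text}")
    return numbered
```

Reading a large file in windows like this keeps each call small, which is the point of the offset and limit options.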

Write - Create or overwrite files

  • Must read existing files first
  • Supports all text formats

Edit - Exact string replacement

  • Preserves indentation
  • Must read file first
  • Can replace all occurrences
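The behaviour described above can be sketched as a literal (non-regex) replacement that leaves surrounding whitespace untouched. `replace_exact` and its `replace_all` flag are hypothetical names for illustration, and raising an error when the string is missing is an assumption of this sketch.

```python
def replace_exact(text, old, new, replace_all=False):
    """Replace `old` with `new` using literal matching, not regex.

    By default only the first occurrence is replaced; pass
    replace_all=True to replace every occurrence.
    """
    if old not in text:
        raise ValueError("old string not found in text")
    count = -1 if replace_all else 1  # str.replace treats -1 as "no limit"
    return text.replace(old, new, count)
```

Because only the matched substring changes, indentation before it is preserved: `replace_exact("    return 1\n", "return 1", "return 2")` keeps the four leading spaces.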

Glob - File pattern matching

  • Fast pattern search
  • Sorted by modification time
  • Works with any codebase size
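A minimal sketch of pattern matching with results ordered by modification time, using only the standard library. The helper name `glob_by_mtime` is hypothetical, and newest-first ordering is an assumption, since the text above does not state the sort direction.

```python
from pathlib import Path


def glob_by_mtime(root, pattern):
    """Return files under `root` matching `pattern`, newest first.

    Newest-first is an assumption of this sketch.
    """
    matches = [p for p in Path(root).glob(pattern) if p.is_file()]
    return sorted(matches, key=lambda p: p.stat().st_mtime, reverse=True)
```

For example, `glob_by_mtime(".", "**/*.py")` lists Python files at any depth, with the most recently modified first.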

Grep - Content search (ripgrep)

  • Full regex support
  • Filter by file type or glob
  • Context lines (-A, -B, -C)
  • Multiline matching support
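To show what the context options do, here is a small sketch of a regex search that returns matching lines together with surrounding lines. In ripgrep, -B gives lines before a match, -A gives lines after, and -C gives both. The function `grep_with_context` is a hypothetical stand-in and does not reflect how the Grep tool is implemented.

```python
import re


def grep_with_context(lines, pattern, before=0, after=0):
    """Return (line_number, text) pairs for matches plus context lines.

    `before` and `after` play the roles of -B and -A; setting both to the
    same value corresponds to -C.
    """
    regex = re.compile(pattern)
    keep = set()
    for index, line in enumerate(lines):
        if regex.search(line):
            start = max(0, index - before)
            stop = min(len(lines), index + after + 1)
            keep.update(range(start, stop))
    # Overlapping context windows are merged because indices live in a set.
    return [(i + 1, lines[i]) for i in sorted(keep)]
```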

Execution Tools

Bash

Terminal command execution with:

  • Persistent shell session
  • Optional timeout (max 10 minutes)
  • Background execution support
  • Proper path quoting for spaces
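The timeout cap and path quoting can be sketched with the standard library as follows. This assumes `bash` is on the PATH; the helper `run_command` and its 120-second default are hypothetical, and the persistent session and background features are not modeled here.

```python
import shlex
import subprocess

MAX_TIMEOUT_SECONDS = 600  # the 10-minute ceiling mentioned above


def run_command(command, timeout_seconds=120):
    """Run `command` in bash and return (exit_code, stdout).

    The requested timeout is capped at MAX_TIMEOUT_SECONDS;
    subprocess.TimeoutExpired is raised if the command runs longer.
    """
    capped = min(timeout_seconds, MAX_TIMEOUT_SECONDS)
    completed = subprocess.run(
        ["bash", "-c", command],
        capture_output=True,
        text=True,
        timeout=capped,
    )
    return completed.returncode, completed.stdout
```

Paths containing spaces should be quoted before they are spliced into a command string, for example `run_command("ls " + shlex.quote("/tmp/my dir"))`.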

Important: Use Bash for git, ssh, builds, and services, NOT for file operations

Task (Agent)

Launch specialized agents for:

  • general-purpose: Complex multi-step tasks
  • statusline-setup: Configure status line
  • output-style-setup: Create output styles

Use when: Multiple rounds of search/analysis needed


Browser Automation

Playwright MCP

Full browser automation:

  • Chrome, Firefox, WebKit
  • Screenshots and PDFs
  • Page interactions
  • Form submission

Puppeteer MCP

Chrome automation:

  • Headless browsing
  • Page navigation
  • Element interactions

Protocol: ALWAYS use Chrome (never Safari)


Search & Research

WebSearch

Built-in web search:

  • Current events and recent data
  • Domain filtering support
  • US-only availability

WebFetch

Fetch and analyze web content:

  • HTML to markdown conversion
  • AI-powered analysis
  • 15-minute cache
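A 15-minute cache of this kind can be pictured as a time-to-live (TTL) lookup keyed by URL. The sketch below is an assumption about the general technique, not WebFetch's actual implementation; `TTLCache`, `get_or_fetch`, and the injectable `clock` are hypothetical names.

```python
import time


class TTLCache:
    """Cache fetched values per URL for a fixed time-to-live."""

    def __init__(self, ttl_seconds=15 * 60, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the expiry logic can be tested
        self._entries = {}   # url -> (stored_at, value)

    def get_or_fetch(self, url, fetch):
        """Return the cached value for `url`, calling `fetch(url)` on a miss."""
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        value = fetch(url)
        self._entries[url] = (now, value)
        return value
```

Repeated requests for the same URL within the window return the stored result without fetching again, which is why re-fetching a page shortly after the first request is cheap.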

DuckDuckGo MCP

Privacy-focused web search

Wikipedia MCP

Article search and retrieval

YouTube Transcript MCP

Video transcript extraction


Development Tools

SQLite MCP

Database operations:

  • Schema inspection
  • Query execution
  • Data manipulation

AST-Grep MCP

Code structure search:

  • AST-based pattern matching
  • Language-aware search

Fetch MCP

HTTP/HTTPS requests:

  • API testing
  • Web scraping

GitHub (via docker-mcp-gateway)

Repository operations:

  • Clone, pull, push
  • Issue/PR management
  • Code search

Documentation

Cloudflare Docs MCP

Complete Cloudflare reference:

  • Workers, Pages, R2, D1, KV
  • AI, Zero Trust, CDN, DNS

JetBrains MCP

IDE documentation:

  • IntelliJ, PyCharm, WebStorm

Atlas Docs MCP

MongoDB Atlas guides

DockerHub MCP

Container documentation


Communication

Slack MCP

Direct #qc channel access:

  • Post messages
  • Read conversations
  • Team: T09L15F1KDG

Error Tracking

Flare MCP (HTTP)

Production error monitoring:

  • Stack traces
  • User context
  • Environment data
  • Error frequency

Protocol: Check Flare FIRST for production errors


System Access

Desktop Commander MCP

macOS filesystem control:

  • Full root access (via Docker volume mount)
  • Directory operations
  • File management

AppleScript

Direct app control for 40+ apps:

  • Chrome, Mail, Reminders
  • Spotify, iMessage
  • Terminal operations

AI & Media

HuggingFace Spaces MCP

  • shuttle-3.1-aesthetic: Image generation
  • whisper-large-v3-turbo: Audio transcription
  • QVQ-72B-preview: Vision model
  • OmniParser: Document parsing

Infrastructure Management

DigitalOcean MCP

Complete DO API:

  • Droplet management
  • DNS and domains
  • Firewalls, load balancers
  • Monitoring, alerts
  • Billing, SSH keys

Cloudflare Workers MCP

Workers development:

  • Deployment
  • Debugging
  • Analytics

Tool Decision Matrix

Task              Use          NOT
Read files        Read tool    cat, head, tail
Edit files        Edit tool    sed, awk, vim
Write files       Write tool   echo, cat
Search content    Grep tool    grep command
Find files        Glob tool    find, ls
Execute commands  Bash         -
Browser testing   Playwright   curl
Web content       WebFetch     curl
Communication     Direct text  echo

Using the wrong tool wastes money.


Capability Limits

What Vision CAN Do

  • Read/write any file on system
  • Execute any bash command
  • Full browser automation
  • API calls and web scraping
  • Database operations
  • Server management (SSH)
  • Git operations
  • Docker management
  • AppleScript automation

What Vision CANNOT Do

  • Interactive terminal (no -i flags)
  • Real-time GUI interaction
  • Direct hardware access
  • Kernel-level operations
  • Break out of security boundaries

Performance Considerations

Parallel Tool Calls:

  • Use multiple tools in one message when independent
  • Reduces round-trip time
  • Example: Read multiple files simultaneously
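The idea of issuing independent operations together, rather than one after another, can be illustrated with a thread pool that reads several files concurrently. This is only an analogy for parallel tool calls; `read_all` is a hypothetical helper and the worker limit of 8 is an arbitrary choice.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def read_all(paths):
    """Read independent text files concurrently and return {path: text}."""
    paths = list(paths)
    if not paths:
        return {}
    with ThreadPoolExecutor(max_workers=min(8, len(paths))) as pool:
        # pool.map yields results in the same order as `paths`.
        texts = pool.map(lambda p: Path(p).read_text(encoding="utf-8"), paths)
        return dict(zip(paths, texts))
```

This only helps when the reads do not depend on each other; if one result decides what to read next, the calls have to stay sequential.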

Background Execution:

  • Long-running bash commands
  • Use BashOutput to check progress
  • Don't block on slow operations
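The start-then-poll pattern behind background execution can be sketched with `subprocess.Popen`. This is an analogy under stated assumptions, not how BashOutput works internally; `start_background` and `check_progress` are hypothetical helpers, and the sketch only suits commands with modest output because the pipe is read after the process exits.

```python
import subprocess
import sys


def start_background(argv):
    """Start a command without waiting for it to finish."""
    return subprocess.Popen(argv, stdout=subprocess.PIPE, text=True)


def check_progress(process):
    """Return None while the command is running, else (exit_code, output)."""
    if process.poll() is None:
        return None
    return process.returncode, process.stdout.read()


if __name__ == "__main__":
    job = start_background([sys.executable, "-c", "print('done')"])
    # Other work can happen here; poll again later instead of blocking.
    job.wait(timeout=30)
    print(check_progress(job))
```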

Cache Utilization:

  • WebFetch has 15-minute cache
  • Read files once, reference multiple times

Total Capabilities: 21 MCP servers + native Claude Code tools

Status: Fully operational (5/21 MCP active, Docker offline)
