About Google Cloud OCR
Google Cloud Vision API provides two OCR modes: TEXT_DETECTION for short text in natural scenes (signs, labels, product packaging) and DOCUMENT_TEXT_DETECTION for dense printed or handwritten documents, which returns paragraph-level structure and bounding boxes. Billing is per image, or per page for multi-page files such as PDFs. Beyond OCR, the Vision API also covers label detection, face detection, logo detection, object localization, and web detection through the same API. New customers receive $300 in free credits.
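As a minimal sketch of how the two modes are selected, the JSON body for the REST images:annotate method can be assembled with the standard library alone. The helper name build_annotate_request is hypothetical and only for illustration; authentication and the actual HTTP call are deliberately left out.

```python
import base64
import json

# Public REST endpoint for synchronous image annotation.
ANNOTATE_URL = "https://vision.googleapis.com/v1/images:annotate"

# The two OCR feature types described above.
OCR_MODES = ("TEXT_DETECTION", "DOCUMENT_TEXT_DETECTION")


def build_annotate_request(image_bytes: bytes,
                           mode: str = "DOCUMENT_TEXT_DETECTION") -> dict:
    """Build the JSON body for one OCR request (hypothetical helper)."""
    if mode not in OCR_MODES:
        raise ValueError(f"mode must be one of {OCR_MODES}, got {mode!r}")
    return {
        "requests": [
            {
                # Inline image bytes are sent base64-encoded in "content".
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                # One feature entry selects the OCR mode for this image.
                "features": [{"type": mode}],
            }
        ]
    }


if __name__ == "__main__":
    # Placeholder bytes; a real call would read an image file instead.
    body = build_annotate_request(b"fake image bytes", mode="TEXT_DETECTION")
    print(json.dumps(body, indent=2))
```

The resulting body would be sent as a POST to ANNOTATE_URL with valid credentials. Switching between scene text and dense documents changes only the feature type, so an agent can choose the mode per image.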
AI Agent Use Cases
- TEXT_DETECTION mode for scene text (signs, menus, labels)
- DOCUMENT_TEXT_DETECTION for dense documents with paragraph and bounding box structure
- Handwriting recognition support
- PDF and TIFF multi-page document processing (each page billed as one image)
- Returns structured output including word positions and confidence scores
- 1,000 free units per month on most features
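The structured output mentioned in the list above can be walked as nested pages, blocks, paragraphs, and words. The sketch below assumes a single decoded JSON response element (one entry of the REST "responses" array) that contains a fullTextAnnotation. The names Word, iter_words, and SAMPLE are hypothetical, and SAMPLE is a hand-written illustration, not real API output.

```python
from typing import Iterator, NamedTuple


class Word(NamedTuple):
    text: str
    confidence: float
    box: list  # list of (x, y) vertex tuples


def iter_words(response: dict) -> Iterator[Word]:
    """Yield every word in a fullTextAnnotation with its confidence and box."""
    annotation = response.get("fullTextAnnotation", {})
    for page in annotation.get("pages", []):
        for block in page.get("blocks", []):
            for paragraph in block.get("paragraphs", []):
                for word in paragraph.get("words", []):
                    # A word's text is the concatenation of its symbols.
                    text = "".join(s.get("text", "") for s in word.get("symbols", []))
                    vertices = word.get("boundingBox", {}).get("vertices", [])
                    # Assumption: a missing coordinate is treated as 0.
                    box = [(v.get("x", 0), v.get("y", 0)) for v in vertices]
                    yield Word(text, word.get("confidence", 0.0), box)


# Hand-written sample in the assumed response shape (illustrative only).
SAMPLE = {
    "fullTextAnnotation": {
        "pages": [{"blocks": [{"paragraphs": [{"words": [
            {
                "symbols": [{"text": "H"}, {"text": "i"}],
                "confidence": 0.98,
                "boundingBox": {"vertices": [
                    {"x": 10, "y": 5}, {"x": 40, "y": 5},
                    {"x": 40, "y": 20}, {"x": 10, "y": 20},
                ]},
            }
        ]}]}]}]
    }
}

if __name__ == "__main__":
    for w in iter_words(SAMPLE):
        print(w.text, w.confidence, w.box)
```

An agent could filter the yielded words by a confidence threshold of its choosing, for example to flag uncertain regions of a scanned page for human review.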
Available Actions
These are the specific actions that AI agents can perform with this tool