WebVoyager is an end-to-end web agent described in a research paper (arXiv 2401.13919) by researchers at Zhejiang University, Tencent AI Lab, and Westlake University. It uses a large multimodal model to interact with real websites: the agent processes rendered screenshots and HTML, then executes clicks, form fills, and navigation to complete user tasks. It achieved a 59.1% task success rate on a 643-task benchmark spanning 15 real websites. The code is available on GitHub; it is a research artifact, not a commercial product.

Key features:
- Multimodal LLM agent that reads rendered webpage screenshots
- Executes real browser actions: clicks, typing, form submission, scrolling
- Benchmark of 643 semi-automatically generated tasks across 15 real-world websites
- Automatic evaluation with 85.3% agreement with human judges
- Open-source code released on GitHub (MinorJerry/WebVoyager)
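
To make the "executes real browser actions" idea concrete, here is a minimal sketch of how a model's text reply might be turned into a structured action before a browser driver runs it. The action grammar, the `Action` class, and `parse_action` are hypothetical names invented for this illustration; they are not taken from the WebVoyager repository, whose actual prompt and action format may differ.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Action:
    """One browser step requested by the model (hypothetical structure)."""
    kind: str                     # "click", "type", "scroll", or "answer"
    target: Optional[int] = None  # numeric label of the page element, if any
    text: Optional[str] = None    # typed text, scroll direction, or final answer


# Hypothetical action grammar, chosen only for this sketch.
_PATTERNS = [
    ("click", re.compile(r"^click \[(\d+)\]$", re.IGNORECASE)),
    ("type", re.compile(r"^type \[(\d+)\]; (.+)$", re.IGNORECASE)),
    ("scroll", re.compile(r"^scroll (up|down)$", re.IGNORECASE)),
    ("answer", re.compile(r"^answer; (.+)$", re.IGNORECASE)),
]


def parse_action(reply: str) -> Action:
    """Convert a single-line model reply into an Action, or raise ValueError."""
    line = reply.strip()
    for kind, pattern in _PATTERNS:
        match = pattern.match(line)
        if match is None:
            continue
        if kind == "click":
            return Action(kind, target=int(match.group(1)))
        if kind == "type":
            return Action(kind, target=int(match.group(1)), text=match.group(2))
        # "scroll" carries a direction, "answer" carries the final answer text.
        return Action(kind, text=match.group(1))
    raise ValueError(f"unrecognized action: {reply!r}")


if __name__ == "__main__":
    print(parse_action("Click [3]"))
    print(parse_action("Type [5]; wireless headphones"))
    print(parse_action("Scroll down"))
```

In a full agent, each parsed `Action` would be passed to a browser automation layer, and the resulting page would be captured as a new screenshot for the next model call. Rejecting unrecognized replies with an error, rather than guessing, keeps a malformed model output from triggering an unintended click.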
Free and open-source research artifact. No commercial product or pricing.
