Microsoft’s GUI-Actor enables smarter software interactions
Researchers from Microsoft, Nanjing University, and the University of Illinois Urbana-Champaign have created GUI-Actor, a system that lets GUI agents interact with software interfaces without generating screen coordinates, resulting in notably improved performance.
Microsoft, Nanjing University, University of Illinois Urbana-Champaign
Authors: Qianhui Wu et al.
The team introduced a novel visual grounding approach for GUI agents that removes the need for coordinate generation; by employing an attention-based action head, GUI-Actor can directly identify and interact with interface elements in a way that resembles human behavior.
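To make the idea of an attention-based action head concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the function names, the toy feature vectors, and the 28-pixel patch size are all hypothetical, chosen only for illustration. The sketch scores each image patch against a single "action" query vector, normalizes the scores with a softmax, and acts on the highest-weighted patch, so the model never has to emit coordinates as text.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_patch(action_query, patch_features, grid_width, patch_size):
    """Pick the patch that the action query attends to most strongly.

    action_query   -- hypothetical embedding of the action token
    patch_features -- one feature vector per image patch, in row-major order
    grid_width     -- number of patches per row
    patch_size     -- patch side length in pixels (assumed value, not from the paper)
    """
    dim = len(action_query)
    # Scaled dot-product score between the query and every patch.
    scores = [
        sum(q * k for q, k in zip(action_query, feat)) / math.sqrt(dim)
        for feat in patch_features
    ]
    weights = softmax(scores)
    best = max(range(len(weights)), key=weights.__getitem__)
    # The click point is derived from the patch index, not generated as text.
    row, col = divmod(best, grid_width)
    center = ((col + 0.5) * patch_size, (row + 0.5) * patch_size)
    return weights, best, center


# Toy 2x2 grid of patches with 2-dimensional features.
query = [1.0, 0.0]
patches = [[0.0, 1.0], [0.2, 0.0], [3.0, 0.0], [0.5, 0.5]]
weights, best, center = select_patch(query, patches, grid_width=2, patch_size=28)
print(best, center)
```

In this toy grid the third patch (index 2) has the largest dot product with the query, so the sketch targets the center of the bottom-left patch.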
This work suggests that precise coordinates are not strictly necessary for effective GUI interaction; instead, an agent that visually recognizes elements and operates on them directly can achieve higher accuracy. For example, a GUI agent built on this approach could help users navigate complex software layouts more reliably, including across different screen resolutions.
GUI-Actor relies on a fixed patch size within its model, which may reduce its effectiveness when dealing with very small interface elements, limiting its suitability for high-precision tasks like CAD.
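A rough way to see why a fixed patch size hurts on small elements is to compute how much of a single patch an element can fill at best. The helper below is an illustrative back-of-the-envelope calculation, not something from the paper; the function name and the 28-pixel patch size are assumptions.

```python
def best_case_patch_fill(elem_width, elem_height, patch_size):
    """Largest fraction of one square patch that an element can occupy.

    Assumes the element sits entirely inside a single patch when it is
    small enough to fit; larger elements can fill a patch completely.
    """
    covered = min(elem_width, patch_size) * min(elem_height, patch_size)
    return covered / (patch_size ** 2)


# A 14x14 px icon fills at most a quarter of a 28 px patch, so most of
# the patch's features describe the surroundings rather than the icon.
small_icon = best_case_patch_fill(14, 14, patch_size=28)

# A 100x40 px button can cover whole patches.
large_button = best_case_patch_fill(100, 40, patch_size=28)
print(small_icon, large_button)
```

The smaller this fraction, the less of the patch's representation is devoted to the element itself, which is consistent with the limitation described above for dense, fine-grained interfaces such as CAD tools.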
GUI-Actor changes how GUI agents interact with software by allowing them to identify and select interface elements directly, making interactions more intuitive and efficient, similar to human behavior.
📄 Read the full paper: GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents
Read the full article on Tech in Asia