Microsoft’s GUI-Actor enables smarter software interactions

Tech in Asia·2025-06-05 17:00

🔍 In one sentence

Researchers from Microsoft, Nanjing University, and the University of Illinois Urbana-Champaign have created GUI-Actor, a system that lets GUI agents locate and act on software interface elements without generating screen coordinates, and that outperforms a much larger prior model on the ScreenSpot-Pro benchmark.

🏛️ Paper by:

Microsoft, Nanjing University, University of Illinois Urbana-Champaign

Authors: Qianhui Wu et al.

🧠 Key discovery

The team introduced a novel visual grounding approach for GUI agents that removes the need for coordinate generation; by employing an attention-based action head, GUI-Actor can directly identify and interact with interface elements in a way that resembles human behavior.
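As a rough illustration of the idea, the sketch below scores every image patch against a single query vector, turns the scores into attention weights, and acts on the center of the highest-weight patch. It is a simplified stand-in, not the paper's implementation: the function names, the toy 2x2 grid, the 28-pixel patch size, and the plain scaled dot product (with no learned projections) are all assumptions made for this example.

```python
import math

def attention_over_patches(query, patch_features):
    """Softmax of scaled dot products between one query vector and each patch vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * p for q, p in zip(query, patch)) / scale
              for patch in patch_features]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def patch_center(index, grid_cols, patch_size):
    """Pixel center (x, y) of the patch at a row-major index."""
    row, col = divmod(index, grid_cols)
    return (col * patch_size + patch_size / 2,
            row * patch_size + patch_size / 2)

def ground_action(query, patch_features, grid_cols, patch_size):
    """Return (best patch index, its pixel center, all attention weights)."""
    weights = attention_over_patches(query, patch_features)
    best = max(range(len(weights)), key=weights.__getitem__)
    return best, patch_center(best, grid_cols, patch_size), weights

# Hypothetical toy input: a 2x2 grid of patches with 2-dimensional features.
query = [1.0, 0.0]  # stands in for the hidden state of the dedicated token
patches = [[0.1, 0.9], [0.2, 0.1], [2.0, 0.0], [0.5, 0.5]]
best, center, weights = ground_action(query, patches, grid_cols=2, patch_size=28)
print(best, center)
```

The point of the sketch is that the output is a choice among visual patches rather than a string of numbers the model has to write out.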

📊 Surprising results

Key stat: GUI-Actor-7B scored 44.6 on ScreenSpot-Pro, outperforming the previous best, UI-TARS-72B, which scored 38.1.

Breakthrough: Adding a dedicated <ACTOR> token allows the model to focus on relevant visual patches and propose one or more candidate action regions in a single forward pass.

Comparison: Though it uses fewer parameters and less training data than prior methods, GUI-Actor achieves better generalization to unseen screen resolutions.
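To make the single-pass, multiple-candidate idea concrete, here is a minimal sketch that takes one set of attention weights over patches and returns the top few patches as candidate targets. The weights and the helper name `top_k_patches` are invented for illustration, and the article does not describe how the model chooses among its candidates.

```python
import heapq

def top_k_patches(weights, k):
    """Indices of the k highest-weight patches, best first."""
    return heapq.nlargest(k, range(len(weights)), key=weights.__getitem__)

# Hypothetical attention weights over five patches from one forward pass.
weights = [0.05, 0.40, 0.10, 0.35, 0.10]
candidates = top_k_patches(weights, k=2)
print(candidates)
```

Because all candidates come from the same set of weights, ranking them needs no extra model calls.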

📌 Why this matters

This work demonstrates that precise coordinates are not strictly necessary for effective GUI interactions; instead, an agent that visually recognizes and operates on elements can achieve higher accuracy. For example, a GUI agent based on this approach could assist users in navigating complex software layouts and varying resolutions more reliably.

💡 What are the potential applications?

Enhanced User Interfaces: Integrating GUI-Actor into applications could improve accessibility for users with disabilities.

Automated Customer Service: It could enable virtual assistants to directly interact with graphical interfaces when assisting customers.

Robotic Process Automation: GUI-Actor can help automate tasks across different software tools without requiring extensive custom coding.

⚠️ Limitations

GUI-Actor relies on a fixed visual patch size, which may reduce its effectiveness on very small interface elements and limit its suitability for high-precision tasks such as CAD.
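A small worked example shows why patch size can matter. The sketch below assumes, purely for illustration, that an agent clicks the center of whichever patch contains the middle of the target element, and that patches are 28 pixels wide. Neither assumption comes from the article, and the real model may place its click more precisely.

```python
def patch_center_for_point(x, y, patch_size):
    """Center of the fixed-size patch that contains pixel (x, y)."""
    col, row = int(x // patch_size), int(y // patch_size)
    return (col * patch_size + patch_size / 2,
            row * patch_size + patch_size / 2)

def center_click_hits(box, patch_size):
    """True if clicking the patch center lands inside box = (left, top, right, bottom)."""
    left, top, right, bottom = box
    cx, cy = (left + right) / 2, (top + bottom) / 2
    px, py = patch_center_for_point(cx, cy, patch_size)
    return left <= px <= right and top <= py <= bottom

# Hypothetical elements, in pixels.
large_button = (100, 100, 220, 140)  # 120 x 40 button
tiny_icon = (30, 30, 38, 38)         # 8 x 8 icon, smaller than one patch

print(center_click_hits(large_button, patch_size=28))  # True
print(center_click_hits(tiny_icon, patch_size=28))     # False
```

Under these assumptions the large button is hit, while the click for the tiny icon lands at (42, 42), just outside it. That is the kind of miss that becomes costly in dense, high-precision interfaces.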

👉 Bottom line:

GUI-Actor changes how GUI agents interact with software by allowing them to identify and select interface elements directly, making interactions more intuitive and efficient, similar to human behavior.

📄 Read the full paper: GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents
