PUBG maker Krafton, Nvidia launch benchmark for AI game agents

Tech in Asia·2025-06-07 17:00

🔍 In one sentence

Researchers developed Orak, a benchmark for training and evaluating large language model (LLM) agents across a range of popular video games, with the aim of improving how these agents interact with and perform in games.

🏛️ Paper by:

Krafton, Seoul National University, Nvidia, University of Wisconsin-Madison.

Authors: Dongmin Park et al.

🧠 Key discovery

Orak addresses gaps in existing game evaluation methods by offering a comprehensive platform that tests LLMs in complex gameplay across twelve popular video games, unlike prior benchmarks that mostly focused on simpler text-based games.

📊 Surprising results

Key stat: Orak covers 12 diverse games, providing a broader evaluation of LLM capabilities than earlier benchmarks.

Breakthrough: The plug-and-play Model Context Protocol (MCP) enables LLMs to interact directly with game environments, improving evaluation consistency.

Comparison: Proprietary LLMs like GPT-4o outperformed open-source models, revealing a performance gap in complex game interactions.
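To illustrate why a shared, plug-and-play interface can make evaluation more consistent, here is a minimal Python sketch. It is not Orak's code and does not use MCP itself: the names GameEnv, GuessGame, evaluate and make_binary_search_agent are hypothetical, the game is a toy, and the agent is a scripted stand-in for an LLM call. The idea is only that when every game exposes the same text-in, text-out interface, any agent can be scored by the same loop.

```python
from typing import Callable, Protocol


class GameEnv(Protocol):
    """Hypothetical text interface that every game exposes to an agent."""

    def reset(self) -> str: ...
    def step(self, action: str) -> tuple[str, float, bool]: ...


class GuessGame:
    """Toy stand-in for a game: guess a hidden number between 1 and 10."""

    def __init__(self, target: int = 7, max_turns: int = 5) -> None:
        self.target = target
        self.max_turns = max_turns
        self.turns = 0

    def reset(self) -> str:
        self.turns = 0
        return "Guess a number between 1 and 10."

    def step(self, action: str) -> tuple[str, float, bool]:
        """Return (observation, reward, done) for one text action."""
        self.turns += 1
        try:
            guess = int(action.strip())
        except ValueError:
            guess = None
        if guess == self.target:
            return "Correct.", 1.0, True
        done = self.turns >= self.max_turns
        if guess is None:
            return "Invalid action; send a number.", 0.0, done
        hint = "higher" if guess < self.target else "lower"
        return f"Wrong; go {hint}.", 0.0, done


# An agent is any function from observation text to action text.
# In a real setup this would wrap an LLM call (assumption, not shown here).
Agent = Callable[[str], str]


def evaluate(env: GameEnv, agent: Agent) -> float:
    """Run one episode and return the total reward, identically for any agent."""
    observation = env.reset()
    total, done = 0.0, False
    while not done:
        action = agent(observation)
        observation, reward, done = env.step(action)
        total += reward
    return total


def make_binary_search_agent() -> Agent:
    """Scripted placeholder for a model: narrows the range using the hints."""
    low, high, last = 1, 10, 0

    def agent(observation: str) -> str:
        nonlocal low, high, last
        if "higher" in observation:
            low = last + 1
        elif "lower" in observation:
            high = last - 1
        last = (low + high) // 2
        return str(last)

    return agent


if __name__ == "__main__":
    print(evaluate(GuessGame(), make_binary_search_agent()))
    print(evaluate(GuessGame(), lambda observation: "1"))
```

Because both agents pass through the same evaluate loop and the same environment interface, their scores are directly comparable, which is the consistency benefit the article attributes to a plug-and-play setup.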

📌 Why this matters

This research suggests that evaluating LLMs effectively requires complex, realistic environments rather than simplistic ones. That matters for gaming, where LLMs could make NPCs more intelligent and narratives more dynamic, which in turn could raise player engagement.

💡 What are the potential applications?

Creating adaptive NPCs that respond to player strategies in real time.

Supporting AI-driven storytelling with dynamic character responses to gameplay.

Assisting game designers in testing and refining mechanics through LLM-simulated player interactions.

⚠️ Limitations

High computational demands for training and running these models may limit access for smaller developers or indie projects.

👉 Bottom line:

Orak advances the evaluation of LLMs in gaming, enabling more interactive and responsive AI experiences in the industry.

📄 Read the full paper: Orak: A Foundational Benchmark for Training and Evaluating LLM Agents on Diverse Video Games
