Microsoft, University of Chicago’s AI learns better from feedback

Tech in Asia·2025-05-30 17:00

🔍 In one sentence: TEXT2GRAD changes how language models learn by transforming natural language feedback into precise, actionable learning signals.

🏛️ Paper by: University of Chicago, Microsoft, Fudan University

Authors: Hanyang Wang et al.

🧠 Key discovery: The researchers found that converting natural language feedback into span-level gradients can enhance the learning process of language models, enabling more targeted adjustments than traditional scalar reward methods.

Conventional methods often overlook the nuanced feedback that can be derived from textual critiques.
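The contrast between scalar rewards and span-level signals can be sketched in code. The function names and span format below are illustrative assumptions, not the paper's actual API: a scalar reward spreads one score uniformly over every generated token, while span-level feedback scores only the tokens a critique phrase points at.

```python
def scalar_token_rewards(tokens, reward):
    """Scalar-reward RL: one score is spread uniformly across all tokens."""
    return [reward] * len(tokens)

def span_token_rewards(tokens, feedback_spans):
    """Span-level feedback: only tokens a critique targets receive a signal.

    feedback_spans: list of (start, end, score) tuples, where each critique
    phrase has been aligned to the token span [start, end).
    """
    rewards = [0.0] * len(tokens)
    for start, end, score in feedback_spans:
        for i in range(start, end):
            rewards[i] = score
    return rewards

tokens = ["The", "function", "sorts", "the", "list", "in", "place"]
# Hypothetical critique: "'sorts' and 'in place' are wrong" -> spans 2-3, 5-7
spans = [(2, 3, -1.0), (5, 7, -1.0)]

print(scalar_token_rewards(tokens, -1.0))  # every token penalized equally
print(span_token_rewards(tokens, spans))   # only the criticized spans penalized
```

The second function is what gives the method its precision: tokens outside the criticized spans keep a zero signal, so correct parts of the output are not penalized along with the errors.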

📊 Surprising results

Key stat: TEXT2GRAD outperformed scalar-reward reinforcement learning (RL) methods, achieving higher accuracy and better interpretability across tasks like summarization and code generation.

Breakthrough: A mechanism that aligns feedback phrases with specific token spans allows finer adjustments in the model's learning process.

Comparison: TEXT2GRAD achieved a 25.3% improvement in BLEU score over the best existing methods, a substantial leap in performance.

📌 Why this matters: This research challenges the established notion that simple scalar feedback is adequate for training language models.

By illustrating the effectiveness of rich, detailed feedback, it paves the way for more sophisticated models that can learn in a manner closer to human reasoning.

For instance, in customer service applications, models could receive feedback pinpointing which parts of a response were unhelpful, allowing them to improve more accurately and meaningfully.

💡 What are the potential applications?

Enhanced natural language processing tasks, such as summarization and question answering, resulting in more coherent outputs.

Development of intelligent virtual assistants that learn from user interactions and feedback to provide better service over time.

Implementation in coding applications, where models learn from feedback on code snippets to produce more efficient and accurate programming solutions.

⚠️ Limitations: The model's performance depends heavily on the quality of the feedback it receives.

Inaccurate or poorly structured feedback may hinder the learning process, indicating a need for high-quality annotations in real-world applications.

👉 Bottom line: TEXT2GRAD shows that turning natural language critiques into actionable learning signals can dramatically improve how AI learns, making it more capable of understanding and adapting to human feedback.

 

📄 Read the full paper: Text2Grad: Reinforcement Learning from Natural Language Feedback 

