Musk’s xAI debuts Grok 4
Elon Musk’s AI startup, xAI, unveiled its latest model, Grok 4, during a livestream on the night of July 9, 2025.
Grok 4 was described as an advanced AI assistant.
xAI employees discussed its performance on “Humanity’s Last Exam,” which includes over 2,500 questions from various subjects.
The company reported that Grok 4 answered approximately 25% of text-based questions without using additional tools.
This result aligns with earlier benchmarks set by OpenAI.
Source: The Verge
Elon Musk’s current position as the leader of xAI marks a significant evolution in his relationship with artificial intelligence, which has historically been characterized by public warnings about its dangers.
In 2015, Musk co-founded OpenAI as a non-profit organization specifically to ensure safe AI development, expressing concerns about the technology’s potential risks to humanity 1.
His cautionary stance continued for years, with Musk publicly comparing AI’s dangers to those of nuclear weapons and advocating for regulatory oversight to prevent potential harm 2.
Despite these warnings, Musk has simultaneously deepened his involvement in AI development, first through investments in companies like Vicarious 2, then through Tesla’s autonomous driving technology, and now with xAI and Grok.
This apparent contradiction was visible in Wednesday’s livestream when Musk acknowledged being “at times kind of worried” about superintelligent AI while simultaneously promoting Grok 4 as “the smartest AI in the world.”
The tension in Musk’s approach reflects the broader challenge facing the industry: balancing rapid innovation with responsible development and ethical safeguards.
The emphasis on Grok 4’s performance on “Humanity’s Last Exam” highlights how benchmark testing has become a crucial battleground for AI companies seeking to demonstrate superiority.
AI benchmarking has evolved into a sophisticated practice, with models now evaluated across diverse categories including commerce queries, predictive capabilities, and complex reasoning—areas where both ChatGPT and Grok scored highly (8.75 out of 10) in recent comparative testing 3.
These technical comparisons have real business implications, because investment tends to flow toward companies with leading benchmark results. Microsoft, for example, invested $1 billion in OpenAI 1.
However, benchmark results often don’t translate directly to real-world performance, as demonstrated by Grok’s recent controversies with antisemitic outputs despite strong technical metrics.
The focus on numerical comparisons also obscures important differences in model design philosophies, with Claude emphasizing structured reasoning, Grok prioritizing “truth-seeking,” and GPT models balancing versatility with safeguards 4.
This competitive landscape is pushing companies to differentiate their models beyond raw performance metrics, focusing on specific strengths like Claude’s detailed analysis capabilities or Grok’s conversational engagement 5.
Musk’s declaration that Grok 4 might “discover new physics next year” follows his established pattern of making extraordinarily ambitious claims across his various technology ventures.
Throughout his career leading companies like SpaceX, Tesla, and Neuralink, Musk has consistently set aggressive timelines and made bold predictions—from colonizing Mars “within our lifetime” 6 to achieving fully autonomous driving, often with deadlines that later require adjustment.
This approach has proven effective in attracting investment and talent—SpaceX secured a $1.6 billion NASA contract 7 and Tesla grew to a $600 billion market value 8 despite initial skepticism about both ventures.
The pattern continues with xAI, where Musk’s claims about Grok discovering “new technologies that are actually useful no later than next year” mirror his ambitious statements about other ventures.
While critics point to the gap between promises and delivery, Musk’s companies have consistently achieved significant innovations even when falling short of their most ambitious goals—the Falcon 9’s successful landing revolutionized space launch economics despite delays 7.
This tension between visionary statements and practical realities reflects a leadership approach that uses ambitious goals to motivate teams and capture public imagination, though it also creates challenges for evaluating realistic timelines and capabilities.
Read full article on Tech in Asia