News

xAI Grok-5 preview adds native causal video generation with physics consistency
xAI Grok-5 preview adds causal video generation with physics consistency for 10-second 720p clips
2026-02-22
OpenAI o4-proto-3 achieves closed-loop self-correction on 100+ cycle software debugging tasks
OpenAI o4-proto-3 achieves closed-loop self-correction with 74.2% resolution on 100+ cycle debugging tasks
2026-02-22
Mistral Devstral: Specialized 24B Model Optimized for Developer Tool-Chain Integration
Mistral released Devstral (24B), a developer-focused model fine-tuned for seamless integration with IDEs, terminals, git
2026-02-20
Meta Llama 4 Maverick: First Open Model to Reach Human-Parity on SWE-Bench Verified
Llama 4 Maverick (open-weight 405B variant) achieves a 68.7% resolution rate on SWE-Bench Verified
2026-02-20
xAI Grok-4.1 Adds Native Video Understanding and Temporal Action Localization
Grok-4.1 preview integrates dense video captioning, temporal action localization, and event grounding capabilities directly into the core model.
2026-02-20
DeepMind AlphaGeometry 2 Solves 92% of IMO Geometry Problems Autonomously
AlphaGeometry 2 reaches a 92% solve rate on the past 25 years of International Mathematical Olympiad geometry problems
2026-02-20
OpenAI o4-proto-2 Demonstrates Self-Evolving Prompt Engineering Loop
OpenAI released an internal preview of o4-proto-2 showing a self-evolving prompt engineering mechanism
2026-02-20
Mistral Codestral-Mamba: State-Space Model Achieves SOTA Efficiency on Long-Code Completion
Codestral-Mamba delivers 4.1× faster inference than equivalent Transformer models at 256k context while matching or exceeding their pass@1 on HumanEval+, MultiPL-E, and BigCodeBench-hard.
2026-02-19
OpenAI o4-proto: First Model to Demonstrate Self-Improving Code Synthesis Loop Over Multiple Iterations
o4-proto shows closed-loop self-improvement: it writes code → runs tests → reads failures → rewrites an improved version → repeats autonomously for up to 50 cycles
2026-02-19
Google DeepMind AlphaEvolve: Evolutionary Algorithm Guided by Gemini for Protein Engineering
AlphaEvolve combines Gemini multimodal reasoning with evolutionary search to design
2026-02-19
xAI Grok-4 Early Preview: Real-Time Multi-Modal World Model with Physics-Aware Forecasting
Grok-4 preview version integrates a native world model that jointly reasons over vision, language
2026-02-19
Anthropic Claude 4.3 Sonnet Introduces Native Hierarchical Task Decomposition for Long-Horizon Agents
Claude 4.3 Sonnet adds built-in hierarchical task decomposition
2026-02-19