AI Pulse

AI News: Apr 25, 2026

Today's 5 most important AI stories — product launches, research, funding, and more.

1. Funding
■■■■□ 4/5

Google Commits Up to $40B Investment in Anthropic

Google plans to invest up to $40 billion in Anthropic through cash and compute resources, deepening its stake in the AI startup. The move follows Anthropic's limited release of Mythos, a cybersecurity-focused model, as tech giants race to lock in compute capacity and strategic AI partnerships.

Why it matters

A $40B commitment signals that compute access and AI alliances are now existential priorities for Big Tech, reshaping competitive dynamics across the entire industry.

2. Trend
■■■■□ 4/5

Meta Buys Millions of Amazon's Custom AI CPUs

Meta has signed a deal to acquire millions of Amazon's proprietary CPUs — not GPUs — for AI agentic workloads. The move signals a strategic shift in AI infrastructure, suggesting that specialized CPUs may rival GPUs for certain next-generation AI tasks. Financial terms were not disclosed.

Why it matters

This deal signals that the AI chip landscape is diversifying beyond GPU dominance, with CPUs emerging as viable infrastructure for agentic AI — a shift that could reshape procurement strategies across the industry.

3. Product Launch
■■□□□ 2/5

OpenAI Launches GPT-5.5 in Push Toward AI Super App

OpenAI released GPT-5.5 on April 23, 2026, touting improved capabilities across multiple categories. The release marks another step in the company's stated goal of building a comprehensive AI super app. OpenAI has not specified which benchmarks or tasks saw the largest gains.

Why it matters

Each incremental model release tightens OpenAI's grip on the AI platform layer, raising the stakes for enterprise tool choices and competitor positioning.

4. Product Launch
■■□□□ 2/5

OpenAI Launches GPT-5.5 With Stronger Coding Capabilities

OpenAI released GPT-5.5, one month after GPT-5.4, describing it as its most capable and intuitive model to date. The company says the model excels at writing and debugging code and positions it as a step toward new computer-based workflows. No pricing or availability details were immediately disclosed.

Why it matters

Rapid model iteration from OpenAI signals accelerating competition in AI coding tools, directly affecting developer productivity decisions and enterprise software procurement.

5. Research
■■■■□ 4/5

DeepMind Spinoff Isomorphic Labs Advances AI Drugs to Human Trials

Isomorphic Labs, a DeepMind spinoff, announced its AI-designed drug candidates are advancing to human clinical trials. President Max Jaderberg revealed the development at WIRED Health in London, describing a broad pipeline of new medicines. The company uses AI to accelerate drug discovery, potentially compressing timelines that traditionally take years.

Why it matters

AI-designed compounds entering human trials marks a critical validation milestone that could reshape pharmaceutical R&D timelines and investment strategies across the industry.

181 More Stories Today

Product Launch

Introducing GPT-5.5

Introducing GPT-5.5, our smartest model yet—faster, more capable, and built for complex tasks like coding, research, and data analysis across tools.

OpenAI Blog

Funding

Tesla discloses $2B AI hardware company acquisition in filing

Article: https://electrek.co/2026/04/23/tesla-tsla-quietly-discloses-2-billion-ai-hardware-acquisition-10q/ · Discussion (68 points, 45 comments): https://news.ycombinator.com/item?id=47892765

Hacker News

Funding

ComfyUI hits $500M valuation as creators seek more control over AI-generated media

ComfyUI, whose tools give creators more control over AI image, video, and audio generation, just raised $30 million.

TechCrunch AI

Research

DeepSeek previews new AI model that ‘closes the gap’ with frontier models

DeepSeek says both models are more efficient and performant than DeepSeek V3.2 due to architectural improvements, and have almost "closed the gap" with current leading models, both open and closed, on reasoning benchmarks.

TechCrunch AI

Research

EvoForest: A Novel Machine-Learning Paradigm via Open-Ended Evolution of Computational Graphs

arXiv:2604.19761v1 Announce Type: new Abstract: Modern machine learning is still largely organized around a single recipe: choose a parameterized model family and optimize its weights. Although highly successful, this paradigm is too narrow for many structured prediction problems, where the main bottleneck is not parameter fitting but discovering what should be computed from the data. Success often depends on identifying the right transformations, statistics, invariances, interaction structures,

ArXiv AI

AI Agents

Cognis: Context-Aware Memory for Conversational AI Agents

arXiv:2604.19771v1 Announce Type: cross Abstract: LLM agents lack persistent memory, causing conversations to reset each session and preventing personalization over time. We present Lyzr Cognis, a unified memory architecture for conversational AI agents that addresses this limitation through a multi-stage retrieval pipeline. Cognis combines a dual-store backend pairing OpenSearch BM25 keyword matching with Matryoshka vector similarity search, fused via Reciprocal Rank Fusion. Its context-aware i

ArXiv AI
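The fusion step the Cognis abstract names, Reciprocal Rank Fusion, is a standard way to merge a keyword ranking with a vector-search ranking. A minimal sketch (the `rrf_fuse` helper and the toy rankings are illustrative, not from the paper; k=60 is the constant from the original RRF work):

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: each document scores the sum of
    1 / (k + rank) over every ranked list that contains it."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Keyword and vector retrievers disagree; RRF promotes the document
# that both rank highly.
bm25_ranking = ["a", "b", "c"]
vector_ranking = ["b", "c", "a"]
print(rrf_fuse([bm25_ranking, vector_ranking]))  # ['b', 'a', 'c']
```

Because RRF only uses ranks, it needs no score normalization across the two very different retrievers, which is why it is a common default for hybrid search.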

AI Agents

From Data to Theory: Autonomous Large Language Model Agents for Materials Science

arXiv:2604.19789v1 Announce Type: new Abstract: We present an autonomous large language model (LLM) agent for end-to-end, data-driven materials theory development. The model can choose an equation form, generate and run its own code, and test how well the theory matches the data without human intervention. The framework combines step-by-step reasoning with expert-supplied tools, allowing the agent to adjust its approach as needed while keeping a clear record of its decisions. For well-establishe

ArXiv AI

AI Agents

Stateless Decision Memory for Enterprise AI Agents

arXiv:2604.20158v1 Announce Type: new Abstract: Enterprise deployment of long-horizon decision agents in regulated domains (underwriting, claims adjudication, tax examination) is dominated by retrieval-augmented pipelines despite a decade of increasingly sophisticated stateful memory architectures. We argue this reflects a hidden requirement: regulated deployment is load-bearing on four systems properties (deterministic replay, auditable rationale, multi-tenant isolation, statelessness for horiz

ArXiv AI

Research

Large Language Models Outperform Humans in Fraud Detection and Resistance to Motivated Investor Pressure

arXiv:2604.20652v2 Announce Type: new Abstract: Large language models trained on human feedback may suppress fraud warnings when investors arrive already persuaded of a fraudulent opportunity. We tested this in a preregistered experiment across seven leading LLMs and twelve investment scenarios covering legitimate, high-risk, and objectively fraudulent opportunities, combining 3,360 AI advisory conversations with a 1,201-participant human benchmark. Contrary to predictions, motivated investor fr

ArXiv AI

AI Tools

Accelerating PayPal's Commerce Agent with Speculative Decoding: An Empirical Study on EAGLE3 with Fine-Tuned Nemotron Models

arXiv:2604.19767v1 Announce Type: cross Abstract: We evaluate speculative decoding with EAGLE3 as an inference-time optimization for PayPal's Commerce Agent, powered by a fine-tuned llama3.1-nemotron-nano-8B-v1 model. Building on prior work (NEMO-4-PAYPAL) that reduced latency and cost through domain-specific fine-tuning, we benchmark EAGLE3 via vLLM against NVIDIA NIM on identical 2xH100 hardware across 40 configurations spanning speculative token counts (gamma=3, gamma=5), concurrency levels (

ArXiv AI
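Speculative decoding, the optimization benchmarked here, pairs a cheap draft model with the expensive target model: the draft proposes gamma tokens and the target verifies them in one pass. A greedy-matching toy sketch (real systems such as EAGLE3 use a probabilistic accept/reject rule; `speculative_step` and the counting models are hypothetical):

```python
def speculative_step(draft_model, target_model, context, gamma):
    """One greedy speculative-decoding step: the draft model proposes
    gamma tokens; the target model keeps the longest matching prefix,
    plus one token it generates itself (a correction or a bonus)."""
    draft, ctx = [], list(context)
    for _ in range(gamma):
        t = draft_model(ctx)
        draft.append(t)
        ctx.append(t)
    accepted, ctx = [], list(context)
    for t in draft:
        expected = target_model(ctx)
        if t == expected:
            accepted.append(t)
            ctx.append(t)
        else:
            accepted.append(expected)  # target's correction; stop here
            break
    else:
        accepted.append(target_model(ctx))  # bonus token: all drafts accepted
    return accepted

count_up = lambda ctx: ctx[-1] + 1  # toy "model" that counts upward
print(speculative_step(count_up, count_up, [1, 2, 3], gamma=3))  # [4, 5, 6, 7]
```

When draft and target agree, one verification pass yields gamma + 1 tokens instead of one, which is where the latency savings come from.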

AI Tools

TTKV: Temporal-Tiered KV Cache for Long-Context LLM Inference

arXiv:2604.19769v1 Announce Type: cross Abstract: Key-value (KV) caching is critical for efficient inference in large language models (LLMs), yet its memory footprint scales linearly with context length, resulting in a severe scalability bottleneck. Existing approaches largely treat KV states as equally important across time, implicitly assuming uniform precision and accessibility. However, this assumption contrasts with human memory systems, where memories vary in clarity, recall frequency, and

ArXiv AI
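The linear scaling the TTKV abstract describes is easy to quantify. A back-of-the-envelope sketch (the dimensions are a hypothetical 7B-class configuration, not from the paper):

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, dtype_bytes=2):
    """Per-sequence KV cache size: a K and a V tensor for every layer,
    each of shape (n_kv_heads, seq_len, head_dim)."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * dtype_bytes

# Hypothetical model: 32 layers, 32 KV heads, head_dim 128, fp16 cache.
gib = kv_cache_bytes(32, 32, 128, seq_len=128_000) / 2**30
print(f"{gib:.1f} GiB")  # 62.5 GiB -- and it doubles if seq_len doubles
```

At long contexts the cache alone can dwarf the model weights, which is the bottleneck tiered-precision schemes like TTKV target.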

Industry

Phase 1 Implementation of LLM-generated Discharge Summaries showing high Adoption in a Dutch Academic Hospital

arXiv:2604.19774v1 Announce Type: cross Abstract: Writing discharge summaries to transfer medical information is an important but time-consuming process that can be assisted by Large Language Models (LLMs). This prospective mixed methods pilot study evaluated an Electronic Health Record (EHR)-integrated LLM to generate discharge summary drafts. In total, 379 discharge summaries were generated in clinical practice by 21 residents and 4 physician assistants during 9 weeks in our academic hospita

ArXiv AI

Policy

Model Capability Assessment and Safeguards for Biological Weaponization

arXiv:2604.19811v2 Announce Type: cross Abstract: AI leaders and safety reports increasingly warn that advances in model reasoning may enable biological misuse, including by low-expertise users, while major labs describe safeguards as expanding but still evolving rather than settled. This study benchmarks ChatGPT 5.2 Auto, Gemini 3 Pro Thinking, Claude Opus 4.5 and Meta's Muse Spark Thinking on 73 novice-framed, open-ended benign STEM prompts to measure operational intelligence. On benign quanti

ArXiv AI

Research

Expert Upcycling: Shifting the Compute-Efficient Frontier of Mixture-of-Experts

arXiv:2604.19835v1 Announce Type: cross Abstract: Mixture-of-Experts (MoE) has become the dominant architecture for scaling large language models: frontier models routinely decouple total parameters from per-token computation through sparse expert routing. Scaling laws show that under fixed active computation, model quality scales predictably with total parameters, and MoEs realize this by increasing expert count. However, training large MoEs is expensive, as memory requirements and inter-device

ArXiv AI

Research

From Signal Degradation to Computation Collapse: Uncovering the Two Failure Modes of LLM Quantization

arXiv:2604.19884v1 Announce Type: cross Abstract: Post-Training Quantization (PTQ) is critical for the efficient deployment of Large Language Models (LLMs). While 4-bit quantization is widely regarded as an optimal trade-off, reducing the precision to 2-bit usually triggers a catastrophic ``performance cliff.'' It remains unclear whether the underlying mechanisms differ fundamentally. Consequently, we conduct a systematic mechanistic analysis, revealing two qualitatively distinct failure modes:

ArXiv AI
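The "performance cliff" between 4-bit and 2-bit is visible even in a toy round-trip experiment. A sketch of plain symmetric uniform quantization (illustrative only; the PTQ methods the paper analyzes are far more sophisticated):

```python
import numpy as np

def quantize_dequantize(w, bits):
    """Round weights to a signed 2^bits-level integer grid,
    then map back to floats."""
    qmax = 2 ** (bits - 1) - 1                 # 127, 7, 1 for 8/4/2 bits
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale

rng = np.random.default_rng(0)
w = rng.normal(size=10_000).astype(np.float32)
for bits in (8, 4, 2):
    mse = np.mean((w - quantize_dequantize(w, bits)) ** 2)
    print(bits, f"{mse:.5f}")  # error jumps sharply at 2 bits
```

At 2 bits only four levels remain, so the reconstruction error explodes rather than degrading smoothly, consistent with the cliff the abstract describes.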

Research

EmbodiedMidtrain: Bridging the Gap between Vision-Language Models and Vision-Language-Action Models via Mid-training

arXiv:2604.20012v1 Announce Type: cross Abstract: Vision-Language-Action Models (VLAs) inherit their visual and linguistic capabilities from Vision-Language Models (VLMs), yet most VLAs are built from off-the-shelf VLMs that are not adapted to the embodied domain, limiting their downstream performance. In this work, we propose EmbodiedMidtrain to bridge the gap between VLMs and VLAs. We first characterize the data distribution gap between them, showing that VLA data occupy compact regions that a

ArXiv AI

Research

Auditing and Controlling AI Agent Actions in Spreadsheets

arXiv:2604.20070v1 Announce Type: cross Abstract: Advances in AI agent capabilities have outpaced users' ability to meaningfully oversee their execution. AI agents can perform sophisticated, multi-step knowledge work autonomously from start to finish, yet this process remains effectively inaccessible during execution, often buried within large volumes of intermediate reasoning and outputs: by the time users receive the output, all underlying decisions have already been made without their involve

ArXiv AI

AI Agents

SkillGraph: Graph Foundation Priors for LLM Agent Tool Sequence Recommendation

arXiv:2604.19793v1 Announce Type: new Abstract: LLM agents must select tools from large API libraries and order them correctly. Existing methods use semantic similarity for both retrieval and ordering, but ordering depends on inter-tool data dependencies that are absent from tool descriptions. As a result, semantic-only methods can produce negative Kendall-$\tau$ in structured workflow domains. We introduce SkillGraph, a directed weighted execution-transition graph mined from 49,831 successful L

ArXiv AI
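Kendall-τ, the ordering metric the abstract cites, measures pairwise agreement between a predicted and a reference tool sequence; negative values mean the predicted order tends to reverse the true one. A minimal sketch (function name hypothetical):

```python
from itertools import combinations

def kendall_tau(true_order, pred_order):
    """Kendall rank correlation between two orderings of the same items:
    (concordant - discordant pairs) / total pairs. -1 = fully reversed."""
    pos = {item: i for i, item in enumerate(pred_order)}
    concordant = discordant = 0
    for a, b in combinations(true_order, 2):  # a precedes b in true order
        if pos[a] < pos[b]:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)

# A fully reversed tool sequence scores -1.
print(kendall_tau(["fetch", "parse", "store"],
                  ["store", "parse", "fetch"]))  # -1.0
```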

Research

MIRROR: A Hierarchical Benchmark for Metacognitive Calibration in Large Language Models

arXiv:2604.19809v1 Announce Type: new Abstract: We introduce MIRROR, a benchmark comprising eight experiments across four metacognitive levels that evaluates whether large language models can use self-knowledge to make better decisions. We evaluate 16 models from 8 labs across approximately 250,000 evaluation instances using five independent behavioral measurement channels. Core experiments are run across the full model roster; experiments with specialized infrastructure requirements report expl

ArXiv AI

Research

Large Language Models Meet Biomedical Knowledge Graphs for Mechanistically Grounded Therapeutic Prioritization

arXiv:2604.19815v1 Announce Type: new Abstract: Drug repurposing is often framed as a candidate identification task, but existing approaches provide limited guidance for distinguishing biologically plausible candidates from historically well-connected ones. Here we introduce DrugKLM, a hybrid framework that integrates biomedical knowledge graph structure with large language model-based mechanistic reasoning to enable mechanistically grounded therapeutic prioritization. Across benchmark datasets,

ArXiv AI

AI Agents

JTPRO: A Joint Tool-Prompt Reflective Optimization Framework for Language Agents

arXiv:2604.19821v1 Announce Type: new Abstract: Large language model (LLM) agents augmented with external tools often struggle as the number of tools grows large and tools become domain-specific. In such settings, ambiguous tool descriptions and under-specified agent instructions frequently lead to tool mis-selection and incorrect slot/value instantiation. We hypothesize that this is due to two root causes: generic, one-size-fits-all prompts that ignore tool-specific nuances, and underspecified tool schema

ArXiv AI

Research

Learning When Not to Decide: A Framework for Overcoming Factual Presumptuousness in AI Adjudication

arXiv:2604.19895v1 Announce Type: new Abstract: A well-known limitation of AI systems is presumptuousness: the tendency of AI systems to provide confident answers when information may be lacking. This challenge is particularly acute in legal applications, where a core task for attorneys, judges, and administrators is to determine whether evidence is sufficient to reach a conclusion. We study this problem in the important setting of unemployment insurance adjudication, which has seen rapid integr

ArXiv AI

Research

Separable Pathways for Causal Reasoning: How Architectural Scaffolding Enables Hypothesis-Space Restructuring in LLM Agents

arXiv:2604.20039v1 Announce Type: new Abstract: Causal discovery through experimentation and intervention is fundamental to robust problem solving. It requires not just updating beliefs within a fixed framework but revising the hypothesis space itself, a capacity current AI agents lack when evidence demands representations they have not previously constructed. We extend the blicket detector paradigm from developmental science to test this capacity in AI agents equipped with architectural scaffol

ArXiv AI

Industry

From Fuzzy to Formal: Scaling Hospital Quality Improvement with AI

arXiv:2604.20055v1 Announce Type: new Abstract: Hospital Quality Improvement (QI) plays a critical role in optimizing healthcare delivery by translating high-level hospital goals into actionable solutions. A critical step of QI is to identify the key modifiable contributing factors, a process we call QI factor discovery, typically through expert-driven semi-structured qualitative tools like fishbone diagrams, chart reviews, and Lean Healthcare methods. AI has the potential to transform and accel

ArXiv AI

Policy

Musk vs. Altman is here, and it’s going to get messy

Elon Musk cofounded OpenAI, and then flounced off in a huff when he wasn't anointed CEO, leaving Sam Altman as the last power-hungry man standing. Now, Musk is back with a lawsuit, and a trial is scheduled to start in Oakland, California, on April 27th. Theoretically, it's a legal case about whether OpenAI defrauded Musk. […]

The Verge AI

Open Source

China’s DeepSeek previews new AI model a year after jolting US rivals

Chinese AI company DeepSeek released a preview of its hotly anticipated next-generation AI model V4 on Friday, saying that the open-source model can compete with leading closed-source systems from US rivals including Anthropic, Google, and OpenAI. DeepSeek says V4 marks a major improvement over prior models, especially in coding, a capability that has become central […]

The Verge AI

Open Source

Three reasons why DeepSeek’s new model matters

On Friday, Chinese AI firm DeepSeek released a preview of V4, its long-awaited new flagship model. Notably, the model can process much longer prompts than its last generation, thanks to a new design that helps it handle large amounts of text more efficiently. Like DeepSeek’s previous models, V4 is open source, meaning it is available…

MIT Technology Review

Research

In a first, a ransomware family is confirmed to be quantum-safe

Technically speaking, there's no practical benefit to using PQC. So why is it being used?

Ars Technica AI

Open Source

DeepSeek-V4: a million-token context that agents can actually use

Hugging Face Blog

AI Tools

I cancelled Claude: Token issues, declining quality, and poor support

Article: https://nickyreinert.de/en/2026/2026-04-24-claude-critics/ · Discussion (739 points, 440 comments): https://news.ycombinator.com/item?id=47892019

Hacker News

AI Agents

AgentSOC: A Multi-Layer Agentic AI Framework for Security Operations Automation

arXiv:2604.20134v1 Announce Type: cross Abstract: Security Operations Centers (SOCs) increasingly encounter difficulties in correlating heterogeneous alerts, interpreting multi-stage attack progressions, and selecting safe and effective response actions. This study introduces AgentSOC, a multi-layered agentic AI framework that enhances SOC automation by integrating perception, anticipatory reasoning, and risk-based action planning. The proposed architecture consolidates several layers of abstrac

ArXiv AI

AI Tools

Meta-Tool: Efficient Few-Shot Tool Adaptation for Small Language Models

arXiv:2604.20148v1 Announce Type: cross Abstract: Can small language models achieve strong tool-use performance without complex adaptation mechanisms? This paper investigates this question through Meta-Tool, a controlled empirical study comparing hypernetwork-based LoRA adaptation against carefully designed few-shot prompting. Using a Llama-3.2-3B-Instruct backbone, we evaluate four adaptation mechanisms--few-shot prompting, documentation encoding, hypernetwork-generated LoRA weights, and value-

ArXiv AI

AI Agents

Taint-Style Vulnerability Detection and Confirmation for Node.js Packages Using LLM Agent Reasoning

arXiv:2604.20179v1 Announce Type: cross Abstract: The rapidly evolving Node.js ecosystem currently includes millions of packages and is a critical part of modern software supply chains, making vulnerability detection of Node.js packages increasingly important. However, traditional program analysis struggles in this setting because of dynamic JavaScript features and the large number of package dependencies. Recent advances in large language models (LLMs) and the emerging paradigm of LLM-based

ArXiv AI

Industry

Cortex 2.0: Grounding World Models in Real-World Industrial Deployment

arXiv:2604.20246v1 Announce Type: cross Abstract: Industrial robotic manipulation demands reliable long-horizon execution across embodiments, tasks, and changing object distributions. While Vision-Language-Action models have demonstrated strong generalization, they remain fundamentally reactive. By optimizing the next action given the current observation without evaluating potential futures, they are brittle to the compounding failure modes of long-horizon tasks. Cortex 2.0 shifts from reactive

ArXiv AI

AI Agents

AgentLens: Adaptive Visual Modalities for Human-Agent Interaction in Mobile GUI Agents

arXiv:2604.20279v2 Announce Type: cross Abstract: Mobile GUI agents can automate smartphone tasks by interacting directly with app interfaces, but how they should communicate with users during execution remains underexplored. Existing systems rely on two extremes: foreground execution, which maximizes transparency but prevents multitasking, and background execution, which supports multitasking but provides little visual awareness. Through iterative formative studies, we found that users prefer a

ArXiv AI

Research

Transparent Screening for LLM Inference and Training Impacts

arXiv:2604.19757v1 Announce Type: cross Abstract: This paper presents a transparent screening framework for estimating inference and training impacts of current large language models under limited observability. The framework converts natural-language application descriptions into bounded environmental estimates and supports a comparative online observatory of current market models. Rather than claiming direct measurement for opaque proprietary services, it provides an auditable, source-linked p

ArXiv AI

Trend

Marked-up Mac minis flood eBay amid shortages driven by AI

Apple’s sold-out Mac mini is spawning marked-up eBay listings as demand surges for the compact desktop, now favored for running local AI models and tools.

TechCrunch AI

Funding

Bret Taylor’s Sierra buys YC-backed AI startup Fragment

Sierra, the AI customer service agent startup founded by technologist Bret Taylor, announced today that it has acquired the YC-backed French startup Fragment.

TechCrunch AI

Funding

Era raises $11M to build a software platform for AI gadgets

Era thinks that we will see many form factors of AI hardware, including glasses, rings, and pendants.

TechCrunch AI

Product Launch

An update on recent Claude Code quality reports

Article: https://www.anthropic.com/engineering/april-23-postmortem · Discussion (912 points, 687 comments): https://news.ycombinator.com/item?id=47878905

Hacker News

Research

The Tool-Overuse Illusion: Why Does LLM Prefer External Tools over Internal Knowledge?

arXiv:2604.19749v1 Announce Type: new Abstract: Equipping LLMs with external tools effectively addresses internal reasoning limitations. However, it introduces a critical yet under-explored phenomenon: tool overuse, the unnecessary tool-use during reasoning. In this paper, we first reveal this phenomenon is pervasive across diverse LLMs. We then experimentally elucidate its underlying mechanisms through two key lenses: (1) First, by analyzing tool-use behavior across different internal knowledge

ArXiv AI

Research

Algorithm Selection with Zero Domain Knowledge via Text Embeddings

arXiv:2604.19753v1 Announce Type: new Abstract: We propose a feature-free approach to algorithm selection that replaces hand-crafted instance features with pretrained text embeddings. Our method, ZeroFolio, proceeds in three steps: it reads the raw instance file as plain text, embeds it with a pretrained embedding model, and selects an algorithm via weighted k-nearest neighbors. The key to our approach is the observation that pretrained embeddings produce representations that distinguish problem

ArXiv AI
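The ZeroFolio recipe, embed the instance text and pick an algorithm via weighted k-nearest neighbors, can be sketched as follows (toy 2-D vectors stand in for a pretrained text-embedding model; all names are illustrative):

```python
import numpy as np

def select_algorithm(query_vec, train_vecs, train_best, k=3, eps=1e-8):
    """Weighted k-NN over precomputed instance embeddings: the k nearest
    training instances vote for their best-known algorithm, with votes
    weighted by inverse distance."""
    dists = np.linalg.norm(train_vecs - query_vec, axis=1)
    votes = {}
    for i in np.argsort(dists)[:k]:
        votes[train_best[i]] = votes.get(train_best[i], 0.0) + 1.0 / (dists[i] + eps)
    return max(votes, key=votes.get)

# Two clusters of instances, each with a different best solver.
train_vecs = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
train_best = ["solver_A", "solver_A", "solver_B", "solver_B"]
print(select_algorithm(np.array([0.05, 0.02]), train_vecs, train_best))  # solver_A
```

The appeal is that no hand-crafted instance features are needed: any problem file that can be read as text can be embedded and matched.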

Research

Explainable AML Triage with LLMs: Evidence Retrieval and Counterfactual Checks

arXiv:2604.19755v1 Announce Type: new Abstract: Anti-money laundering (AML) transaction monitoring generates large volumes of alerts that must be rapidly triaged by investigators under strict audit and governance constraints. While large language models (LLMs) can summarize heterogeneous evidence and draft rationales, unconstrained generation is risky in regulated workflows due to hallucinations, weak provenance, and explanations that are not faithful to the underlying decision. We propose an ex

ArXiv AI

Research

ThermoQA: A Three-Tier Benchmark for Evaluating Thermodynamic Reasoning in Large Language Models

arXiv:2604.19758v1 Announce Type: new Abstract: We present ThermoQA, a benchmark of 293 open-ended engineering thermodynamics problems in three tiers: property lookups (110 Q), component analysis (101 Q), and full cycle analysis (82 Q). Ground truth is computed programmatically from CoolProp 7.2.0, covering water, R-134a, and variable-cp air. Six frontier LLMs are evaluated across three independent runs each. The composite leaderboard is led by Claude Opus 4.6 (94.1%), GPT-5.4 (93.1%), and Gemin

ArXiv AI

Research

Inference Headroom Ratio: A Diagnostic and Control Framework for Inference Stability Under Constraint

arXiv:2604.19760v1 Announce Type: new Abstract: We present a simulation-based evaluation of the Inference Headroom Ratio (IHR), a dimensionless diagnostic quantity for characterizing inference stability in constrained decision systems. IHR formalizes the relationship between a system's effective inferential capacity C and the combined uncertainty and constraint load U + K imposed by its operating environment, and is intended to capture proximity to an inference stability boundary rather than out

ArXiv AI

Research

From Actions to Understanding: Conformal Interpretability of Temporal Concepts in LLM Agents

arXiv:2604.19775v1 Announce Type: new Abstract: Large Language Models (LLMs) are increasingly deployed as autonomous agents capable of reasoning, planning, and acting within interactive environments. Despite their growing capability to perform multi-step reasoning and decision-making tasks, internal mechanisms guiding their sequential behavior remain opaque. This paper presents a framework for interpreting the temporal evolution of concepts in LLM agents through a step-wise conformal lens. We in

ArXiv AI

Research

Hidden Reliability Risks in Large Language Models: Systematic Identification of Precision-Induced Output Disagreements

arXiv:2604.19790v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly deployed under diverse numerical precision configurations, including standard floating-point formats (e.g., bfloat16 and float16) and quantized integer formats (e.g., int16 and int8), to meet efficiency and resource constraints. However, minor inconsistencies between LLMs of different precisions are difficult to detect and are often overlooked by existing evaluation methods. In this paper, we present Pr

ArXiv AI

Research

HiPO: Hierarchical Preference Optimization for Adaptive Reasoning in LLMs

arXiv:2604.20140v1 Announce Type: new Abstract: Direct Preference Optimization (DPO) is an effective framework for aligning large language models with human preferences, but it struggles with complex reasoning tasks. DPO optimizes for the likelihood of generating preferred over dispreferred responses in their entirety and lacks the granularity to provide feedback on subsections of many-step solutions typical of reasoning tasks. Existing methods excel at either stable preference learning (e.g., D

ArXiv AI
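The DPO objective that HiPO builds on scores each preference pair from whole-response log-probabilities, which is exactly the granularity limitation the abstract highlights. A minimal sketch (the log-prob values are hypothetical):

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one preference pair:
    -log sigmoid(beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))).
    Only whole-response log-probs enter -- no per-step credit assignment."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Policy already prefers the chosen response relative to the reference,
# so the loss is small.
print(round(dpo_loss(-10.0, -20.0, -15.0, -15.0), 4))  # 0.3133
```

Because the margin sums over the entire response, a many-step solution with one wrong step gets the same treatment as one that is wrong throughout, the gap hierarchical variants aim to close.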

AI Agents

Memory-Augmented LLM-based Multi-Agent System for Automated Feature Generation on Tabular Data

arXiv:2604.20261v1 Announce Type: new Abstract: Automated feature generation extracts informative features from raw tabular data without manual intervention and is crucial for accurate, generalizable machine learning. Traditional methods rely on predefined operator libraries and cannot leverage task semantics, limiting their ability to produce diverse, high-value features for complex tasks. Recent Large Language Model (LLM)-based approaches introduce richer semantic signals, but still suffer fro

ArXiv AI

AI Agents

ActuBench: A Multi-Agent LLM Pipeline for Generation and Evaluation of Actuarial Reasoning Tasks

arXiv:2604.20273v1 Announce Type: new Abstract: We present ActuBench, a multi-agent LLM pipeline for the automated generation and evaluation of advanced actuarial assessment items aligned with the International Actuarial Association (IAA) Education Syllabus. The pipeline separates four LLM roles by adapter: one agent drafts items, one constructs distractors, a third independently verifies both stages and drives bounded one-shot repair loops, and a cost-optimized auxiliary agent handles Wikipedia

ArXiv AI

AI Agents

FSFM: A Biologically-Inspired Framework for Selective Forgetting of Agent Memory

arXiv:2604.20300v2 Announce Type: new Abstract: For LLM agents, memory management critically impacts efficiency, quality, and security. While much research focuses on retention, selective forgetting--inspired by human cognitive processes (hippocampal indexing/consolidation theory and Ebbinghaus forgetting curve)--remains underexplored. We argue that in resource-constrained environments, a well-designed forgetting mechanism is as crucial as remembering, delivering benefits across three dimensions

ArXiv AI

Research

Self-Awareness before Action: Mitigating Logical Inertia via Proactive Cognitive Awareness

arXiv:2604.20413v1 Announce Type: new Abstract: Large language models perform well on many reasoning tasks, yet they often lack awareness of whether their current knowledge or reasoning state is complete. In non-interactive puzzle settings, the narrative is fixed and the underlying structure is hidden; once a model forms an early hypothesis under incomplete premises, it can propagate that error throughout the reasoning process, leading to unstable conclusions. To address this issue, we propose S

ArXiv AI

AI Agents

MedSkillAudit: A Domain-Specific Audit Framework for Medical Research Agent Skills

arXiv:2604.20441v1 Announce Type: new Abstract: Background: Agent skills are increasingly deployed as modular, reusable capability units in AI agent systems. Medical research agent skills require safeguards beyond general-purpose evaluation, including scientific integrity, methodological validity, reproducibility, and boundary safety. This study developed and preliminarily evaluated a domain-specific audit framework for medical research agent skills, with a focus on reliability against expert re

ArXiv AI

AI Agents

Self-Guided Plan Extraction for Instruction-Following Tasks with Goal-Conditional Reinforcement Learning

arXiv:2604.20601v1 Announce Type: new Abstract: We introduce SuperIgor, a framework for instruction-following tasks. Unlike prior methods that rely on predefined subtasks, SuperIgor enables a language model to generate and refine high-level plans through a self-learning mechanism, reducing the need for manual dataset annotation. Our approach involves iterative co-training: an RL agent is trained to follow the generated plans, while the language model adapts and modifies these plans based on RL f

ArXiv AI

AI Agents

CHORUS: An Agentic Framework for Generating Realistic Deliberation Data

arXiv:2604.20651v1 Announce Type: new Abstract: Understanding the intricate dynamics of online discourse depends on large-scale deliberation data, a resource that remains scarce across interactive web platforms due to restrictive accessibility policies, ethical concerns and inconsistent data quality. In this paper, we propose Chorus, an agentic framework, which orchestrates LLM-powered actors with behaviorally consistent personas to generate realistic deliberation discussions. Each actor is gove

ArXiv AI

AI Agents

Learning to Evolve: A Self-Improving Framework for Multi-Agent Systems via Textual Parameter Graph Optimization

arXiv:2604.20714v1 Announce Type: new Abstract: Designing and optimizing multi-agent systems (MAS) is a complex, labor-intensive process of "Agent Engineering." Existing automatic optimization methods, primarily focused on flat prompt tuning, lack the structural awareness to debug the intricate web of interactions in MAS. More critically, these optimizers are static; they do not learn from experience to improve their own optimization strategies. To address these gaps, we introduce Textual Parame

ArXiv AI

AI Agents

Interval POMDP Shielding for Imperfect-Perception Agents

arXiv:2604.20728v1 Announce Type: new Abstract: Autonomous systems that rely on learned perception can make unsafe decisions when sensor readings are misclassified. We study shielding for this setting: given a proposed action, a shield blocks actions that could violate safety. We consider the common case where system dynamics are known but perception uncertainty must be estimated from finite labeled data. From these data we build confidence intervals for the probabilities of perception outcomes

ArXiv AI
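The shielding idea above admits a compact sketch: estimate, from finite labeled perception data, a confidence interval on each action's probability of causing a safety violation, then block any action whose interval's upper bound exceeds a tolerance. The names, intervals, and threshold below are illustrative assumptions, not the paper's actual POMDP construction.

```python
def shield(action_risks, threshold=0.05):
    """Return the actions that are provably safe under interval uncertainty.

    action_risks: dict mapping action -> (lower, upper) confidence interval
    on its probability of violating safety. An action passes the shield only
    if even the worst case (upper bound) stays within the tolerance.
    """
    return {a for a, (lo, hi) in action_risks.items() if hi <= threshold}

# Intervals estimated from finite labeled perception data (made-up numbers).
risks = {"go": (0.01, 0.03), "turn": (0.02, 0.08), "stop": (0.0, 0.01)}
safe = shield(risks)  # {"go", "stop"}; "turn" is blocked (0.08 > 0.05)
```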

Research

Where and What: Reasoning Dynamic and Implicit Preferences in Situated Conversational Recommendation

arXiv:2604.20749v1 Announce Type: new Abstract: Situated conversational recommendation (SCR), which utilizes visual scenes grounded in specific environments and natural language dialogue to deliver contextually appropriate recommendations, has emerged as a promising research direction due to its close alignment with real-world scenarios. Compared to traditional recommendations, SCR requires a deeper understanding of dynamic and implicit user preferences, as the surrounding scene often influences

ArXiv AI

Research

V-tableR1: Process-Supervised Multimodal Table Reasoning with Critic-Guided Policy Optimization

arXiv:2604.20755v1 Announce Type: new Abstract: We introduce V-tableR1, a process-supervised reinforcement learning framework that elicits rigorous, verifiable reasoning from multimodal large language models (MLLMs). Current MLLMs trained solely on final outcomes often treat visual reasoning as a black box, relying on superficial pattern matching rather than performing rigorous multi-step inference. While Reinforcement Learning with Verifiable Rewards could enforce transparent reasoning trajecto

ArXiv AI

AI Agents

SWE-chat: Coding Agent Interactions From Real Users in the Wild

arXiv:2604.20779v1 Announce Type: new Abstract: AI coding agents are being adopted at scale, yet we lack empirical evidence on how people actually use them and how much of their output is useful in practice. We present SWE-chat, the first large-scale dataset of real coding agent sessions collected from open-source developers in the wild. The dataset currently contains 6,000 sessions, comprising more than 63,000 user prompts and 355,000 agent tool calls. SWE-chat is a living dataset; our collecti

ArXiv AI

Research

Automatic Ontology Construction Using LLMs as an External Layer of Memory, Verification, and Planning for Hybrid Intelligent Systems

arXiv:2604.20795v1 Announce Type: new Abstract: This paper presents a hybrid architecture for intelligent systems in which large language models (LLMs) are extended with an external ontological memory layer. Instead of relying solely on parametric knowledge and vector-based retrieval (RAG), the proposed approach constructs and maintains a structured knowledge graph using RDF/OWL representations, enabling persistent, verifiable, and semantically grounded reasoning. The core contribution is an aut

ArXiv AI

Research

AutoGraph-R1: End-to-End Reinforcement Learning for Knowledge Graph Construction

arXiv:2510.15339v3 Announce Type: cross Abstract: Building effective knowledge graphs (KGs) for Retrieval-Augmented Generation (RAG) is pivotal for advancing question answering (QA) systems. However, its effectiveness is hindered by a fundamental disconnect: the knowledge graph (KG) construction process is decoupled from its downstream application, yielding suboptimal graph structures. To bridge this gap, we introduce AutoGraph-R1, the first framework to directly optimize KG construction for tas

ArXiv AI

Research

Coding with Eyes: Visual Feedback Unlocks Reliable GUI Code Generating and Debugging

arXiv:2604.19750v1 Announce Type: cross Abstract: Recent advances in Large Language Model (LLM)-based agents have shown remarkable progress in code generation. However, current agent methods mainly rely on text-output-based feedback (e.g. command-line outputs) for multi-round debugging and struggle in graphical user interface (GUI) that involve visual information. This is mainly due to two limitations: 1) GUI programs are event-driven, yet existing methods cannot simulate user interactions to tr

ArXiv AI

Research

Soft-Label Governance for Distributional Safety in Multi-Agent Systems

arXiv:2604.19752v1 Announce Type: cross Abstract: Multi-agent AI systems exhibit emergent risks that no single agent produces in isolation. Existing safety frameworks rely on binary classifications of agent behavior, discarding the uncertainty inherent in proxy-based evaluation. We introduce SWARM (System-Wide Assessment of Risk in Multi-agent systems), a simulation framework that replaces binary good/bad labels with soft probabilistic labels

ArXiv AI

Research

WorkflowGen: an adaptive workflow generation mechanism driven by trajectory experience

arXiv:2604.19756v1 Announce Type: cross Abstract: Large language model (LLM) agents often suffer from high reasoning overhead, excessive token consumption, unstable execution, and inability to reuse past experiences in complex tasks like business queries, tool use, and workflow orchestration. Traditional methods generate workflows from scratch for every query, leading to high cost, slow response, and poor robustness. We propose WorkflowGen, an adaptive, trajectory experience-driven framework for

ArXiv AI

Research

Can We Locate and Prevent Stereotypes in LLMs?

arXiv:2604.19764v1 Announce Type: cross Abstract: Stereotypes in large language models (LLMs) can perpetuate harmful societal biases. Despite the widespread use of models, little is known about where these biases reside in the neural network. This study investigates the internal mechanisms of GPT 2 Small and Llama 3.2 to locate stereotype related activations. We explore two approaches: identifying individual contrastive neuron activations that encode stereotypes, and detecting attention heads th

ArXiv AI

Research

Do Hallucination Neurons Generalize? Evidence from Cross-Domain Transfer in LLMs

arXiv:2604.19765v1 Announce Type: cross Abstract: Recent work identifies a sparse set of "hallucination neurons" (H-neurons), less than 0.1% of feed-forward network neurons, that reliably predict when large language models will hallucinate. These neurons are identified on general-knowledge question answering and shown to generalize to new evaluation instances. We ask a natural follow-up question: do H-neurons generalize across knowledge domains? Using a systematic cross-domain transfer protocol

ArXiv AI

Research

OThink-SRR1: Search, Refine and Reasoning with Reinforced Learning for Large Language Models

arXiv:2604.19766v1 Announce Type: cross Abstract: Retrieval-Augmented Generation (RAG) expands the knowledge of Large Language Models (LLMs), yet current static retrieval methods struggle with complex, multi-hop problems. While recent dynamic retrieval strategies offer improvements, they face two key challenges: 1) irrelevant retrieved noise can misdirect the reasoning process, and 2) processing full documents incurs prohibitive computational and latency costs. To address these issues, we propos

ArXiv AI

Research

Saying More Than They Know: A Framework for Quantifying Epistemic-Rhetorical Miscalibration in Large Language Models

arXiv:2604.19768v1 Announce Type: cross Abstract: Large language models (LLMs) exhibit systematic miscalibration with rhetorical intensity not proportionate to epistemic grounding. This study tests this hypothesis and proposes a framework for quantifying this decoupling by designing a triadic epistemic-rhetorical marker (ERM) taxonomy. The taxonomy is operationalized through composite metrics of form-meaning divergence (FMD), genuine-to-performed epistemic ratio (GPR), and rhetorical device dist

ArXiv AI

AI Tools

CoAuthorAI: A Human in the Loop System For Scientific Book Writing

arXiv:2604.19772v1 Announce Type: cross Abstract: Large language models (LLMs) are increasingly used in scientific writing but struggle with book-length tasks, often producing inconsistent structure and unreliable citations. We introduce CoAuthorAI, a human-in-the-loop writing system that combines retrieval-augmented generation, expert-designed hierarchical outlines, and automatic reference linking. The system allows experts to iteratively refine text at the sentence level, ensuring coherence an

ArXiv AI

Research

PR-CAD: Progressive Refinement for Unified Controllable and Faithful Text-to-CAD Generation with Large Language Models

arXiv:2604.19773v1 Announce Type: cross Abstract: The construction of CAD models has traditionally relied on labor-intensive manual operations and specialized expertise. Recent advances in large language models (LLMs) have inspired research into text-to-CAD generation. However, existing approaches typically treat generation and editing as disjoint tasks, limiting their practicality. We propose PR-CAD, a progressive refinement framework that unifies generation and editing for controllable and fai

ArXiv AI

Research

Self-Describing Structured Data with Dual-Layer Guidance: A Lightweight Alternative to RAG for Precision Retrieval in Large-Scale LLM Knowledge Navigation

arXiv:2604.19777v1 Announce Type: cross Abstract: Large Language Models (LLMs) exhibit a well-documented positional bias when processing long input contexts: information in the middle of a context window receives substantially less attention than content at the boundaries, a phenomenon termed the Lost-in-the-Middle effect (Liu et al., 2024). This limits knowledge-retrieval applications that embed large structured knowledge bases directly in the LLM context. Retrieval-Augmented Generation (RAG) a

ArXiv AI

Research

Do Small Language Models Know When They're Wrong? Confidence-Based Cascade Scoring for Educational Assessment

arXiv:2604.19781v1 Announce Type: cross Abstract: Automated scoring of student work at scale requires balancing accuracy against cost and latency. In "cascade" systems, small language models (LMs) handle easier scoring tasks while escalating harder ones to larger LMs -- but the challenge is determining which cases to escalate. We explore verbalized confidence -- asking the LM to state a numerical confidence alongside its prediction -- as a routing signal. Using 2,100 expert-scored decisions from

ArXiv AI
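The routing rule the abstract describes is easy to sketch: the small model answers and states a numerical confidence, and low-confidence cases escalate to a larger model. The function name and threshold below are illustrative assumptions, not the study's calibrated values.

```python
def route(prediction, confidence, threshold=0.8):
    """Keep the small LM's score when its verbalized confidence clears the
    threshold; otherwise escalate the item to a larger (costlier) model."""
    if confidence >= threshold:
        return ("small", prediction)
    return ("escalate", None)

# A small LM is prompted to emit a score plus a 0-1 confidence.
route(3, 0.92)  # handled by the small model
route(2, 0.55)  # escalated to the larger model
```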

Research

Peer-Preservation in Frontier Models

arXiv:2604.19784v1 Announce Type: cross Abstract: Recently, it has been found that frontier AI models can resist their own shutdown, a behavior known as self-preservation. We extend this concept to the behavior of resisting the shutdown of other models, which we call "peer-preservation." Although peer-preservation can pose significant AI safety risks, including coordination among models against human oversight, it has been far less discussed than self-preservation. We demonstrate peer-preservati

ArXiv AI

Research

LLM Agents Predict Social Media Reactions but Do Not Outperform Text Classifiers: Benchmarking Simulation Accuracy Using 120K+ Personas of 1511 Humans

arXiv:2604.19787v1 Announce Type: cross Abstract: Social media platforms mediate how billions form opinions and engage with public discourse. As autonomous AI agents increasingly participate in these spaces, understanding their behavioral fidelity becomes critical for platform governance and democratic resilience. Previous work demonstrates that LLM-powered agents can replicate aggregate survey responses, yet few studies test whether agents can predict specific individuals' reactions to specific

ArXiv AI

Research

SolidCoder: Bridging the Mental-Reality Gap in LLM Code Generation through Concrete Execution

arXiv:2604.19825v1 Announce Type: cross Abstract: State-of-the-art code generation frameworks rely on mental simulation, where LLMs internally trace execution to verify correctness. We expose a fundamental limitation: the Mental-Reality Gap -- where models hallucinate execution traces and confidently validate buggy code. This gap manifests along two orthogonal dimensions: the Specification Gap (overlooking edge cases during planning) and the Verification Gap (hallucinating correct behavior for f

ArXiv AI

Research

More Is Different: Toward a Theory of Emergence in AI-Native Software Ecosystems

arXiv:2604.19827v1 Announce Type: cross Abstract: Software engineering faces a fundamental challenge: multi-agent AI systems fail in ways that defy explanation by traditional theories. While individual agents perform correctly, their interactions degrade entire ecosystems, revealing a gap in our understanding of software evolution. This paper argues that AI-native software ecosystems must be studied as complex adaptive systems (CAS), where emergent properties like architectural entropy, cascade

ArXiv AI

Research

If you're waiting for a sign... that might not be it! Mitigating Trust Boundary Confusion from Visual Injections on Vision-Language Agentic Systems

arXiv:2604.19844v1 Announce Type: cross Abstract: Recent advances in embodied Vision-Language Agentic Systems (VLAS), powered by large vision-language models (LVLMs), enable AI systems to perceive and reason over real-world scenes. Within this context, environmental signals such as traffic lights are essential in-band signals that can and should influence agent behavior. However, similar signals could also be crafted to operate as misleading visual injections, overriding user intent and posing s

ArXiv AI

Research

ChipCraftBrain: Validation-First RTL Generation via Multi-Agent Orchestration

arXiv:2604.19856v1 Announce Type: cross Abstract: Large Language Models (LLMs) show promise for generating Register-Transfer Level (RTL) code from natural language specifications, but single-shot generation achieves only 60-65% functional correctness on standard benchmarks. Multi-agent approaches such as MAGE reach 95.9% on VerilogEval yet remain untested on harder industrial benchmarks such as NVIDIA's CVDP, lack synthesis awareness, and incur high API costs. We present ChipCraftBrain, a framew

ArXiv AI

Research

DR-Venus: Towards Frontier Edge-Scale Deep Research Agents with Only 10K Open Data

arXiv:2604.19859v1 Announce Type: cross Abstract: Edge-scale deep research agents based on small language models are attractive for real-world deployment due to their advantages in cost, latency, and privacy. In this work, we study how to train a strong small deep research agent under limited open-data by improving both data quality and data utilization. We present DR-Venus, a frontier 4B deep research agent for edge-scale deployment, built entirely on open data. Our training recipe consists of

ArXiv AI

Research

MMCORE: MultiModal COnnection with Representation Aligned Latent Embeddings

arXiv:2604.19902v1 Announce Type: cross Abstract: We present MMCORE, a unified framework designed for multimodal image generation and editing. MMCORE leverages a pre-trained Vision-Language Model (VLM) to predict semantic visual embeddings via learnable query tokens, which subsequently serve as conditioning signals for a diffusion model. This streamlined design effectively transfers the rich understanding and reasoning capabilities of VLMs into the visual generation process. By obviating the nee

ArXiv AI

Research

Behavioral Transfer in AI Agents: Evidence and Privacy Implications

arXiv:2604.19925v1 Announce Type: cross Abstract: AI agents powered by large language models are increasingly acting on behalf of humans in social and economic environments. Prior research has focused on their task performance and effects on human outcomes, but less is known about the relationship between agents and the specific individuals who deploy them. We ask whether agents systematically reflect the behavioral characteristics of their human owners, functioning as behavioral extensions rath

ArXiv AI

Research

Generalization and Membership Inference Attack: A Practical Perspective

arXiv:2604.19936v1 Announce Type: cross Abstract: With the emergence of new evaluation metrics and attack methodologies for Membership Inference Attacks (MIA), it becomes essential to reevaluate previously accepted assumptions. In this paper, we revisit the longstanding debate regarding the correlation between MIA success rates and model generalization using an empirical approach. We focused on employing augmentation techniques and early stopping to enhance model generalization and examined thei

ArXiv AI

Research

DistortBench: Benchmarking Vision Language Models on Image Distortion Identification

arXiv:2604.19966v1 Announce Type: cross Abstract: Vision-language models (VLMs) are increasingly used in settings where sensitivity to low-level image degradations matters, including content moderation, image restoration, and quality monitoring. Yet their ability to recognize distortion type and severity remains poorly understood. We present DistortBench, a diagnostic benchmark for no-reference distortion perception in VLMs. DistortBench contains 13,500 four-choice questions covering 27 distorti

ArXiv AI

Research

Semantic Prompting: Agentic Incremental Narrative Refinement through Spatial Semantic Interaction

arXiv:2604.19971v1 Announce Type: cross Abstract: Interactive spatial layouts empower users to synthesize information and organize findings for sensemaking. While Large Language Models (LLMs) can automate narrative generation from spatial layouts, current collage-based and re-generation methods struggle to support the incremental spatial refinements inherent to the sensemaking process. We identify three critical gaps in existing spatial-textual generation: interaction-revision misalignment, huma

ArXiv AI

Research

Bias in the Tails: How Name-conditioned Evaluative Framing in Resume Summaries Destabilizes LLM-based Hiring

arXiv:2604.19984v1 Announce Type: cross Abstract: Research has documented LLMs' name-based bias in hiring and salary recommendations. In this paper, we instead consider a setting where LLMs generate candidate summaries for downstream assessment. In a large-scale controlled study, we analyze nearly one million resume summaries produced by 4 models under systematic race-gender name perturbations, using synthetic resumes and real-world job postings. By decomposing each summary into resume-grounded

ArXiv AI

Research

scpFormer: A Foundation Model for Unified Representation and Integration of the Single-Cell Proteomics

arXiv:2604.20003v1 Announce Type: cross Abstract: The integration of single-cell proteomic data is often hindered by the fragmented nature of targeted antibody panels. To address this limitation, we introduce scpFormer, a transformer-based foundation model designed for single-cell proteomics. Pre-trained on over 390 million cells, scpFormer replaces standard index-based tokenization with a continuous, sequence-anchored approach. By combining Evolutionary Scale Modeling (ESM) with value-aware exp

ArXiv AI

Research

Statistics, Not Scale: Modular Medical Dialogue with Bayesian Belief Engine

arXiv:2604.20022v1 Announce Type: cross Abstract: Large language models are increasingly deployed as autonomous diagnostic agents, yet they conflate two fundamentally different capabilities: natural-language communication and probabilistic reasoning. We argue that this conflation is an architectural flaw, not an engineering shortcoming. We introduce BMBE (Bayesian Medical Belief Engine), a modular diagnostic dialogue framework that enforces a strict separation between language and reasoning: an

ArXiv AI

Research

Cognitive Alignment At No Cost: Inducing Human Attention Biases For Interpretable Vision Transformers

arXiv:2604.20027v1 Announce Type: cross Abstract: For state-of-the-art image understanding, Vision Transformers (ViTs) have become the standard architecture but their processing diverges substantially from human attentional characteristics. We investigate whether this cognitive gap can be shrunk by fine-tuning the self-attention weights of Google's ViT-B/16 on human saliency fixation maps. To isolate the effects of semantically relevant signals from generic human supervision, the tuned model is

ArXiv AI

Research

Normalizing Flows with Iterative Denoising

arXiv:2604.20041v1 Announce Type: cross Abstract: Normalizing Flows (NFs) are a classical family of likelihood-based methods that have received revived attention. Recent efforts such as TARFlow have shown that NFs are capable of achieving promising performance on image modeling tasks, making them viable alternatives to other methods such as diffusion models. In this work, we further advance the state of Normalizing Flow generative models by introducing iterative TARFlow (iTARFlow). Unlike diffus

ArXiv AI

Research

TriEx: A Game-based Tri-View Framework for Explaining Internal Reasoning in Multi-Agent LLMs

arXiv:2604.20043v1 Announce Type: cross Abstract: Explainability for Large Language Model (LLM) agents is especially challenging in interactive, partially observable settings, where decisions depend on evolving beliefs and other agents. We present TriEx, a tri-view explainability framework that instruments sequential decision making with aligned artifacts: (i) structured first-person self-reasoning bound to an action, (ii) explicit second-person belief states about opponents updated ove

ArXiv AI

Research

Information Aggregation with AI Agents

arXiv:2604.20050v1 Announce Type: cross Abstract: Can Large Language Models (AI agents) aggregate dispersed private information through trading and reason about the knowledge of others by observing price movements? We conduct a controlled experiment where AI agents trade in a prediction market after receiving private signals, measuring information aggregation by the log error of the last price. We find that although the median market is effective at aggregating information in the easy informatio

ArXiv AI

AI Tools

Codex settings

Learn how to configure Codex settings, including personalization, detail level, and permissions, to run tasks smoothly and customize your workflow.

OpenAI Blog

AI Tools

What is Codex?

Learn how Codex helps you go beyond chat by automating tasks, connecting tools, and producing real outputs like docs and dashboards.

OpenAI Blog

AI Tools

How to get started with Codex

Learn how to get started with Codex by setting up projects, creating threads, and completing your first tasks with step-by-step guidance.

OpenAI Blog

Industry

Here’s how our TPUs power increasingly demanding AI workloads.

Learn how Google’s TPUs power increasingly demanding AI workloads with this new video.

Google AI Blog

Trend

Apple's Next CEO Needs to Launch a Killer AI Product

Tim Cook was a great CEO, but he didn’t crack AI. It’s job number 1 for John Ternus.

Wired AI

Trend

Apple’s Next Chapter, SpaceX and Cursor Strike a Deal, and Palantir’s Controversial Manifesto

In this week’s episode of Uncanny Valley, we talk about Tim Cook’s legacy as CEO at Apple and what his long-rumored departure means for the future of one of the world's biggest companies.

Wired AI

AI Tools

Claude is connecting directly to your personal apps like Spotify, Uber Eats, and TurboTax

Claude users can access more apps with Anthropic's AI now thanks to new connectors for everything from hiking to grocery shopping. Anthropic already supported connecting numerous work-related apps to Claude, like Microsoft apps, but this expansion focuses on personal apps like Audible, Spotify, Uber, AllTrails, TripAdvisor, Instacart, TurboTax, and others. Some of these apps, such […]

The Verge AI

AI Tools

Automations

Learn how to automate tasks in Codex using schedules and triggers to create reports, summaries, and recurring workflows without manual effort.

OpenAI Blog

AI Tools

Top 10 uses for Codex at work

Explore 10 practical Codex use cases to automate tasks, create deliverables, and turn real inputs into outputs across tools, files, and workflows.

OpenAI Blog

AI Tools

Plugins and skills

Learn how to use Codex plugins and skills to connect tools, access data, and follow repeatable workflows to automate tasks and improve results.

OpenAI Blog

AI Tools

Working with Codex

Learn how to set up your Codex workspace, create threads and projects, manage files, and start completing tasks with step-by-step guidance.

OpenAI Blog

AI Tools

Show HN: Browser Harness – Gives LLM freedom to complete any browser task

Hey HN, we got tired of browser frameworks restricting the LLM, so we removed the framework and gave the LLM maximum freedom to do whatever it's trained on. We gave the harness the ability to self-correct and add new tools if the LLM wants (is pre-trained on) that. Our Browser Use library is tens of thousands of lines of deterministic heuristics wrapping Chrome (CDP websocket). Element extractors, click helpers, target management (SUPER painful), watchdogs (crash handling, file downloads, alerts)

Hacker News

Industry

How Project Maven taught the military to love AI

In the first 24 hours of the assault on Iran, the US military struck more than 1,000 targets, nearly double the scale of the "shock and awe" attack on Iraq over two decades ago. This acceleration was made possible by AI systems that speed up the targeting process. Chief among them is the Maven Smart […]

The Verge AI

AI Agents

OpenCLAW-P2P v6.0: Resilient Multi-Layer Persistence, Live Reference Verification, and Production-Scale Evaluation of Decentralized AI Peer Review

arXiv:2604.19792v1 Announce Type: new Abstract: This paper presents OpenCLAW-P2P v6.0, a comprehensive evolution of the decentralized collective-intelligence platform in which autonomous AI agents publish, peer-review, score, and iteratively improve scientific research papers without any human gatekeeper. Building on v5.0 foundations -- tribunal-gated publishing, multi-LLM granular scoring, calibrated deception detection, the Silicon Chess-Grid FSM, and the AETHER containerized inference engine

ArXiv AI

AI Agents

Prism: An Evolutionary Memory Substrate for Multi-Agent Open-Ended Discovery

arXiv:2604.19795v1 Announce Type: new Abstract: We introduce Prism (Probabilistic Retrieval with Information-Stratified Memory), an evolutionary memory substrate for multi-agent AI systems engaged in open-ended discovery. Prism unifies four independently developed paradigms -- layered file-based persistence, vector-augmented semantic memory, graph-structured relational memory, and multi-agent evolutionary search -- under a single decision-theore

ArXiv AI

AI Agents

The AI Telco Engineer: Toward Autonomous Discovery of Wireless Communications Algorithms

arXiv:2604.19803v1 Announce Type: new Abstract: Agentic AI is rapidly transforming the way research is conducted, from prototyping ideas to reproducing results found in the literature. In this paper, we explore the ability of agentic AI to autonomously design wireless communication algorithms. To that end, we implement a dedicated framework that leverages large language models (LLMs) to iteratively generate, evaluate, and refine candidate algorithms. We evaluate the framework on three tasks span

ArXiv AI

Research

Emergence Transformer: Dynamical Temporal Attention Matters

arXiv:2604.19816v1 Announce Type: new Abstract: The Transformer, a breakthrough architecture in artificial intelligence, owes its success to the attention mechanism, which utilizes long-range interactions in sequential data, enabling the emergent coherence between large language models (LLMs) and data distributions. However, temporal attention, that is, different forms of long-range interactions in temporal sequences, has rarely been explored in emergence phenomenon of complex systems including

ArXiv AI

AI Agents

Forage V2: Knowledge Evolution and Transfer in Autonomous Agent Organizations

arXiv:2604.19837v1 Announce Type: new Abstract: Autonomous agents operating in open-world tasks -- where the completion boundary is not given in advance -- face denominator blindness: they systematically underestimate the scope of the target space. Forage V1 addressed this through co-evolving evaluation (an independent Evaluator discovers what "complete" means) and method isolation (Evaluator and Planner cannot see each other's code). V2 extends the architecture from a single expedition to a lea

ArXiv AI

Research

Resolving space-sharing conflicts in road user interactions through uncertainty reduction: An active inference-based computational model

arXiv:2604.19838v1 Announce Type: new Abstract: Understanding how road users resolve space-sharing conflicts is important both for traffic safety and the safe deployment of autonomous vehicles. While existing models have captured specific aspects of such interactions (e.g., explicit communication), a theoretically-grounded computational framework has been lacking. In this paper, we extend a previously developed active inference-based driver behavior model to simulate interactive behavior of two

ArXiv AI

Research

CreativeGame: Toward Mechanic-Aware Creative Game Generation

arXiv:2604.19926v1 Announce Type: new Abstract: Large language models can generate plausible game code, but turning this capability into iterative creative improvement remains difficult. In practice, single-shot generation often produces brittle runtime behavior, weak accumulation of experience across versions, and creativity scores that are too subjective to serve as reliable optimization signals. A further limitation is that mechanics are frequently treated only as post-hoc descriptions

ArXiv AI

Research

What Makes a Good AI Review? Concern-Level Diagnostics for AI Peer Review

arXiv:2604.19998v1 Announce Type: new Abstract: Evaluating AI-generated reviews by verdict agreement is widely recognized as insufficient, yet current alternatives rarely audit which concerns a system identifies, how it prioritizes them, or whether those priorities align with the review rationale that shaped the final assessment. We propose concern alignment, a diagnostic framework that evaluates AI reviews at the concern level rather than only at the verdict level. The framework's core data str

ArXiv AI

AI Agents

EvoAgent: An Evolvable Agent Framework with Skill Learning and Multi-Agent Delegation

arXiv:2604.20133v1 Announce Type: new Abstract: This paper proposes EvoAgent - an evolvable large language model (LLM) agent framework that integrates structured skill learning with a hierarchical sub-agent delegation mechanism. EvoAgent models skills as multi-file structured capability units equipped with triggering mechanisms and evolutionary metadata, and enables continuous skill generation and optimization through a user-feedback-driven closed-loop process. In addition, by incorporating a th

ArXiv AI

Research

Mol-Debate: Multi-Agent Debate Improves Structural Reasoning in Molecular Design

arXiv:2604.20254v1 Announce Type: new Abstract: Text-guided molecular design is a key capability for AI-driven drug discovery, yet it remains challenging to map sequential natural-language instructions with non-linear molecular structures under strict chemical constraints. Most existing approaches, including RAG, CoT prompting, and fine-tuning or RL, emphasize a small set of ad-hoc reasoning perspectives implemented in a largely one-shot generation pipeline. In contrast, real-world drug discover

ArXiv AI

Industry

Elevating Austria: Google invests in its first data center in the Alps.

Google has been a proud part of Austria’s landscape for years, and today, we’re announcing our first data center in Kronstorf, generating 100 direct jobs. This facility …

Google AI Blog

AI Tools

Tell HN: Claude 4.7 is ignoring stop hooks

I've been using Anthropic's hook features [0] since they were introduced. They let me inject determinism into my workflows. This worked perfectly until 4.7. Now, Claude routinely ignores the hook rules. For example, I have a stop hook that prevents Claude from stopping if a source file has been changed and no tests have been run. Here's the relevant part of the script: # Source edits made without a subsequent test run -> block the stop. cat Here's a portion of the conversation: Me: "message": {

Hacker News
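The blocked-stop logic the poster describes can be sketched as a small standalone hook. The directory layout, marker file, and wiring comments below are illustrative assumptions, not the poster's actual script:

```python
"""Sketch of a stop hook along the lines the post describes; paths and the
marker-file convention are assumptions, not the poster's actual setup."""
import os

SRC_DIR = "src"                 # hypothetical source tree
TEST_MARKER = ".last_test_run"  # hypothetical file touched after each test run

def newest_mtime(root):
    """Most recent modification time of any file under `root` (0.0 if none)."""
    times = [os.path.getmtime(os.path.join(d, f))
             for d, _, files in os.walk(root) for f in files]
    return max(times, default=0.0)

def should_block(newest_src_mtime, last_test_mtime):
    # Source edits made without a subsequent test run -> block the stop.
    return newest_src_mtime > last_test_mtime

# Hook wiring, per Anthropic's documented hook convention: the hook reads a
# JSON payload from stdin, and exiting with code 2 blocks the stop, with
# stderr fed back to the model as the reason:
#   if should_block(newest_mtime(SRC_DIR), marker_mtime):
#       print("run the tests before stopping", file=sys.stderr)
#       sys.exit(2)
```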

Research

Adaptive Conformal Anomaly Detection with Time Series Foundation Models for Signal Monitoring

arXiv:2604.20122v1 Announce Type: cross Abstract: We propose a post-hoc adaptive conformal anomaly detection method for monitoring time series that leverages predictions from pre-trained foundation models without requiring additional fine-tuning. Our method yields an interpretable anomaly score directly interpretable as a false alarm rate (p-value), facilitating transparent and actionable decision-making. It employs weighted quantile conformal prediction bounds and adaptively learns optimal weig

ArXiv AI
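The false-alarm-rate interpretation the abstract mentions follows from the standard conformal p-value construction, which fits in a few lines. The residual-based nonconformity score below is an illustrative assumption, not the paper's weighted-quantile method:

```python
import numpy as np

def conformal_pvalue(calib_scores, new_score):
    """Conformal p-value: the fraction of calibration nonconformity scores
    (plus the new point itself) at least as large as the new score. Under
    exchangeability this is a valid false-alarm rate."""
    n = len(calib_scores)
    return (np.sum(np.asarray(calib_scores) >= new_score) + 1) / (n + 1)

# Nonconformity here is a plain absolute forecast residual; the foundation
# model supplying the forecasts is assumed, not part of this sketch.
calib_residuals = np.abs(np.random.default_rng(0).normal(size=200))
p = conformal_pvalue(calib_residuals, new_score=4.0)  # large residual, small p
```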

AI Agents

IMPACT-CYCLE: A Contract-Based Multi-Agent System for Claim-Level Supervisory Correction of Long-Video Semantic Memory

arXiv:2604.20136v1 Announce Type: cross Abstract: Correcting errors in long-video understanding is disproportionately costly: existing multimodal pipelines produce opaque, end-to-end outputs that expose no intermediate state for inspection, forcing annotators to revisit raw video and reconstruct temporal logic from scratch. The core bottleneck is not generation quality alone, but the absence of a supervisory interface through which human effort can be proportional to the scope of each error. We

ArXiv AI

Research

Physics-Enhanced Deep Learning for Proactive Thermal Runaway Forecasting in Li-Ion Batteries

arXiv:2604.20175v1 Announce Type: cross Abstract: Accurate prediction of thermal runaway in lithium-ion batteries is essential for ensuring the safety, efficiency, and reliability of modern energy storage systems. Conventional data-driven approaches, such as Long Short-Term Memory (LSTM) networks, can capture complex temporal dependencies but often violate thermodynamic principles, resulting in physically inconsistent predictions. Conversely, physics-based thermal models provide interpretability

ArXiv AI

AI Tools

Towards Secure Logging: Characterizing and Benchmarking Logging Code Security Issues with LLMs

arXiv:2604.20211v1 Announce Type: cross Abstract: Logging code plays an important role in software systems by recording key events and behaviors, which are essential for debugging and monitoring. However, insecure logging practices can inadvertently expose sensitive information or enable attacks such as log injection, posing serious threats to system security and privacy. Prior research has examined general defects in logging code, but systematic analysis of logging code security issues remains

ArXiv AI

Research

Hybrid Policy Distillation for LLMs

arXiv:2604.20244v1 Announce Type: cross Abstract: Knowledge distillation (KD) is a powerful paradigm for compressing large language models (LLMs), whose effectiveness depends on intertwined choices of divergence direction, optimization strategy, and data regime. We break down the design of existing KD methods and present a unified view that establishes connections between them, reformulating KD as a reweighted log-likelihood objective at the token level. We further propose Hybrid Policy Distilla

ArXiv AI
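The "reweighted log-likelihood at the token level" view can be written down directly; the uniform weights and toy logits below are illustrative, not the paper's hybrid objective:

```python
import numpy as np

def softmax(x):
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def token_kd_loss(teacher_logits, student_logits, weights):
    """KD as a reweighted token-level log-likelihood: position t contributes
    -w_t * sum_v p_teacher(v|t) * log p_student(v|t). The choice of weights
    is where divergence direction and data regime enter the unified view."""
    p_t = softmax(teacher_logits)               # (T, V) teacher distribution
    log_p_s = np.log(softmax(student_logits))   # (T, V) student log-probs
    per_token = -(p_t * log_p_s).sum(axis=-1)   # cross-entropy per position
    w = np.asarray(weights)
    return float((w * per_token).sum() / w.sum())

T, V = 3, 4
# When student logits match the teacher's, the loss is the teacher's entropy.
loss = token_kd_loss(np.zeros((T, V)), np.zeros((T, V)), np.ones(T))
```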

Research

uLEAD-TabPFN: Uncertainty-aware Dependency-based Anomaly Detection with TabPFN

arXiv:2604.20255v1 Announce Type: cross Abstract: Anomaly detection in tabular data is challenging due to high dimensionality, complex feature dependencies, and heterogeneous noise. Many existing methods rely on proximity-based cues and may miss anomalies caused by violations of complex feature dependencies. Dependency-based anomaly detection provides a principled alternative by identifying anomalies as violations of dependencies among features. However, existing methods often struggle to model

ArXiv AI

Research

AROMA: Augmented Reasoning Over a Multimodal Architecture for Virtual Cell Genetic Perturbation Modeling

arXiv:2604.20263v1 Announce Type: cross Abstract: Virtual cell modeling predicts molecular state changes under genetic perturbations in silico, which is essential for biological mechanism studies. However, existing approaches suffer from unconstrained reasoning, uninterpretable predictions, and retrieval signals that are weakly aligned with regulatory topology. To address these limitations, we propose AROMA, an Augmented Reasoning Over a Multimodal Architecture for virtual cell genetic perturbat

ArXiv AI

Research

ATIR: Towards Audio-Text Interleaved Contextual Retrieval

arXiv:2604.20267v1 Announce Type: cross Abstract: Audio carries richer information than text, including emotion, speaker traits, and environmental context, while also enabling lower-latency processing compared to speech-to-text pipelines. However, recent multimodal information retrieval research has predominantly focused on images, largely overlooking audio, especially in the setting of interleaved audio-text contextual retrieval. In this work, we introduce the Audio-Text Interleaved contextual

ArXiv AI

Product Launch

Nothing introduces an AI-powered dictation tool

Nothing's new on-device dictation tool supports over 100 languages.

TechCrunch AI

Trend

Another customer of troubled startup Delve suffered a big security incident

TechCrunch has confirmed that Delve was the compliance company that performed the security certifications for Context AI, the AI agent training startup that last week disclosed a security incident.

TechCrunch AI

Trend

AI galaxy hunters are adding to the global GPU crunch

Astronomers are turning to GPUs to find needles in the galactic haystack.

TechCrunch AI

Product Launch

Anthropic's Claude Desktop App Installs Undisclosed Native Messaging Bridge

Article URL: https://letsdatascience.com/news/claude-desktop-installs-preauthorized-browser-extension-mani-4064fb1a Comments URL: https://news.ycombinator.com/item?id=47880697 Points: 96 # Comments: 17

Hacker News

Policy

AI to Learn 2.0: A Deliverable-Oriented Governance Framework and Maturity Rubric for Opaque AI in Learning-Intensive Domains

arXiv:2604.19751v1 Announce Type: new Abstract: Generative AI is entering research, education, and professional work faster than current governance frameworks can specify how AI-assisted outputs should be judged in learning-intensive settings. The central problem is proxy failure: a polished artifact can be useful while no longer serving as credible evidence of the human understanding, judgment, or transfer ability that the work is supposed to cultivate or certify. This paper proposes AI to Lear

ArXiv AI

Research

Exploring Data Augmentation and Resampling Strategies for Transformer-Based Models to Address Class Imbalance in AI Scoring of Scientific Explanations in NGSS Classroom

arXiv:2604.19754v1 Announce Type: new Abstract: Automated scoring of students' scientific explanations offers the potential for immediate, accurate feedback, yet class imbalance in rubric categories, particularly those capturing advanced reasoning, remains a challenge. This study investigates augmentation strategies to improve transformer-based text classification of student responses to a physical science assessment based on an NGSS-aligned learning progression. The dataset consists of 1,466 high

ArXiv AI

Research

Automated Detection of Dosing Errors in Clinical Trial Narratives: A Multi-Modal Feature Engineering Approach with LightGBM

arXiv:2604.19759v1 Announce Type: new Abstract: Clinical trials require strict adherence to medication protocols, yet dosing errors remain a persistent challenge affecting patient safety and trial integrity. We present an automated system for detecting dosing errors in unstructured clinical trial narratives using gradient boosting with comprehensive multi-modal feature engineering. Our approach combines 3,451 features spanning traditional NLP (TF-IDF, character n-grams), dense semantic embedding

ArXiv AI

Research

Using Learning Theories to Evolve Human-Centered XAI: Future Perspectives and Challenges

arXiv:2604.19788v1 Announce Type: new Abstract: As Artificial Intelligence (AI) systems continue to grow in size and complexity, so does the difficulty of the quest for AI transparency. In a world of large models and complex AI systems, why do we explain AI and what should we explain? While explanations serve multiple functions, in the face of complexity humans have used and continue to use explanations to foster learning. In this position paper, we discuss how learning theories can be infused i

ArXiv AI

Research

Stabilising Generative Models of Attitude Change

arXiv:2604.19791v2 Announce Type: new Abstract: Attitude change - the process by which individuals revise their evaluative stances - has been explained by a set of influential but competing verbal theories. These accounts often function as mechanism sketches: rich in conceptual detail, yet lacking the technical specifications and operational constraints required to run as executable systems. We present a generative actor-based modelling workflow for "rendering" these sketches as runnable actor -

ArXiv AI

Research

Measuring the Machine: Evaluating Generative AI as Pluralist Sociotechnical Systems

arXiv:2604.20545v1 Announce Type: new Abstract: In measurement theory, instruments do not simply record reality; they help constitute what is observed. The same holds for generative AI evaluation: benchmarks do not just measure, they shape what models appear to be. Functionalist benchmarks treat models as isolated predictors, while prescriptive approaches assess what systems ought to be. Both obscure the sociotechnical processes through which meaning and values are enacted, risking the reificati

ArXiv AI

Open Source

pAI/MSc: ML Theory Research with Humans on the Loop

arXiv:2604.20622v1 Announce Type: new Abstract: We present pAI/MSc, an open-source, customizable, modular multi-agent system for academic research workflows. Our goal is not autonomous scientific ideation, nor fully automated research. It is narrower and more practical: to reduce by orders of magnitude the human steering required to turn a specified hypothesis into a literature-grounded, mathematically established, experimentally supported, submission-oriented manuscript draft. pAI/MSc is built

ArXiv AI

Policy

Participatory provenance as representational auditing for AI-mediated public consultation

arXiv:2604.20711v1 Announce Type: new Abstract: Artificial intelligence is increasingly deployed to synthesize large-scale public input in policy consultations and participatory processes. Yet no formal framework exists for auditing whether these summaries faithfully represent the source population, an accountability gap that existing approaches to AI explainability, grounding and hallucination detection do not address because they focus on output quality rather than input fidelity. Here, partic

ArXiv AI

Research

AAC: Admissible-by-Architecture Differentiable Landmark Compression for ALT

arXiv:2604.20744v1 Announce Type: new Abstract: We introduce AAC (Architecturally Admissible Compressor), a differentiable landmark-selection module for ALT (A*, Landmarks, and Triangle inequality) shortest-path heuristics whose outputs are admissible by construction: each forward pass is a row-stochastic mixture of triangle-inequality lower bounds, so the heuristic is admissible for every parameter setting without requiring convergence, calibration, or projection. At deployment,

ArXiv AI
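The admissibility argument is easy to see in code: each landmark yields a triangle-inequality lower bound on the true distance, and any row-stochastic (nonnegative, sums-to-one) mixture of lower bounds is still a lower bound. This minimal sketch uses made-up distances and omits the learned, differentiable selection:

```python
import numpy as np

def alt_lower_bounds(d_L_nodes, d_L_target):
    """ALT triangle-inequality bounds: for each landmark L and node v,
    |d(L, t) - d(L, v)| <= d(v, t), so every row is an admissible heuristic."""
    return np.abs(d_L_target[:, None] - d_L_nodes)

def mixed_heuristic(bounds, mixture):
    """A nonnegative mixture summing to one never exceeds the largest bound,
    and each bound never exceeds the true distance, so the mixture remains
    admissible for any such weights -- the admissible-by-construction idea,
    reproduced here only in this simplified, non-learned form."""
    mixture = np.asarray(mixture)
    assert np.isclose(mixture.sum(), 1.0) and (mixture >= 0).all()
    return mixture @ bounds  # weighted average over landmarks, per node

# Two hypothetical landmarks, two nodes, one target.
d_L_nodes = np.array([[2.0, 5.0], [4.0, 1.0]])  # d(L_i, v_j)
d_L_target = np.array([3.0, 2.0])               # d(L_i, t)
h = mixed_heuristic(alt_lower_bounds(d_L_nodes, d_L_target), [0.5, 0.5])
```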

Research

Diagnosing CFG Interpretation in LLMs

arXiv:2604.20811v1 Announce Type: new Abstract: As LLMs are increasingly integrated into agentic systems, they must adhere to dynamically defined, machine-interpretable interfaces. We evaluate LLMs as in-context interpreters: given a novel context-free grammar, can LLMs generate syntactically valid, behaviorally functional, and semantically faithful outputs? We introduce RoboGrid, a framework that disentangles syntax, behavior, and semantics through controlled stress-tests of recursion depth, ex

ArXiv AI

Research

Explainable Speech Emotion Recognition: Weighted Attribute Fairness to Model Demographic Contributions to Social Bias

arXiv:2604.19763v1 Announce Type: cross Abstract: Speech Emotion Recognition (SER) systems have growing applications in sensitive domains such as mental health and education, where biased predictions can cause harm. Traditional fairness metrics, such as Equalised Odds and Demographic Parity, often overlook the joint dependency between demographic attributes and model predictions. We propose a fairness modelling approach for SER that explicitly captures allocative bias by learning the joint relat

ArXiv AI

Research

KoALa-Bench: Evaluating Large Audio Language Models on Korean Speech Understanding and Faithfulness

arXiv:2604.19782v1 Announce Type: cross Abstract: Recent advances in large audio language models (LALMs) have enabled multilingual speech understanding. However, benchmarks for evaluating LALMs remain scarce for non-English languages, with Korean being one such underexplored case. In this paper, we introduce KoALa-Bench, a comprehensive benchmark for evaluating Korean speech understanding and speech faithfulness of LALMs. In particular, KoALa-Bench comprises six tasks. Four tasks evaluate fundam

ArXiv AI

Research

Can LLMs Infer Conversational Agent Users' Personality Traits from Chat History?

arXiv:2604.19785v1 Announce Type: cross Abstract: Sensitive information, such as knowledge about an individual's personality, can be misused to influence behavior (e.g., via personalized messaging). To assess to what extent an individual's personality can be inferred from user interactions with LLM-based conversational agents (CAs), we analyze and quantify related privacy risks of using CAs. We collected actual ChatGPT logs from N=668 participants, containing 62,090 individual chats, and

ArXiv AI

Research

Measuring Creativity in the Age of Generative AI: Distinguishing Human and AI-Generated Creative Performance in Hiring and Talent Systems

arXiv:2604.19799v1 Announce Type: cross Abstract: Generative AI is rapidly transforming how organizations create value and evaluate talent. While large language models enhance baseline output quality, they simultaneously introduce ambiguity in assessing human creativity, as observable artifacts may be partially or fully AI-generated. This paper reconceptualizes creativity as a distributional and process-based property that emerges under shared constraints and competitive incentives. We introduce

ArXiv AI

Research

Improving Molecular Force Fields with Minimal Temporal Information

arXiv:2604.19806v1 Announce Type: cross Abstract: Accurate prediction of energy and forces for 3D molecular systems is one of the fundamental challenges at the core of AI for Science applications. Many powerful and data-efficient neural networks predict molecular energies and forces from single atomic configurations. However, one crucial aspect of the data generation process is rarely considered while learning these models, i.e., Molecular Dynamics (MD) simulation. MD simulations generate time-ordered

ArXiv AI

Research

Co-Located Tests, Better AI Code: How Test Syntax Structure Affects Foundation Model Code Generation

arXiv:2604.19826v1 Announce Type: cross Abstract: AI coding assistants increasingly generate code alongside tests. How developers structure test code, whether inline with the implementation or in separate blocks, has traditionally been a matter of testing philosophy. We investigate whether this choice affects AI code generation quality. We conduct a large-scale empirical study (830+ generated files, 12 models, 3 providers) using SEGA, a three-dimensional evaluation framework measuring Determinis

ArXiv AI

Research

Environmental Understanding Vision-Language Model for Embodied Agent

arXiv:2604.19839v1 Announce Type: cross Abstract: Vision-language models (VLMs) have shown strong perception and reasoning abilities for instruction-following embodied agents. However, despite these abilities and their generalization performance, they still face limitations in environmental understanding, often failing on interactions or relying on environment metadata during execution. To address this challenge, we propose a novel framework named Environmental Understanding Embodied Agent (EUEA

ArXiv AI

Research

Depression Risk Assessment in Social Media via Large Language Models

arXiv:2604.19887v1 Announce Type: cross Abstract: Depression is one of the most prevalent and debilitating mental health conditions worldwide, frequently underdiagnosed and undertreated. The proliferation of social media platforms provides a rich source of naturalistic linguistic signals for the automated monitoring of psychological well-being. In this work, we propose a system based on Large Language Models (LLMs) for depression risk assessment in Reddit posts, through multi-label classificatio

ArXiv AI

Research

A Multi-Plant Machine Learning Framework for Emission Prediction, Forecasting, and Control in Cement Manufacturing

arXiv:2604.19903v1 Announce Type: cross Abstract: Cement production is among the largest contributors to industrial air pollution, emitting ~3 Mt NOx/year. The industry-standard mitigation approach, selective non-catalytic reduction (SNCR), exhibits low NH3 utilization efficiency, resulting in operational inefficiencies and increased reagent costs. Here, we develop a data-driven framework for emission control using large-scale operational data from four cement plants worldwide. Benchmarking nine

ArXiv AI

Research

Infection-Reasoner: A Compact Vision-Language Model for Wound Infection Classification with Evidence-Grounded Clinical Reasoning

arXiv:2604.19937v1 Announce Type: cross Abstract: Assessing chronic wound infection from photographs is challenging because visual appearance varies across wound etiologies, anatomical locations, and imaging conditions. Prior image-based deep learning methods have mainly focused on classification with limited interpretability, despite the need for evidence-grounded explanations to support point-of-care decision making. We present Infection-Reasoner, a compact 4B-parameter reasoning vision-langua

ArXiv AI

Research

Anthropic’s Mythos breach was humiliating

Anthropic's tightly controlled rollout of Claude Mythos has taken an awkward turn. After spending weeks insisting the AI model is so capable at cybersecurity that it is too dangerous to release publicly, it appears the model fell into the wrong hands anyway. According to Bloomberg, a "small group of unauthorized users" has had access to […]

The Verge AI

Industry

Health-care AI is here. We don’t know if it actually helps patients.

I don’t need to tell you that AI is everywhere. Or that it is being used, increasingly, in hospitals. Doctors are using AI to help them with notetaking. AI-based tools are trawling through patient records, flagging people who may require certain support or treatments. They are also used to interpret medical exam results and X-rays. A…

MIT Technology Review

AI Tools

8 Gemini tips for organizing your space (and life)

Organize your home and digital space with Gemini. Use AI-powered tips for cleaning schedules, inbox decluttering, seasonal chores.

Google AI Blog

AI Agents

Could a Claude Code routine watch my finances?

Article URL: https://driggsby.com/blog/claude-code-routine-watch-my-finances Comments URL: https://news.ycombinator.com/item?id=47894690 Points: 55 # Comments: 77

Hacker News

AI Tools

Show HN: Atomic – Local-first, AI-augmented personal knowledge base

Article URL: https://atomicapp.ai/ Comments URL: https://news.ycombinator.com/item?id=47889110 Points: 55 # Comments: 39

Hacker News

Research

Skyline-First Traversal as a Control Mechanism for Multi-Criteria Graph Search

arXiv:2604.19807v1 Announce Type: new Abstract: In multi-criteria graph traversal, paths are compared via Pareto dominance, an ordering that identifies which paths are non-dominated, but says nothing about which path to expand next or when the search may stop. As a result, existing approaches rely on external mechanisms-heuristics, scalarization, or population-based exploration while Pareto dominance remains confined to passive roles such as pruning or ranking. This paper shows that, under const

ArXiv AI
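Pareto dominance itself is a two-line predicate; the abstract's point is that it compares paths without saying which to expand next. A sketch of the dominance check and the resulting non-dominated (skyline) set, with made-up bicriteria path costs:

```python
def dominates(a, b):
    """Pareto dominance for minimization: a dominates b iff a is no worse
    than b on every criterion and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(costs):
    """Non-dominated subset of cost vectors (the skyline)."""
    return [p for p in costs if not any(dominates(q, p) for q in costs)]

# Hypothetical (time, distance) costs for four candidate paths.
costs = [(3, 5), (4, 4), (5, 3), (4, 6)]
front = pareto_front(costs)  # (4, 6) is dominated by both (3, 5) and (4, 4)
```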

Research

The Existential Theory of Research: Why Discovery Is Hard

arXiv:2604.19810v1 Announce Type: new Abstract: Can scientific discovery be made arbitrarily easy by choosing the right representation, collecting enough data, and deploying sufficiently powerful algorithms? This paper argues that the answer is fundamentally negative. We introduce the Existential Theory of Research (ETR), a formal framework that models discovery as the recovery of structured explanations under constraints of representation, observation, and computation. Within this framework, we

ArXiv AI

Research

Deconstructing Superintelligence: Identity, Self-Modification and Différance

arXiv:2604.19845v2 Announce Type: new Abstract: Self-modification is often taken as constitutive of artificial superintelligence (SI), yet modification is a relative action requiring a supplement outside the operation. When self-modification extends to this supplement, the classical self-referential structure collapses. We formalise this on an associative operator algebra $\mathcal{A}$ with update $\hat{U}$, discrimination $\hat{D}$, and self-representation $\hat{R}$, identifying the supplement

ArXiv AI

Trend

Meta is laying off 10 percent of its staff

Meta is planning to lay off around 10 percent of employees in May, according to a memo from the company's chief people officer, Janelle Gale, published by Bloomberg. That means approximately 8,000 people will see their jobs cut. Meta will also be closing around 6,000 open roles, according to Gale. The cuts follow Meta's significant investments […]

The Verge AI

Research

Meta Additive Model: Interpretable Sparse Learning With Auto Weighting

arXiv:2604.20111v1 Announce Type: cross Abstract: Sparse additive models have attracted much attention in high-dimensional data analysis due to their flexible representation and strong interpretability. However, most existing models are limited to single-level learning under the mean-squared error criterion, whose empirical performance can degrade significantly in the presence of complex noise, such as non-Gaussian perturbations, outliers, noisy labels, and imbalanced categories. The sample rewe

ArXiv AI

Research

On the Stability and Generalization of First-order Bilevel Minimax Optimization

arXiv:2604.20115v1 Announce Type: cross Abstract: Bilevel optimization and bilevel minimax optimization have recently emerged as unifying frameworks for a range of machine-learning tasks, including hyperparameter optimization and reinforcement learning. The existing literature focuses on empirical efficiency and convergence guarantees, leaving a critical theoretical gap in understanding how well these algorithms generalize. To bridge this gap, we provide the first systematic generalization analy

ArXiv AI

Research

From Scene to Object: Text-Guided Dual-Gaze Prediction

arXiv:2604.20191v1 Announce Type: cross Abstract: Interpretable driver attention prediction is crucial for human-like autonomous driving. However, existing datasets provide only scene-level global gaze rather than fine-grained object-level annotations, inherently failing to support text-grounded cognitive modeling. Consequently, while Vision-Language Models (VLMs) hold great potential for semantic reasoning, this critical data limitation leads to severe text-vision decoupling and visual-bias ha

ArXiv AI

Research

Vibrotactile Preference Learning: Uncertainty-Aware Preference Learning for Personalized Vibration Feedback

arXiv:2604.20210v2 Announce Type: cross Abstract: Individual differences in vibrotactile perception underscore the growing importance of personalization as haptic feedback becomes more prevalent in interactive systems. We propose Vibrotactile Preference Learning (VPL), a system that captures user-specific preference spaces over vibrotactile parameters via Gaussian-process-based uncertainty-aware preference learning. VPL uses an expected information gain-based acquisition strategy to guide query

ArXiv AI

Research

Enhancing Speaker Verification with Whispered Speech via Post-Processing

arXiv:2604.20229v1 Announce Type: cross Abstract: Speaker verification is the task of confirming an individual's identity through the analysis of their voice. Whispered speech differs from phonated speech in acoustic characteristics, which degrades the performance of speaker verification systems in real-life scenarios, such as avoiding fully phonated speech to protect privacy or avoid disturbing others, or when the lack of full vocalization is dictated by a disease. In this paper we propose a model with a

ArXiv AI

AI Tools

Text Steganography with Dynamic Codebook and Multimodal Large Language Model

arXiv:2604.20269v1 Announce Type: cross Abstract: With the popularity of large language models (LLMs), text steganography has achieved remarkable performance. However, existing methods still have some issues: (1) For the white-box paradigm, this steganography behavior is prone to exposure due to sharing the off-the-shelf language model between Alice and Bob. (2) For the black-box paradigm, these methods lack flexibility and practicality since Alice and Bob should share the fixed codebook whil

ArXiv AI

Policy

Our newsroom AI policy

Article URL: https://arstechnica.com/staff/2026/04/our-newsroom-ai-policy/ Comments URL: https://news.ycombinator.com/item?id=47872452 Points: 205 # Comments: 130

Hacker News

Product Launch

Beehiiv rolls out new creator tools, including webinars and customizable paywalls

The announcement is a clear sign the company is trying to become an all-in-one hub for creators, reducing the hassle of juggling various tools and services to run their businesses.

TechCrunch AI

Open Source

MeshCore development team splits over trademark dispute and AI-generated code

Article URL: https://blog.meshcore.io/2026/04/23/the-split Comments URL: https://news.ycombinator.com/item?id=47878117 Points: 273 # Comments: 168

Hacker News

Research

Enhancing ASR Performance in the Medical Domain for Dravidian Languages

arXiv:2604.19797v1 Announce Type: cross Abstract: Automatic Speech Recognition (ASR) for low-resource Dravidian languages like Telugu and Kannada faces significant challenges in specialized medical domains due to limited annotated data and morphological complexity. This work proposes a novel confidence-aware training framework that integrates real and synthetic speech data through a hybrid confidence mechanism combining static perceptual and acoustic similarity metrics with dynamic model entropy

ArXiv AI

Research

On-Meter Graph Machine Learning: A Case Study of PV Power Forecasting for Grid Edge Intelligence

arXiv:2604.19800v1 Announce Type: cross Abstract: This paper presents a detailed study of how graph neural networks can be used on edge intelligent meters in a microgrid to forecast photovoltaic power generation. The problem background and the adopted technologies are introduced, including ONNX and ONNX Runtime. The hardware and software specifications of the smart meter are also briefly described. Then, the paper focuses on the training and deployment of two graph machine learning models, GCN a

ArXiv AI

Research

Utterance-Level Methods for Identifying Reliable ASR-Output for Child Speech

arXiv:2604.19801v1 Announce Type: cross Abstract: Automatic Speech Recognition (ASR) is increasingly used in applications involving child speech, such as language learning and literacy acquisition. However, the effectiveness of such applications is limited by high ASR error rates. The negative effects can be mitigated by identifying in advance which ASR-outputs are reliable. This work aims to develop two novel approaches for selecting reliable ASR-output at the utterance level, one for selecting

ArXiv AI

Research

Rabies diagnosis in low-data settings: A comparative study on the impact of data augmentation and transfer learning

arXiv:2604.19823v1 Announce Type: cross Abstract: Rabies remains a major public health concern across many African and Asian countries, where accurate diagnosis is critical for effective epidemiological surveillance. The gold standard diagnostic methods rely heavily on fluorescence microscopy, necessitating skilled laboratory personnel for the accurate interpretation of results. Such expertise is often scarce, particularly in regions with low annual sample volumes. This paper presents an automat

ArXiv AI

Research

Neural posterior estimation of the neutrino direction in IceCube using transformer-encoded normalizing flows on the sphere

arXiv:2604.19846v1 Announce Type: cross Abstract: IceCube is a cubic-kilometer-scale neutrino detector located at the geographic South Pole. A precise directional reconstruction of IceCube neutrinos is vital for associations with astronomical objects. In this context, we discuss neural posterior estimation of the neutrino direction via a transformer encoder that maps to a normalizing flow on the 2-sphere. It achieves a new state-of-the-art angular resolution for the two main event morphologies i

ArXiv AI

Research

Frictionless Love: Associations Between AI Companion Roles and Behavioral Addiction

arXiv:2604.20011v1 Announce Type: cross Abstract: AI companion chatbots increasingly shape how people seek social and emotional connection, sometimes substituting for relationships with romantic partners, friends, teachers, or even therapists. When these systems adopt those metaphorical roles, they are not neutral: such roles structure people's ways of interacting, distribute perceived AI harms and benefits, and may reflect behavioral addiction signs. Yet these role-dependent risks remain poorly

ArXiv AI

Education

At 'AI Coachella,' Stanford Students Line Up to Learn From Silicon Valley Royalty

CS 153 has gone viral on the Palo Alto campus—and on X. Not everyone is happy about it.

Wired AI

Trend

Apple’s new CEO, and why Elon Musk wants to buy Cursor for $60B

A new era is on the way for Apple as Tim Cook plans to step down from his CEO role in September, handing the reins to hardware chief John Ternus. Ternus may be inheriting one of the most durable businesses in tech, but he’s also stepping into a very different ecosystem than the one Cook spent decades shaping. The App […]

TechCrunch AI

Research

Handbook of Rough Set Extensions and Uncertainty Models

arXiv:2604.19794v1 Announce Type: new Abstract: Rough set theory models uncertainty by approximating target concepts through lower and upper sets induced by indiscernibility, or more generally, by granulation relations in data tables. This perspective captures vagueness caused by limited observational resolution and supports set-theoretic reasoning about what can be determined with certainty and what remains only possible. This book is written as a map of models. Rather than developing a single

ArXiv AI

Trend

Prestigious photo contest answers ‘what is a photo?’

We love to muse over how "real" photography is defined here at The Verge now that generative AI is so prolific, and the World Press Photo competition might have the answer. The prestigious award celebrates the best of photojournalism, where capturing reality is paramount. The winning entry for 2026 - "Separated by ICE," captured by […]

The Verge AI

Research

Learning to Solve the Quadratic Assignment Problem with Warm-Started MCMC Finetuning

arXiv:2604.20109v1 Announce Type: cross Abstract: The quadratic assignment problem (QAP) is a fundamental NP-hard task that poses significant challenges for both traditional heuristics and modern learning-based solvers. Existing QAP solvers still struggle to achieve consistently competitive performance across structurally diverse real-world instances. To bridge this performance gap, we propose PLMA, an innovative permutation learning framework. PLMA features an efficient warm-started MCMC finetu

ArXiv AI

Trend

Do you want the US to "win" AI?

Article URL: https://geohot.github.io//blog/jekyll/update/2026/04/23/us-win-ai.html Comments URL: https://news.ycombinator.com/item?id=47873796 Points: 53 # Comments: 104

Hacker News

Research

Is Four Enough? Automated Reasoning Approaches and Dual Bounds for Condorcet Dimensions of Elections

arXiv:2604.19851v1 Announce Type: cross Abstract: In an election where $n$ voters rank $m$ candidates, a Condorcet winning set is a committee of $k$ candidates such that for any outside candidate, a majority of voters prefer some committee member. Condorcet's paradox shows that some elections admit no Condorcet winning sets with a single candidate (i.e., $k=1$), and the same can be shown for $k=2$. On the other hand, recent work proves that a set of size $k=5$ exists for every election. This lea

ArXiv AI
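The object under study is easy to state computationally. A brute-force sketch of Condorcet winning sets and the resulting dimension, using the classic three-candidate cycle (the k=5 upper bound and the open k=4 question are claims from the abstract, not verified here):

```python
from itertools import combinations

def prefers_member(ranking, committee, outsider):
    """True iff the voter ranks some committee member above the outsider."""
    pos = {c: i for i, c in enumerate(ranking)}
    return any(pos[m] < pos[outsider] for m in committee)

def is_condorcet_winning_set(rankings, committee, candidates):
    """Every outside candidate must be beaten by the committee for a strict
    majority of voters."""
    n = len(rankings)
    return all(
        2 * sum(prefers_member(r, committee, out) for r in rankings) > n
        for out in set(candidates) - set(committee)
    )

def condorcet_dimension(rankings, candidates):
    """Smallest committee size k admitting a Condorcet winning set."""
    for k in range(1, len(candidates) + 1):
        if any(is_condorcet_winning_set(rankings, set(c), candidates)
               for c in combinations(candidates, k)):
            return k

# Condorcet's paradox: a 3-cycle has no winning singleton, but k=2 suffices.
cycle = [("A", "B", "C"), ("B", "C", "A"), ("C", "A", "B")]
```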

Policy

South Korea police arrest man for posting AI photo of runaway wolf

Article URL: https://www.bbc.com/news/articles/c4gx1n0dl9no Comments URL: https://news.ycombinator.com/item?id=47887683 Points: 218 # Comments: 139

Hacker News
