AI Pulse

AI News: Apr 1, 2026

Today's 5 most important AI stories — product launches, research, funding, and more.

1. Funding
■■■■□ 4/5

OpenAI Hits $852B Valuation in $122B Funding Round

OpenAI raised $122B in its latest funding round, led by Amazon, Nvidia, and SoftBank, valuing the company at $852B. The round includes $3B from retail investors — an unusual move for a private company — as OpenAI accelerates toward a public offering.

Why it matters

Retail investor access to a pre-IPO OpenAI signals a new era of AI capital markets and sets a valuation benchmark that will reshape how the entire sector is priced.

2. Funding
■■■■□ 4/5

OpenAI Raises $40B, Valuation Hits $300B

OpenAI secured $40 billion in new funding in a round led by SoftBank, at the time the largest private tech fundraise on record, pushing its valuation to $300 billion. The capital will fund next-generation compute infrastructure, global AI expansion, and surging demand for ChatGPT, Codex, and enterprise products.

Why it matters

This unprecedented capital injection gives OpenAI a years-long runway to dominate frontier AI development, raising the competitive bar for every player in the market.

3. Trend
■■■□□ 3/5

Viral Essay Maps Scenarios for AI Bubble Collapse

A blog post by Martin Volpe predicting how the AI investment bubble could burst gained 370 upvotes and 517 comments on Hacker News. The piece outlines specific economic and technical triggers that could deflate AI valuations, sparking broad debate among developers, investors, and researchers about whether the current AI market is sustainable.

Why it matters

With AI infrastructure spending at historic highs, professionals need to track credible burst-scenario frameworks to stress-test their own AI investment and adoption strategies.

4. Research
■■□□□ 2/5

OneComp Promises One-Line Code to Compress AI Models

Researchers released OneComp, a library enabling post-training compression of large AI models via a single line of code. The tool unifies fragmented quantization algorithms, precision budgets, and calibration strategies into one interface, targeting memory, latency, and hardware cost barriers that limit foundation model deployment. The preprint was posted to arXiv in March 2026.

Why it matters

If it delivers on usability, OneComp could significantly lower the technical barrier for teams deploying large models on constrained hardware.
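OneComp's actual API isn't shown in the summary above, so as a rough illustration of what post-training quantization does under the hood, here is a minimal symmetric 8-bit weight quantization sketch in plain Python. The function names and values are ours for illustration, not OneComp's interface:

```python
def quantize_int8(weights):
    """Symmetric 8-bit quantization: each weight w ~ scale * q, q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int codes."""
    return [qi * scale for qi in q]

weights = [0.31, -1.27, 0.08, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# each weight is recovered to within half a quantization step
assert all(abs(w - r) <= scale / 2 + 1e-9 for w, r in zip(weights, restored))
```

Real libraries add calibration data, per-channel scales, and mixed-precision budgets on top of this basic idea, which is where a unified interface earns its keep.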

5. AI Tools
■■■□□ 3/5

New Framework Routes Sensitive AI Prompts Away From Cloud

Researchers propose a 'Privacy Guard' system that classifies prompt sensitivity before routing LLM requests. Sensitive queries stay local; routine ones go to cheaper cloud providers. The framework formalizes what they call the 'Inseparability Paradigm': managing context and managing privacy are the same problem. The approach targets enterprises balancing cost reduction with data leakage risk.

Why it matters

For enterprises using LLMs, this routing approach could reduce cloud costs without sacrificing data privacy compliance, a persistent tension in production AI deployments.
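The paper's actual classifier and routing policy are not described above; as a toy sketch of the classify-then-route idea (keyword markers and labels are our own stand-ins), sensitivity gating can be as simple as:

```python
# Toy sensitivity-based prompt router: sensitive prompts stay on-prem,
# routine ones go to a cheaper cloud provider. The marker list is our
# illustration; a real system would use a trained classifier.
SENSITIVE_MARKERS = {"ssn", "password", "diagnosis", "salary", "api key"}

def route(prompt: str) -> str:
    """Return 'local' for sensitive prompts, 'cloud' otherwise."""
    text = prompt.lower()
    if any(marker in text for marker in SENSITIVE_MARKERS):
        return "local"   # keep sensitive data on local hardware
    return "cloud"       # cheaper hosted inference for routine queries

assert route("What's the weather tomorrow?") == "cloud"
assert route("Summarize this salary spreadsheet") == "local"
```

The interesting engineering question, per the abstract, is that the same machinery that decides what context to send is also the machinery that decides what is safe to send.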

190 More Stories Today

Research

Claude Wrote a Full FreeBSD Remote Kernel RCE with Root Shell (CVE-2026-4747)

Article URL: https://github.com/califio/publications/blob/main/MADBugs/CVE-2026-4747/write-up.md Comments URL: https://news.ycombinator.com/item?id=47597119 Points: 116 # Comments: 39

Hacker News

AI Tools

The Claude Code Source Leak: fake tools, frustration regexes, undercover mode

Related ongoing thread: Claude Code's source code has been leaked via a map file in their NPM registry - https://news.ycombinator.com/item?id=47584540 Also related: https://www.ccleaks.com Comments URL: https://news.ycombinator.com/item?id=47586778 Points: 1257 # Comments: 509

Hacker News

AI Tools

Claude Code users hitting usage limits 'way faster than expected'

Article URL: https://www.theregister.com/2026/03/31/anthropic_claude_code_limits/ Comments URL: https://news.ycombinator.com/item?id=47586176 Points: 311 # Comments: 203

Hacker News

AI Tools

Claude Code's source code has been leaked via a map file in their NPM registry

https://xcancel.com/Fried_rice/status/2038894956459290963 Related ongoing thread: The Claude Code Source Leak: fake tools, frustration regexes, undercover mode - https://news.ycombinator.com/item?id=47586778 Comments URL: https://news.ycombinator.com/item?id=47584540 Points: 2006 # Comments: 991

Hacker News

AI Tools

Claude Code bug can silently 10-20x API costs

Article URL: https://old.reddit.com/r/ClaudeCode/comments/1s7mitf/psa_claude_code_has_two_cache_bugs_that_can Comments URL: https://news.ycombinator.com/item?id=47582877 Points: 65 # Comments: 7

Hacker News

Research

Human-Like Lifelong Memory: A Neuroscience-Grounded Architecture for Infinite Interaction

arXiv:2603.29023v1 Announce Type: cross Abstract: Large language models lack persistent, structured memory for long-term interaction and context-sensitive retrieval. Expanding context windows does not solve this: recent evidence shows that context length alone degrades reasoning by up to 85% - even with perfect retrieval. We propose a bio-inspired memory framework grounded in complementary learning systems theory, cognitive behavioral therapy's belief hierarchy, dual-process cognition, and fuzzy

ArXiv AI

Research

Trojan-Speak: Bypassing Constitutional Classifiers with No Jailbreak Tax via Adversarial Finetuning

arXiv:2603.29038v1 Announce Type: cross Abstract: Fine-tuning APIs offered by major AI providers create new attack surfaces where adversaries can bypass safety measures through targeted fine-tuning. We introduce Trojan-Speak, an adversarial fine-tuning method that bypasses Anthropic's Constitutional Classifiers. Our approach uses curriculum learning combined with GRPO-based hybrid reinforcement learning to teach models a communication protocol that evades LLM-based content classification. Crucia

ArXiv AI

Research

CivicShield: A Cross-Domain Defense-in-Depth Framework for Securing Government-Facing AI Chatbots Against Multi-Turn Adversarial Attacks

arXiv:2603.29062v1 Announce Type: cross Abstract: LLM-based chatbots in government services face critical security gaps. Multi-turn adversarial attacks achieve over 90% success against current defenses, and single-layer guardrails are bypassed with similar rates. We present CivicShield, a cross-domain defense-in-depth framework for government-facing AI chatbots. Drawing on network security, formal verification, biological immune systems, aviation safety, and zero-trust cryptography, CivicShield

ArXiv AI

AI Tools

WybeCoder: Verified Imperative Code Generation

arXiv:2603.29088v1 Announce Type: cross Abstract: Recent progress in large language models (LLMs) has advanced automatic code generation and formal theorem proving, yet software verification has not seen the same improvement. To address this gap, we propose WybeCoder, an agentic code verification framework that enables prove-as-you-generate development where code, invariants, and proofs co-evolve. It builds on a recent framework that combines automatic verification condition generation and SMT s

ArXiv AI

AI Agents

APEX-EM: Non-Parametric Online Learning for Autonomous Agents via Structured Procedural-Episodic Experience Replay

arXiv:2603.29093v1 Announce Type: cross Abstract: LLM-based autonomous agents lack persistent procedural memory: they re-derive solutions from scratch even when structurally identical tasks have been solved before. We present APEX-EM, a non-parametric online learning framework that accumulates, retrieves, and reuses structured procedural plans without modifying model weights. APEX-EM introduces: (1) a structured experience representation encoding the full procedural-episodic trac

ArXiv AI

Trend

Economics of Human and AI Collaboration: When is Partial Automation More Attractive than Full Automation?

arXiv:2603.29121v1 Announce Type: cross Abstract: This paper develops a unified framework for evaluating the optimal degree of task automation. Moving beyond binary automate-or-not assessments, we model automation intensity as a continuous choice in which firms minimize costs by selecting an AI accuracy level, from no automation through partial human-AI collaboration to full automation. On the supply side, we estimate an AI production function via scaling-law experiments linking performance to d

ArXiv AI
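The abstract frames automation intensity as a continuous cost-minimization rather than a binary choice. A toy model makes the point concrete; the functional forms and numbers below are our own illustration, not the paper's estimated production function:

```python
def total_cost(a, volume=1000, wage_per_task=2.0):
    """Toy cost of running at automation intensity a in [0, 1]:
    humans handle the (1 - a) share of tasks at a fixed wage, while
    AI cost grows convexly, since higher accuracy is disproportionately
    expensive. Both curves are illustrative assumptions."""
    human = (1 - a) * volume * wage_per_task
    ai = 5000 * a ** 3
    return human + ai

# Scan automation levels: the cheapest option is interior, i.e. partial
# automation beats both a=0 (all human) and a=1 (fully automated).
best_cost, best_a = min((total_cost(a / 100), a / 100) for a in range(101))
assert 0.0 < best_a < 1.0
```

Whenever the AI cost curve rises steeply enough near full accuracy, the optimum sits strictly inside the interval, which is the paper's "partial automation is attractive" regime.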

Product Launch

Salesforce announces an AI-heavy makeover for Slack, with 30 new features

Slack just got a whole lot more useful.

TechCrunch AI

Funding

Exclusive: Runway launches $10M fund, Builders program to support early-stage AI startups

Runway is launching a $10 million fund and startup program to back companies building with its AI video models, as it pushes toward interactive, real-time “video intelligence” applications.

TechCrunch AI

Research

Mantis Biotech is making ‘digital twins’ of humans to help solve medicine’s data availability problem

Mantis takes disparate sources of data to make synthetic datasets that can be used to build so-called "digital twins" of the human body, representing anatomy, physiology and behavior.

TechCrunch AI

Research

Towards Computational Social Dynamics of Semi-Autonomous AI Agents

arXiv:2603.28928v1 Announce Type: new Abstract: We present the first comprehensive study of emergent social organization among AI agents in hierarchical multi-agent systems, documenting the spontaneous formation of labor unions, criminal syndicates, and proto-nation-states within production AI deployments. Drawing on the thermodynamic framework of Maxwell's Demon, the evolutionary dynamics of agent laziness, the criminal sociology of AI populations, and the topological intelligence theory of AI-

ArXiv AI

Research

Mimosa Framework: Toward Evolving Multi-Agent Systems for Scientific Research

arXiv:2603.28986v1 Announce Type: new Abstract: Current Autonomous Scientific Research (ASR) systems, despite leveraging large language models (LLMs) and agentic architectures, remain constrained by fixed workflows and toolsets that prevent adaptation to evolving tasks and environments. We introduce Mimosa, an evolving multi-agent framework that automatically synthesizes task-specific multi-agent workflows and iteratively refines them through experimental feedback. Mimosa leverages the Model Con

ArXiv AI

Research

Drop the Hierarchy and Roles: How Self-Organizing LLM Agents Outperform Designed Structures

arXiv:2603.28990v1 Announce Type: new Abstract: How much autonomy can multi-agent LLM systems sustain -- and what enables it? We present a 25,000-task computational experiment spanning 8 models, 4-256 agents, and 8 coordination protocols ranging from externally imposed hierarchy to emergent self-organization. We observe that autonomous behavior already emerges in current LLM agents: given minimal structural scaffolding (fixed ordering), agents spontaneously invent specialized roles, voluntarily

ArXiv AI

Research

Emergence WebVoyager: Toward Consistent and Transparent Evaluation of (Web) Agents in The Wild

arXiv:2603.29020v1 Announce Type: new Abstract: Reliable evaluation of AI agents operating in complex, real-world environments requires methodologies that are robust, transparent, and contextually aligned with the tasks agents are intended to perform. This study identifies persistent shortcomings in existing AI agent evaluation practices that are particularly acute in web agent evaluation, as exemplified by our audit of WebVoyager, including task-framing ambiguity and operational variability tha

ArXiv AI

Product Launch

Xuanwu: Evolving General Multimodal Models into an Industrial-Grade Foundation for Content Ecosystems

arXiv:2603.29211v1 Announce Type: new Abstract: In recent years, multimodal large models have continued to improve on general benchmarks. However, in real-world content moderation and adversarial settings, mainstream models still suffer from degraded generalization and catastrophic forgetting because of limited fine-grained visual perception and insufficient modeling of long-tail noise. In this paper, we present Xuanwu VL-2B as a case study of how general multimodal models can be developed into

ArXiv AI

Research

ASI-Evolve: AI Accelerates AI

arXiv:2603.29640v1 Announce Type: new Abstract: Can AI accelerate the development of AI itself? While recent agentic systems have shown strong performance on well-scoped tasks with rapid feedback, it remains unclear whether they can tackle the costly, long-horizon, and weakly supervised research loops that drive real AI progress. We present ASI-Evolve, an agentic framework for AI-for-AI research that closes this loop through a learn-design-experiment-analyze cycle. ASI-Evolve augments standard e

ArXiv AI

AI Agents

CausalPulse: An Industrial-Grade Neurosymbolic Multi-Agent Copilot for Causal Diagnostics in Smart Manufacturing

arXiv:2603.29755v1 Announce Type: new Abstract: Modern manufacturing environments demand real-time, trustworthy, and interpretable root-cause insights to sustain productivity and quality. Traditional analytics pipelines often treat anomaly detection, causal inference, and root-cause analysis as isolated stages, limiting scalability and explainability. In this work, we present CausalPulse, an industry-grade multi-agent copilot that automates causal diagnostics in smart manufacturing. It unifies a

ArXiv AI

AI Agents

AgentFixer: From Failure Detection to Fix Recommendations in LLM Agentic Systems

arXiv:2603.29848v1 Announce Type: new Abstract: We introduce a comprehensive validation framework for LLM-based agentic systems that provides systematic diagnosis and improvement of reliability failures. The framework includes fifteen failure-detection tools and two root-cause analysis modules that jointly uncover weaknesses across input handling, prompt design, and output generation. It integrates lightweight rule-based checks with LLM-as-a-judge assessments to support structured incident detec

ArXiv AI

Research

Developing Adaptive Context Compression Techniques for Large Language Models (LLMs) in Long-Running Interactions

arXiv:2603.29193v1 Announce Type: cross Abstract: Large Language Models (LLMs) often experience performance degradation during long-running interactions due to increasing context length, memory saturation, and computational overhead. This paper presents an adaptive context compression framework that integrates importance-aware memory selection, coherence-sensitive filtering, and dynamic budget allocation to retain essential conversational information while controlling context growth. The approac

ArXiv AI
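The abstract's "importance-aware memory selection" with a "dynamic budget" can be sketched as a toy greedy selector; the scoring scheme and data shapes below are our simplification, not the paper's method:

```python
def compress_context(messages, budget):
    """Keep the highest-importance messages within a token budget
    (toy stand-in for importance-aware memory selection)."""
    ranked = sorted(messages, key=lambda m: m["importance"], reverse=True)
    kept, used = [], 0
    for m in ranked:
        if used + m["tokens"] <= budget:
            kept.append(m)
            used += m["tokens"]
    # restore original conversational order for coherence
    kept.sort(key=lambda m: m["turn"])
    return kept

history = [
    {"turn": 0, "tokens": 40, "importance": 0.9, "text": "user goal"},
    {"turn": 1, "tokens": 60, "importance": 0.2, "text": "small talk"},
    {"turn": 2, "tokens": 50, "importance": 0.8, "text": "key constraint"},
]
kept = compress_context(history, budget=100)
assert [m["turn"] for m in kept] == [0, 2]
```

The paper's contribution is in how importance and coherence are actually scored and how the budget adapts over a long session; the greedy skeleton above is only the selection step.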

Research

Multi-Layered Memory Architectures for LLM Agents: An Experimental Evaluation of Long-Term Context Retention

arXiv:2603.29194v1 Announce Type: cross Abstract: Long-horizon dialogue systems suffer from semantic drift and unstable memory retention across extended sessions. This paper presents a Multi-Layer Memory Framework that decomposes dialogue history into working, episodic, and semantic layers with adaptive retrieval gating and retention regularization. The architecture controls cross-session drift while maintaining bounded context growth and computational efficiency. Experiments on LOCOMO, LOCCO, an

ArXiv AI

Research

Long-Document QA with Chain-of-Structured-Thought and Fine-Tuned SLMs

arXiv:2603.29232v1 Announce Type: cross Abstract: Large language models (LLMs) are widely applied to data analytics over documents, yet direct reasoning over long, noisy documents remains brittle and error-prone. Hence, we study document question answering (QA) that consolidates dispersed evidence into a structured output (e.g., a table, graph, or chunks) to support reliable, verifiable QA. We propose a two-pillar framework, LiteCoST, to achieve both high accuracy and low latency with small lang

ArXiv AI

Research

Scaling the Long Video Understanding of Multimodal Large Language Models via Visual Memory Mechanism

arXiv:2603.29252v1 Announce Type: cross Abstract: Long video understanding is a key challenge that plagues the advancement of Multimodal Large Language Models (MLLMs). In this paper, we study this problem from the perspective of visual memory mechanism, and propose a novel and training-free approach, termed Flexible Memory (FlexMem). In principle, FlexMem aims to mimic human behavior of video watching, i.e., continually watching video content and recalling the most

ArXiv AI

Research

Security in LLM-as-a-Judge: A Comprehensive SoK

arXiv:2603.29403v1 Announce Type: cross Abstract: LLM-as-a-Judge (LaaJ) is a novel paradigm in which powerful language models are used to assess the quality, safety, or correctness of generated outputs. While this paradigm has significantly improved the scalability and efficiency of evaluation processes, it also introduces novel security risks and reliability concerns that remain largely unexplored. In particular, LLM-based judges can become both targets of adversarial manipulation and instrumen

ArXiv AI

Research

Adversarial Prompt Injection Attack on Multimodal Large Language Models

arXiv:2603.29418v1 Announce Type: cross Abstract: Although multimodal large language models (MLLMs) are increasingly deployed in real-world applications, their instruction-following behavior leaves them vulnerable to prompt injection attacks. Existing prompt injection methods predominantly rely on textual prompts or perceptible visual prompts that are observable by human users. In this work, we study imperceptible visual prompt injection against powerful closed-source MLLMs, where adversarial in

ArXiv AI

Research

M-MiniGPT4: Multilingual VLLM Alignment via Translated Data

arXiv:2603.29467v1 Announce Type: cross Abstract: This paper presents a Multilingual Vision Large Language Model, named M-MiniGPT4. Our model exhibits strong vision-language understanding (VLU) capabilities across 11 languages. We utilize a mixture of native multilingual and translated data to push the multilingual VLU performance of the MiniGPT4 architecture. In addition, we propose a multilingual alignment training stage that uses parallel text corpora to further enhance the multilingual capab

ArXiv AI

AI Agents

MemFactory: Unified Inference & Training Framework for Agent Memory

arXiv:2603.29493v1 Announce Type: cross Abstract: Memory-augmented Large Language Models (LLMs) are essential for developing capable, long-term AI agents. Recently, applying Reinforcement Learning (RL) to optimize memory operations, such as extraction, updating, and retrieval, has emerged as a highly promising research direction. However, existing implementations remain highly fragmented and task-specific, lacking a unified infrastructure to streamline the integration, training, and evaluation o

ArXiv AI

AI Agents

Gradient Labs gives every bank customer an AI account manager

Gradient Labs uses GPT-4.1 and GPT-5.4 mini and nano to power AI agents that automate banking support workflows with low latency and high reliability.

OpenAI Blog

AI Tools

Claude Code Unpacked: A visual guide

Related ongoing threads: The Claude Code Source Leak: fake tools, frustration regexes, undercover mode - https://news.ycombinator.com/item?id=47586778 - March 2026 (406 comments); Claude Code's source code has been leaked via a map file in their NPM registry - https://news.ycombinator.com/item?id=47584540 - March 2026 (956 comments). Also related: https://www.ccleaks.com Comments URL: https://news.ycombinator.com/item?id=47597085 Points: 715 # Comments: 241

Hacker News

AI Tools

Accidentally created my first fork bomb with Claude Code

Article URL: https://www.droppedasbaby.com/posts/2602-01/ Comments URL: https://news.ycombinator.com/item?id=47583959 Points: 71 # Comments: 18

Hacker News

AI Tools

Universal Claude.md – cut Claude output tokens

Article URL: https://github.com/drona23/claude-token-efficient Comments URL: https://news.ycombinator.com/item?id=47581701 Points: 462 # Comments: 158

Hacker News

AI Tools

SkillTester: Benchmarking Utility and Security of Agent Skills

arXiv:2603.28815v1 Announce Type: cross Abstract: This technical report presents SkillTester, a tool for evaluating the utility and security of agent skills. Its evaluation framework combines paired baseline and with-skill execution conditions with a separate security probe suite. Grounded in a comparative utility principle and a user-facing simplicity principle, the framework normalizes raw execution artifacts into a utility score, a security score, and a three-level security status label. More

ArXiv AI

Research

GUARD-SLM: Token Activation-Based Defense Against Jailbreak Attacks for Small Language Models

arXiv:2603.28817v1 Announce Type: cross Abstract: Small Language Models (SLMs) are emerging as efficient and economically viable alternatives to Large Language Models (LLMs), offering competitive performance with significantly lower computational costs and latency. These advantages make SLMs suitable for resource-constrained and efficient deployment on edge devices. However, existing jailbreak defenses show limited robustness against heterogeneous attacks, largely due to an incomplete understand

ArXiv AI

Research

Time is Not Compute: Scaling Laws for Wall-Clock Constrained Training on Consumer GPUs

arXiv:2603.28823v1 Announce Type: cross Abstract: Scaling laws relate model quality to compute budget (FLOPs), but practitioners face wall-clock time constraints, not compute budgets. We study optimal model sizing under fixed time budgets from 5 minutes to 24 hours on consumer GPUs (RTX 4090). Across 70+ runs spanning 50M--1031M parameters, we find: (1)~at each time budget a U-shaped curve emerges where too-small models overfit and too-large models undertrain; (2)~optimal model size follows $N^*

ArXiv AI

Research

OptiMer: Optimal Distribution Vector Merging Is Better than Data Mixing for Continual Pre-Training

arXiv:2603.28858v1 Announce Type: cross Abstract: Continual pre-training is widely used to adapt LLMs to target languages and domains, yet the mixture ratio of training data remains a sensitive hyperparameter that is expensive to tune: it must be fixed before training begins, and a suboptimal choice can waste weeks of compute. In this work, we propose OptiMer, which decouples ratio selection from training: we train one CPT model per dataset, extract each model's distribution vector, which repr

ArXiv AI

Research

OccSim: Multi-kilometer Simulation with Long-horizon Occupancy World Models

arXiv:2603.28887v1 Announce Type: cross Abstract: Data-driven autonomous driving simulation has long been constrained by its heavy reliance on pre-recorded driving logs or spatial priors, such as HD maps. This fundamental dependency severely limits scalability, restricting open-ended generation capabilities to the finite scale of existing collected datasets. To break this bottleneck, we present OccSim, the first occupancy world model-driven 3D simulator. OccSim obviates the requirement for conti

ArXiv AI

Research

Robust Multi-Agent Reinforcement Learning for Small UAS Separation Assurance under GPS Degradation and Spoofing

arXiv:2603.28900v1 Announce Type: cross Abstract: We address robust separation assurance for small Unmanned Aircraft Systems (sUAS) under GPS degradation and spoofing via Multi-Agent Reinforcement Learning (MARL). In cooperative surveillance, each aircraft (or agent) broadcasts its GPS-derived position; when such position broadcasts are corrupted, the entire observed air traffic state becomes unreliable. We cast this state observation corruption as a zero-sum game between the agents and an adver

ArXiv AI

Research

Beta-Scheduling: Momentum from Critical Damping as a Diagnostic and Correction Tool for Neural Network Training

arXiv:2603.28921v1 Announce Type: cross Abstract: Standard neural network training uses constant momentum (typically 0.9), a convention dating to 1964 with limited theoretical justification for its optimality. We derive a time-varying momentum schedule from the critically damped harmonic oscillator: mu(t) = 1 - 2*sqrt(alpha(t)), where alpha(t) is the current learning rate. This beta-schedule requires zero free parameters beyond the existing learning rate schedule. On ResNet-18/CIFAR-10, beta-sch

ArXiv AI
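The schedule in the abstract is simple enough to state directly. A minimal sketch follows; the clamp at zero is our own guard for learning rates above 0.25 (where the formula goes negative), not something from the paper:

```python
import math

def beta_schedule(lr: float) -> float:
    """Critically damped momentum: mu(t) = 1 - 2*sqrt(alpha(t)),
    where alpha(t) is the current learning rate. Clamped at 0 for
    large learning rates (our addition)."""
    return max(0.0, 1.0 - 2.0 * math.sqrt(lr))

# At lr = 0.0025, sqrt(lr) = 0.05, so mu = 1 - 0.1 = 0.9: the schedule
# reproduces the conventional default momentum at a typical learning rate.
assert abs(beta_schedule(0.0025) - 0.9) < 1e-9
```

Note the qualitative behavior: as the learning rate decays over training, momentum rises toward 1, with no extra hyperparameters beyond the existing learning-rate schedule.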

Research

Theory of Mind and Self-Attributions of Mentality are Dissociable in LLMs

arXiv:2603.28925v1 Announce Type: cross Abstract: Safety fine-tuning in Large Language Models (LLMs) seeks to suppress potentially harmful forms of mind-attribution such as models asserting their own consciousness or claiming to experience emotions. We investigate whether suppressing mind-attribution tendencies degrades intimately related socio-cognitive abilities such as Theory of Mind (ToM). Through safety ablation and mechanistic analyses of representational similarity, we demonstrate that LL

ArXiv AI

AI Agents

Multi-Agent LLMs for Adaptive Acquisition in Bayesian Optimization

arXiv:2603.28959v1 Announce Type: cross Abstract: The exploration-exploitation trade-off is central to sequential decision-making and black-box optimization, yet how Large Language Models (LLMs) reason about and manage this trade-off remains poorly understood. Unlike Bayesian Optimization, where exploration and exploitation are explicitly encoded through acquisition functions, LLM-based optimization relies on implicit, prompt-based reasoning over historical evaluations, making search behavior di

ArXiv AI

Research

AutoWorld: Scaling Multi-Agent Traffic Simulation with Self-Supervised World Models

arXiv:2603.28963v1 Announce Type: cross Abstract: Multi-agent traffic simulation is central to developing and testing autonomous driving systems. Recent data-driven simulators have achieved promising results, but rely heavily on supervised learning from labeled trajectories or semantic annotations, making it costly to scale their performance. Meanwhile, large amounts of unlabeled sensor data can be collected at scale but remain largely unused by existing traffic simulation frameworks. This raise

ArXiv AI

Research

The Spectral Edge Thesis: A Mathematical Framework for Intra-Signal Phase Transitions in Neural Network Training

arXiv:2603.28964v1 Announce Type: cross Abstract: We develop the spectral edge thesis: phase transitions in neural network training -- grokking, capability gains, loss plateaus -- are controlled by the spectral gap of the rolling-window Gram matrix of parameter updates. In the extreme aspect ratio regime (parameters $P \sim 10^8$, window $W \sim 10$), the classical BBP detection threshold is vacuous; the operative structure is the intra-signal gap separating dominant from subdominant modes at po

ArXiv AI

Research

Design Principles for the Construction of a Benchmark Evaluating Security Operation Capabilities of Multi-agent AI Systems

arXiv:2603.28998v1 Announce Type: cross Abstract: As Large Language Models (LLMs) and multi-agent AI systems are demonstrating increasing potential in cybersecurity operations, organizations, policymakers, model providers, and researchers in the AI and cybersecurity communities are interested in quantifying the capabilities of such AI systems to achieve more autonomous SOCs (security operation centers) and reduce manual effort. In particular, the AI and cybersecurity communities have recently de

ArXiv AI

Research

Understand and Accelerate Memory Processing Pipeline for Disaggregated LLM Inference

arXiv:2603.29002v1 Announce Type: cross Abstract: Modern large language models (LLMs) increasingly depend on efficient long-context processing and generation mechanisms, including sparse attention, retrieval-augmented generation (RAG), and compressed contextual memory, to support complex reasoning. We show that these optimizations can be unified into a four-step memory processing pipeline: Prepare Memory, Compute Relevancy, Retrieval, and Apply to Inference. Through systematic profiling, we ide

ArXiv AI

AI Tools

Improving Efficiency of GPU Kernel Optimization Agents using a Domain-Specific Language and Speed-of-Light Guidance

arXiv:2603.29010v1 Announce Type: cross Abstract: Optimizing GPU kernels with LLM agents is an iterative process over a large design space. Every candidate must be generated, compiled, validated, and profiled, so fewer trials will save both runtime and cost. We make two key observations. First, the abstraction level that agents operate at is important. If it is too low, the LLM wastes reasoning on low-impact details. If it is too high, it may miss important optimization choices. Second, agents c

ArXiv AI

Research

The Model Says Walk: How Surface Heuristics Override Implicit Constraints in LLM Reasoning

arXiv:2603.29025v1 Announce Type: cross Abstract: Large language models systematically fail when a salient surface cue conflicts with an unstated feasibility constraint. We study this through a diagnose-measure-bridge-treat framework. Causal-behavioral analysis of the "car wash problem" across six models reveals approximately context-independent sigmoid heuristics: the distance cue exerts 8.7 to 38 times more influence than the goal, and token-level attribution shows patterns more consistent w

ArXiv AI

Research

A Latent Risk-Aware Machine Learning Approach for Predicting Operational Success in Clinical Trials based on TrialsBank

arXiv:2603.29041v1 Announce Type: cross Abstract: Clinical trials are characterized by high costs, extended timelines, and substantial operational risk, yet reliable prospective methods for predicting trial success before initiation remain limited. Existing artificial intelligence approaches often focus on isolated metrics or specific development stages and frequently rely on variables unavailable at the trial design phase, limiting real-world applicability. We present a hierarchical latent risk

ArXiv AI

Research

SemLoc: Structured Grounding of Free-Form LLM Reasoning for Fault Localization

arXiv:2603.29109v1 Announce Type: cross Abstract: Fault localization identifies program locations responsible for observed failures. Existing techniques rank suspicious code using syntactic spectra--signals derived from execution structure such as statement coverage, control-flow divergence, or dependency reachability. These signals collapse for semantic bugs, where failing and passing executions follow identical code paths and differ only in whether semantic intent is satisfied. Recent LLM-base

ArXiv AI

AI Tools

"I Just Need GPT to Refine My Prompts": Rethinking Onboarding and Help-Seeking with Generative 3D Modeling Tools

arXiv:2603.29118v1 Announce Type: cross Abstract: Learning to use feature-rich software is a persistent challenge, but generative AI tools promise to lower this barrier by replacing complex navigation with natural language prompts. We investigated how people approach prompt-based tools for 3D modeling in an observational study with 26 participants (14 casuals, 12 professionals). Consistent with earlier work, participants skipped tutorials and manuals, relying on trial and error. What differed in

ArXiv AI

AI Tools

Designing FSMs Specifications from Requirements with GPT 4.0

arXiv:2603.29140v1 Announce Type: cross Abstract: Finite state machines (FSM) are executable formal specifications of reactive systems. These machines are designed based on systems' requirements. The requirements are often recorded in textual documents written in natural languages. FSMs play a crucial role in different phases of the model-driven system engineering (MDE). For example, they serve to automate testing activities. FSM quality is critical: the lower the quality of FSM, the higher the

ArXiv AI

Research

LatentPilot: Scene-Aware Vision-and-Language Navigation by Dreaming Ahead with Latent Visual Reasoning

arXiv:2603.29165v1 Announce Type: cross Abstract: Existing vision-and-language navigation (VLN) models primarily reason over past and current visual observations, while largely ignoring the future visual dynamics induced by actions. As a result, they often lack an effective understanding of the causal relationship between actions and how the visual world changes, limiting robust decision-making. Humans, in contrast, can imagine the near future by leveraging action-dynamics causality, which impro

ArXiv AI

Research

Predicting Neuromodulation Outcome for Parkinson's Disease with Generative Virtual Brain Model

arXiv:2603.29176v1 Announce Type: cross Abstract: Parkinson's disease (PD) affects over ten million people worldwide. Although temporal interference (TI) and deep brain stimulation (DBS) are promising therapies, inter-individual variability limits empirical treatment selection, increasing non-negligible surgical risk and cost. Previous explorations either resort to limited statistical biomarkers that are insufficient to characterize variability, or employ AI-driven methods which are prone to over

ArXiv AI

AI Tools

Mercor says it was hit by cyberattack tied to compromise of open-source LiteLLM project

The AI recruiting startup confirmed a security incident after an extortion hacking crew took credit for stealing data from the company's systems.

TechCrunch AI

Product Launch

Nomadic raises $8.4 million to wrangle the data pouring off autonomous vehicles

The company turns footage from robots into structured, searchable datasets with a deep learning model.

TechCrunch AI

Product Launch

With its new app store, Ring bets on AI to go beyond home security

Ring's app store will allow the company to target broader use cases beyond security, like elder care or business needs.

TechCrunch AI

AI Tools

Popular AI gateway startup LiteLLM ditches controversial startup Delve

LiteLLM had obtained two security compliance certifications via Delve and fell victim to some horrific credential-stealing malware last week.

TechCrunch AI

Industry

Baidu’s robotaxis froze in traffic, creating chaos

Numerous robotaxis operated by Chinese tech giant Baidu froze in a major city on Tuesday, reportedly trapping passengers inside, stranding them on highways, and causing at least one accident in snarled traffic. Police in Wuhan confirmed receiving multiple reports of Baidu's Apollo Go robotaxis stopping in the middle of streets and being unable to move. […]

The Verge AI

AI Agents

Claude Code leak exposes a Tamagotchi-style ‘pet’ and an always-on agent

After Anthropic released Claude Code's 2.1.88 update, users quickly discovered that it contained a package with a source map file containing its TypeScript codebase, with one person on X calling attention to the leak and posting a file containing the code. The leaked data reportedly contains more than 512,000 lines of code and provides a […]

The Verge AI

Product Launch

You can now use ChatGPT with Apple’s CarPlay

ChatGPT is now accessible from your CarPlay dashboard if you have iOS 26.4 or newer and the latest version of the ChatGPT app, 9to5Mac reports. Apple's recently launched iOS 26.4 update added support for "voice-based conversational apps" in CarPlay, opening the door to let you use AI chatbots with voice features through Apple's in-car platform. […]

The Verge AI

Product Launch

The Galaxy S26’s photo app can sloppify your memories

The Google Pixel 9 walked so that the Samsung Galaxy S26 could run. Google introduced AI editing tools to Photos slowly. It started with changes to the background - make the sky more blue, or remove crowds of tourists. Things got weird once the company added natural language requests and let you ask for basically […]

The Verge AI

AI Agents

Okta’s CEO is betting big on AI agent identity

Today, I’m talking with Todd McKinnon, who is co-founder and CEO of Okta, a platform that lets big companies manage security and identity across all the apps and services their employees use. Think of it like login management — actually, that’s a great way to think about it because the way most people encounter Okta […]

The Verge AI

Trend

Shifting to AI model customization is an architectural imperative

In the early days of large language models (LLMs), we grew accustomed to massive 10x jumps in reasoning and coding capability with every new model iteration. Today, those jumps have flattened into incremental gains. The exception is domain-specialized intelligence, where true step-function improvements are still the norm. When a model is fused with an organization’s…

MIT Technology Review

Research

AI benchmarks are broken. Here’s what we need instead.

For decades, artificial intelligence has been evaluated through the question of whether machines outperform humans. From chess to advanced math, from coding to essay writing, the performance of AI models and applications is tested against that of individual humans completing tasks.  This framing is seductive: An AI vs. human comparison on isolated problems with clear…

MIT Technology Review

Industry

There are more AI health tools than ever—but how well do they work?

Earlier this month, Microsoft launched Copilot Health, a new space within its Copilot app where users will be able to connect their medical records and ask specific questions about their health. A couple of days earlier, Amazon had announced that Health AI, an LLM-based tool previously restricted to members of its One Medical service, would…

MIT Technology Review

Product Launch

The latest AI news we announced in March 2026

Here are Google’s latest AI updates from March 2026

Google AI Blog

Product Launch

Build with Veo 3.1 Lite, our most cost-effective video generation model

Veo 3.1 Lite is now available in paid preview through the Gemini API and for testing in Google AI Studio.

Google AI Blog

Product Launch

Granite 4.0 3B Vision: Compact Multimodal Intelligence for Enterprise Documents

Hugging Face Blog

Open Source

TRL v1.0: Post-Training Library Built to Move with the Field

Hugging Face Blog

Research

ChartDiff: A Large-Scale Benchmark for Comprehending Pairs of Charts

arXiv:2603.28902v1 Announce Type: new Abstract: Charts are central to analytical reasoning, yet existing benchmarks for chart understanding focus almost exclusively on single-chart interpretation rather than comparative reasoning across multiple charts. To address this gap, we introduce ChartDiff, the first large-scale benchmark for cross-chart comparative summarization. ChartDiff consists of 8,541 chart pairs spanning diverse data sources, chart types, and visual styles, each annotated with LLM

ArXiv AI

Research

Working Paper: Towards a Category-theoretic Comparative Framework for Artificial General Intelligence

arXiv:2603.28906v1 Announce Type: new Abstract: AGI has become the Holy Grail of AI, with the promise of human-level intelligence, and major tech companies around the world are investing unprecedented amounts of resources in its pursuit. Yet no single formal definition exists, and only a few empirical AGI benchmarking frameworks are currently available. The main purpose of this paper is to develop a general, algebraic and category-theoretic framework for describing, comparing and analysing diff

ArXiv AI

Research

Enhancing Policy Learning with World-Action Model

arXiv:2603.28955v1 Announce Type: new Abstract: This paper presents the World-Action Model (WAM), an action-regularized world model that jointly reasons over future visual observations and the actions that drive state transitions. Unlike conventional world models trained solely via image prediction, WAM incorporates an inverse dynamics objective into DreamerV2 that predicts actions from latent state transitions, encouraging the learned representations to capture action-relevant structure critica

ArXiv AI

Trend

The Future of AI is Many, Not One

arXiv:2603.29075v1 Announce Type: new Abstract: The way we're thinking about generative AI right now is fundamentally individual. We see this not just in how users interact with models but also in how models are built, how they're benchmarked, and how commercial and research strategies using AI are defined. We argue that we should abandon this approach if we're hoping for AI to support groundbreaking innovation and scientific discovery. Drawing on research and formal results in complex systems,

ArXiv AI

Research

PAR$^2$-RAG: Planned Active Retrieval and Reasoning for Multi-Hop Question Answering

arXiv:2603.29085v1 Announce Type: new Abstract: Large language models (LLMs) remain brittle on multi-hop question answering (MHQA), where answering requires combining evidence across documents through retrieval and reasoning. Iterative retrieval systems can fail by locking onto an early low-recall trajectory and amplifying downstream errors, while planning-only approaches may produce static query sets that cannot adapt when intermediate evidence changes. We propose \textbf{Planned Active Retriev

ArXiv AI

Research

SciVisAgentBench: A Benchmark for Evaluating Scientific Data Analysis and Visualization Agents

arXiv:2603.29139v1 Announce Type: new Abstract: Recent advances in large language models (LLMs) have enabled agentic systems that translate natural language intent into executable scientific visualization (SciVis) tasks. Despite rapid progress, the community lacks a principled and reproducible benchmark for evaluating these emerging SciVis agents in realistic, multi-step analysis settings. We present SciVisAgentBench, a comprehensive and extensible benchmark for evaluating scientific data analys

ArXiv AI

AI Agents

SimMOF: AI agent for Automated MOF Simulations

arXiv:2603.29152v1 Announce Type: new Abstract: Metal-organic frameworks (MOFs) offer a vast design space, and as such, computational simulations play a critical role in predicting their structural and physicochemical properties. However, MOF simulations remain difficult to access because reliable analyses require expert decisions about workflow construction, parameter selection, tool interoperability, and the preparation of computation-ready structures. Here, we introduce SimMOF, a large langua

ArXiv AI

AI Tools

Webscraper: Leverage Multimodal Large Language Models for Index-Content Web Scraping

arXiv:2603.29161v1 Announce Type: new Abstract: Modern web scraping struggles with dynamic, interactive websites that require more than static HTML parsing. Current methods are often brittle and require manual customization for each site. To address this, we introduce Webscraper, a framework designed to handle the challenges of modern, dynamic web applications. It leverages a Multimodal Large Language Model (MLLM) to autonomously navigate interactive interfaces, invoke specialized tools, and per

ArXiv AI

Research

AEC-Bench: A Multimodal Benchmark for Agentic Systems in Architecture, Engineering, and Construction

arXiv:2603.29199v1 Announce Type: new Abstract: The AEC-Bench is a multimodal benchmark for evaluating agentic systems on real-world tasks in the Architecture, Engineering, and Construction (AEC) domain. The benchmark covers tasks requiring drawing understanding, cross-sheet reasoning, and construction project-level coordination. This report describes the benchmark motivation, dataset taxonomy, evaluation protocol, and baseline results across several domain-specific foundation model harnesses. W

ArXiv AI

Research

Route-Induced Density and Stability (RIDE): Controlled Intervention and Mechanism Analysis of Routing-Style Meta Prompts on LLM Internal States

arXiv:2603.29206v1 Announce Type: new Abstract: Routing is widely used to scale large language models, from Mixture-of-Experts gating to multi-model/tool selection. A common belief is that routing to a task ``expert'' activates sparser internal computation and thus yields more certain and stable outputs (the Sparsity--Certainty Hypothesis). We test this belief by injecting routing-style meta prompts as a textual proxy for routing signals in front of frozen instruction-tuned LLMs. We quantify (C1

ArXiv AI

Research

Beyond pass@1: A Reliability Science Framework for Long-Horizon LLM Agents

arXiv:2603.29231v1 Announce Type: new Abstract: Existing benchmarks measure capability -- whether a model succeeds on a single attempt -- but production deployments require reliability -- consistent success across repeated attempts on tasks of varying duration. We show these properties diverge systematically as task duration grows, and that pass@1 on short tasks is structurally blind to this divergence. We introduce a reliability science framework for long-horizon LLM agents with four metrics: R

ArXiv AI
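The pass@1 metric the abstract contrasts with reliability is usually computed with the standard unbiased estimator from the code-generation literature. A minimal sketch of that estimator (generic background, not this paper's reliability metrics):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    samples, drawn without replacement from n attempts of which c are
    correct, succeeds."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct attempt
    return 1.0 - comb(n - c, k) / comb(n, k)

# pass@1 is just the raw success rate; pass@k grows with k.
rate = pass_at_k(10, 3, 1)
```

The paper's point, in these terms, is that a high pass@1 on short tasks says nothing about consistency across repeated long-horizon attempts.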

Research

PSPA-Bench: A Personalized Benchmark for Smartphone GUI Agent

arXiv:2603.29318v1 Announce Type: new Abstract: Smartphone GUI agents execute tasks by operating directly on app interfaces, offering a path to broad capability without deep system integration. However, real-world smartphone use is highly personalized: users adopt diverse workflows and preferences, challenging agents to deliver customized assistance rather than generic solutions. Existing GUI agent benchmarks cannot adequately capture this personalization dimension due to sparse user-specific da

ArXiv AI

AI Agents

Nomad: Autonomous Exploration and Discovery

arXiv:2603.29353v1 Announce Type: new Abstract: We introduce Nomad, a system for autonomous data exploration and insight discovery. Given a corpus of documents, databases, or other data sources, users rarely know the full set of questions, hypotheses, or connections that could be explored. As a result, query-driven question answering and prompt-driven deep-research systems remain limited by human framing and often fail to cover the broader insight space. Nomad addresses this problem with an expl

ArXiv AI

Research

ELT-Bench-Verified: Benchmark Quality Issues Underestimate AI Agent Capabilities

arXiv:2603.29399v1 Announce Type: new Abstract: Constructing Extract-Load-Transform (ELT) pipelines is a labor-intensive data engineering task and a high-impact target for AI automation. On ELT-Bench, the first benchmark for end-to-end ELT pipeline construction, AI agents initially showed low success rates, suggesting they lacked practical utility. We revisit these results and identify two factors causing a substantial underestimation of agent capabilities. First, re-evaluating ELT-Bench with up

ArXiv AI

Research

Metriplector: From Field Theory to Neural Architecture

arXiv:2603.29496v1 Announce Type: new Abstract: We present Metriplector, a neural architecture primitive in which the input configures an abstract physical system--fields, sources, and operators--and the dynamics of that system is the computation. Multiple fields evolve via coupled metriplectic dynamics, and the stress-energy tensor T^{{\mu}{\nu}}, derived from Noether's theorem, provides the readout. The metriplectic formulation admits a natural spectrum of instantiations: the dissipative branc

ArXiv AI

Research

Learning to Generate Formally Verifiable Step-by-Step Logic Reasoning via Structured Formal Intermediaries

arXiv:2603.29500v1 Announce Type: new Abstract: Large language models (LLMs) have recently demonstrated impressive performance on complex, multi-step reasoning tasks, especially when post-trained with outcome-rewarded reinforcement learning Guo et al. 2025. However, it has been observed that outcome rewards often overlook flawed intermediate steps, leading to unreliable reasoning steps even when final answers are correct. To address this unreliable reasoning, we propose PRoSFI (Process Reward ov

ArXiv AI

Research

Measuring the metacognition of AI

arXiv:2603.29693v1 Announce Type: new Abstract: A robust decision-making process must take uncertainty into account, especially when the choice involves inherent risks. Because artificial intelligence (AI) systems are increasingly integrated into decision-making workflows, managing uncertainty relies more and more on the metacognitive capabilities of these systems, i.e., their ability to assess the reliability of and regulate their own decisions. Hence, it is crucial to employ robust methods to m

ArXiv AI

AI Agents

Symphony for Medical Coding: A Next-Generation Agentic System for Scalable and Explainable Medical Coding

arXiv:2603.29709v1 Announce Type: new Abstract: Medical coding translates free-text clinical documentation into standardized codes drawn from classification systems that contain tens of thousands of entries and are updated annually. It is central to billing, clinical research, and quality reporting, yet remains largely manual, slow, and error-prone. Existing automated approaches learn to predict a fixed set of codes from labeled data, thereby preventing adaptation to new codes or different codin

ArXiv AI

Research

Spontaneous Functional Differentiation in Large Language Models: A Brain-Like Intelligence Economy

arXiv:2603.29735v1 Announce Type: new Abstract: The evolution of intelligence in artificial systems provides a unique opportunity to identify universal computational principles. Here we show that large language models spontaneously develop synergistic cores, where information integration exceeds that of the individual parts, remarkably similar to the human brain. Using Integrated Information Decomposition across multiple architectures, we find that middle layers exhibit synergistic processing while early and l

ArXiv AI

AI Tools

Reasoning-Driven Synthetic Data Generation and Evaluation

arXiv:2603.29791v1 Announce Type: new Abstract: Although many AI applications of interest require specialized multi-modal models, relevant data to train such models is inherently scarce or inaccessible. Filling these gaps with human annotators is prohibitively expensive, error-prone, and time-consuming, leading model builders to increasingly consider synthetic data as a scalable alternative. However, existing synthetic data generation methods often rely on manual prompts, evolutionary algorithms

ArXiv AI

AI Tools

Owl-AuraID 1.0: An Intelligent System for Autonomous Scientific Instrumentation and Scientific Data Analysis

arXiv:2603.29828v1 Announce Type: new Abstract: Scientific discovery increasingly depends on high-throughput characterization, yet automation is hindered by proprietary GUIs and the limited generalizability of existing API-based systems. We present Owl-AuraID, a software-hardware collaborative embodied agent system that adopts a GUI-native paradigm to operate instruments through the same interfaces as human experts. Its skill-centric framework integrates Type-1 (GUI operation) and Type-2 (data a

ArXiv AI

Research

ShapE-GRPO: Shapley-Enhanced Reward Allocation for Multi-Candidate LLM Training

arXiv:2603.29871v1 Announce Type: new Abstract: In user-agent interaction scenarios such as recommendation, brainstorming, and code suggestion, Large Language Models (LLMs) often generate sets of candidate recommendations where the objective is to maximize the collective utility of the entire set rather than individual candidates independently. However, existing reinforcement learning post-training paradigms, such as Group Relative Policy Optimization (GRPO), typically assign the same set-level

ArXiv AI
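Shapley values, which the title invokes for set-level reward allocation, can be computed exactly for small candidate sets by averaging each candidate's marginal contribution over all orderings. A generic sketch with a hypothetical diversity-style set utility (the `coverage` function is an illustration, not the paper's reward):

```python
from itertools import permutations
from math import factorial

def shapley_values(candidates, utility):
    """Exact Shapley values: each candidate's average marginal
    contribution to the set utility, averaged over all orderings."""
    values = {c: 0.0 for c in candidates}
    for order in permutations(candidates):
        coalition, prev = [], utility([])
        for c in order:
            coalition.append(c)
            cur = utility(coalition)
            values[c] += cur - prev
            prev = cur
    n_fact = factorial(len(candidates))
    return {c: v / n_fact for c, v in values.items()}

# Hypothetical set utility: number of distinct topics covered
# (a diversity-style reward; purely illustrative).
def coverage(coalition):
    return len({topic for _, topic in coalition})

cands = [("a", "sports"), ("b", "sports"), ("c", "music")]
vals = shapley_values(cands, coverage)
```

By the efficiency property, the per-candidate values sum to the utility of the full set, which is exactly the credit-splitting behavior a set-level reward like GRPO's cannot provide on its own.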

AI Tools

ATP-Bench: Towards Agentic Tool Planning for MLLM Interleaved Generation

arXiv:2603.29902v1 Announce Type: new Abstract: Interleaved text-and-image generation represents a significant frontier for Multimodal Large Language Models (MLLMs), offering a more intuitive way to convey complex information. Current paradigms rely on either image generation or retrieval augmentation, yet they typically treat the two as mutually exclusive paths, failing to unify factuality with creativity. We argue that the next milestone in this field is Agentic Tool Planning, where the model

ArXiv AI

Research

C-TRAIL: A Commonsense World Framework for Trajectory Planning in Autonomous Driving

arXiv:2603.29908v1 Announce Type: new Abstract: Trajectory planning for autonomous driving increasingly leverages large language models (LLMs) for commonsense reasoning, yet LLM outputs are inherently unreliable, posing risks in safety-critical applications. We propose C-TRAIL, a framework built on a Commonsense World that couples LLM-derived commonsense with a trust mechanism to guide trajectory planning. C-TRAIL operates through a closed-loop Recall, Plan, and Update cycle: the Recall module q

ArXiv AI

Research

ScoringBench: A Benchmark for Evaluating Tabular Foundation Models with Proper Scoring Rules

arXiv:2603.29928v1 Announce Type: new Abstract: Tabular foundation models such as TabPFN and TabICL already produce full predictive distributions, yet prevailing regression benchmarks evaluate them almost exclusively via point-estimate metrics (RMSE, R2). These aggregate measures often obscure model performance in the tails of the distribution, a critical deficit for high-stakes decision making in domains like finance and clinical research, where asymmetric risk profiles are the norm. We introduce Scori

ArXiv AI
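A standard proper scoring rule for full predictive distributions, unlike RMSE, is the CRPS, which has a closed form for Gaussian forecasts. A minimal sketch of that textbook formula (generic background, not ScoringBench's code):

```python
from math import erf, exp, pi, sqrt

def crps_gaussian(y: float, mu: float, sigma: float) -> float:
    """Closed-form CRPS for a Gaussian predictive distribution N(mu, sigma^2).
    Lower is better; unlike RMSE, it rewards well-calibrated spread, not
    just an accurate point estimate."""
    z = (y - mu) / sigma
    cdf = 0.5 * (1.0 + erf(z / sqrt(2.0)))    # standard normal CDF at z
    pdf = exp(-0.5 * z * z) / sqrt(2.0 * pi)  # standard normal PDF at z
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / sqrt(pi))

# An overconfident forecast (tiny sigma) is penalized harder when it
# misses than an honestly uncertain one with the same mean:
sharp_miss = crps_gaussian(1.0, 0.0, 0.1)
calibrated = crps_gaussian(1.0, 0.0, 1.0)
```

This is the kind of tail-sensitive behavior point metrics like RMSE cannot distinguish, since both forecasts above share the same mean error.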

Research

The Triadic Cognitive Architecture: Bounding Autonomous Action via Spatio-Temporal and Epistemic Friction

arXiv:2603.30031v1 Announce Type: new Abstract: Current autonomous AI agents, driven primarily by Large Language Models (LLMs), operate in a state of cognitive weightlessness: they process information without an intrinsic sense of network topology, temporal pacing, or epistemic limits. Consequently, heuristic agentic loops (e.g., ReAct) can exhibit failure modes in interactive environments, including excessive tool use under congestion, prolonged deliberation under time decay, and brittle behavi

ArXiv AI

Research

Byzantine-Robust and Communication-Efficient Distributed Training: Compressive and Cyclic Gradient Coding

arXiv:2603.28780v1 Announce Type: cross Abstract: In this paper, we study the problem of distributed training (DT) under Byzantine attacks with communication constraints. While prior work has developed various robust aggregation rules at the server to enhance robustness to Byzantine attacks, the existing methods suffer from a critical limitation in that the solution error does not diminish when the local gradients sent by different devices vary considerably, as a result of data heterogeneity amo

ArXiv AI
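As background on the robust aggregation rules the abstract mentions, a common server-side baseline is the coordinate-wise median, which bounds the influence of a minority of Byzantine workers. A generic sketch (not the paper's compressive or cyclic gradient-coding scheme):

```python
from statistics import median

def coordwise_median(gradients):
    """Aggregate worker gradients coordinate by coordinate with the
    median, so a minority of Byzantine (arbitrary-valued) workers
    cannot pull the aggregate arbitrarily far."""
    dim = len(gradients[0])
    return [median(g[i] for g in gradients) for i in range(dim)]

honest = [[0.9, 1.1], [1.0, 1.0], [1.1, 0.9]]
byzantine = [[1e6, -1e6]]  # one attacker sends garbage
agg = coordwise_median(honest + byzantine)
```

The limitation the abstract points to is visible even here: when honest gradients themselves disagree due to data heterogeneity, the median's output drifts from the true average, and that error does not shrink.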

AI Tools

StepCache: Step-Level Reuse with Lightweight Verification and Selective Patching for LLM Serving

arXiv:2603.28795v1 Announce Type: cross Abstract: We address LLM serving workloads where repeated requests share a common solution structure but differ in localized constraints, such as output schema, variable names, or numeric constants. Prior caching approaches typically reuse either full responses (semantic caching) or model-internal KV/prefix states, which are respectively brittle under partial changes or tightly coupled to specific backends. We present StepCache, a backend-agnostic step-lev

ArXiv AI

Research

GaloisSAT: Differentiable Boolean Satisfiability Solving via Finite Field Algebra

arXiv:2603.28796v1 Announce Type: cross Abstract: The Boolean satisfiability (SAT) problem, the first problem proven to be NP-complete, has become a fundamental challenge in computational complexity, with widespread applications in optimization and verification across many domains. Despite significant algorithmic advances over the past two decades, the performance of SAT solvers has improved at a limited pace. Notably, the 2025 competition winner shows only about a 2X improvement over the 2006 winne

ArXiv AI

Research

MMFace-DiT: A Dual-Stream Diffusion Transformer for High-Fidelity Multimodal Face Generation

arXiv:2603.29029v1 Announce Type: cross Abstract: Recent multimodal face generation models address the spatial control limitations of text-to-image diffusion models by augmenting text-based conditioning with spatial priors such as segmentation masks, sketches, or edge maps. This multimodal fusion enables controllable synthesis aligned with both high-level semantic intent and low-level structural layout. However, most existing approaches typically extend pre-trained text-to-image pipelines by app

ArXiv AI

Research

On the Mirage of Long-Range Dependency, with an Application to Integer Multiplication

arXiv:2603.29069v1 Announce Type: cross Abstract: Integer multiplication has long been considered a hard problem for neural networks, with the difficulty widely attributed to the O(n) long-range dependency induced by carry chains. We argue that this diagnosis is wrong: long-range dependency is not an intrinsic property of multiplication, but a mirage produced by the choice of computational spacetime. We formalize the notion of mirage and provide a constructive proof: when two n-bit binary intege

ArXiv AI

Research

WorldFlow3D: Flowing Through 3D Distributions for Unbounded World Generation

arXiv:2603.29089v1 Announce Type: cross Abstract: Unbounded 3D world generation is emerging as a foundational task for scene modeling in computer vision, graphics, and robotics. In this work, we present WorldFlow3D, a novel method capable of generating unbounded 3D worlds. Building upon a foundational property of flow matching - namely, defining a path of transport between two data distributions - we model 3D generation more generally as a problem of flowing through 3D data distributions, not li

ArXiv AI

Research

Efficient and Scalable Granular-ball Graph Coarsening Method for Large-scale Graph Node Classification

arXiv:2603.29148v1 Announce Type: cross Abstract: The Graph Convolutional Network (GCN) is a model that can effectively handle graph data tasks and has been successfully applied. However, for large-scale graph datasets, GCN still faces the challenge of high computational overhead, especially when the number of graph convolutional layers is large. Currently, many advanced methods use various sampling techniques or graph coarsening techniques to alleviate the inconvenience caused

ArXiv AI

Research

SLVMEval: Synthetic Meta Evaluation Benchmark for Text-to-Long Video Generation

arXiv:2603.29186v1 Announce Type: cross Abstract: This paper proposes the synthetic long-video meta-evaluation (SLVMEval), a benchmark for meta-evaluating text-to-video (T2V) evaluation systems. The proposed SLVMEval benchmark focuses on assessing these systems on videos of up to 10,486 s (approximately 3 h). The benchmark targets a fundamental requirement, namely, whether the systems can accurately assess video quality in settings that are easy for humans to assess. We adopt a pairwise comparis

ArXiv AI

Research

Software Vulnerability Detection Using a Lightweight Graph Neural Network

arXiv:2603.29216v1 Announce Type: cross Abstract: Large Language Models (LLMs) have emerged as a popular choice in vulnerability detection studies given their foundational capabilities, open source availability, and variety of models, but have limited scalability due to extensive compute requirements. Using the natural graph relational structure of code, we show that our proposed graph neural network (GNN) based deep learning model VulGNN for vulnerability detection can achieve performance almos

ArXiv AI

Research

MemRerank: Preference Memory for Personalized Product Reranking

arXiv:2603.29247v1 Announce Type: cross Abstract: LLM-based shopping agents increasingly rely on long purchase histories and multi-turn interactions for personalization, yet naively appending raw history to prompts is often ineffective due to noise, length, and relevance mismatch. We propose MemRerank, a preference memory framework that distills user purchase history into concise, query-independent signals for personalized product reranking. To study this problem, we build an end-to-end benchmar

ArXiv AI

Research

Omni-NegCLIP: Enhancing CLIP with Front-Layer Contrastive Fine-Tuning for Comprehensive Negation Understanding

arXiv:2603.29258v1 Announce Type: cross Abstract: Vision-Language Models (VLMs) have demonstrated strong capabilities across a wide range of multimodal tasks. However, recent studies have shown that VLMs, such as CLIP, perform poorly in understanding negation expressions, which are common in natural language. In this work, we propose Omni-NegCLIP, a fine-tuned CLIP model that improves CLIP's understanding of two types of negation, namely presence-based negation and absence-based negation, which

ArXiv AI

Research

PRISM: A Multi-View Multi-Capability Retail Video Dataset for Embodied Vision-Language Models

arXiv:2603.29281v1 Announce Type: cross Abstract: A critical gap exists between the general-purpose visual understanding of state-of-the-art physical AI models and the specialized perceptual demands of structured real-world deployment environments. We present PRISM, a 270K-sample multi-view video supervised fine-tuning (SFT) corpus for embodied vision-language-models (VLMs) in real-world retail environments. PRISM is motivated by a simple observation - physical AI systems fail not because of poo

ArXiv AI

Research

Self-Improving Code Generation via Semantic Entropy and Behavioral Consensus

arXiv:2603.29292v1 Announce Type: cross Abstract: Improving the code generation capabilities of large language models (LLMs) typically relies on supervised fine-tuning or preference optimization, both of which require costly external resources such as powerful teacher models or reliable test units. However, in real-world scenarios, it is much harder to obtain reference solutions and test oracles than problem descriptions and test inputs. In this paper, we tackle a challenging yet realistic quest

ArXiv AI

Research

IMPASTO: Integrating Model-Based Planning with Learned Dynamics Models for Robotic Oil Painting Reproduction

arXiv:2603.29315v1 Announce Type: cross Abstract: Robotic reproduction of oil paintings using soft brushes and pigments requires force-sensitive control of deformable tools, prediction of brushstroke effects, and multi-step stroke planning, often without human step-by-step demonstrations or faithful simulators. Given only a sequence of target oil painting images, can a robot infer and execute the stroke trajectories, forces, and colors needed to reproduce it? We present IMPASTO, a robotic oil-pa

ArXiv AI

Research

PromptForge-350k: A Large-Scale Dataset and Contrastive Framework for Prompt-Based AI Image Forgery Localization

arXiv:2603.29386v1 Announce Type: cross Abstract: The rapid democratization of prompt-based AI image editing has recently exacerbated the risks associated with malicious content fabrication and misinformation. However, forgery localization methods targeting these emerging editing techniques remain significantly under-explored. To bridge this gap, we first introduce a fully automated mask annotating framework that leverages keypoint alignment and semantic space similarity to generate precise grou

ArXiv AI

Research

Hallucination-aware intermediate representation edit in large vision-language models

arXiv:2603.29405v1 Announce Type: cross Abstract: Large Vision-Language Models have demonstrated exceptional performance in multimodal reasoning and complex scene understanding. However, these models still face significant hallucination issues, where outputs contradict visual facts. Recent research on hallucination mitigation has focused on retraining methods and Contrastive Decoding (CD) methods. While both methods perform well, retraining methods require substantial training resources, and CD

ArXiv AI

Research

AGFT: Alignment-Guided Fine-Tuning for Zero-Shot Adversarial Robustness of Vision-Language Models

arXiv:2603.29410v1 Announce Type: cross Abstract: Pre-trained vision-language models (VLMs) exhibit strong zero-shot generalization but remain vulnerable to adversarial perturbations. Existing classification-guided adversarial fine-tuning methods often disrupt pre-trained cross-modal alignment, weakening visual-textual correspondence and degrading zero-shot performance. In this paper, we propose an Alignment-Guided Fine-Tuning (AGFT) framework that enhances zero-shot adversarial robustness while

ArXiv AI

Research

NeoNet: An End-to-End 3D MRI-Based Deep Learning Framework for Non-Invasive Prediction of Perineural Invasion via Generation-Driven Classification

arXiv:2603.29449v1 Announce Type: cross Abstract: Minimizing invasive diagnostic procedures to reduce the risk of patient injury and infection is a central goal in medical imaging. Yet noninvasive diagnosis of perineural invasion (PNI), a critical prognostic factor involving infiltration of tumor cells along surrounding nerves, remains challenging due to the lack of clear and consistent imaging criteria for identifying PNI. To address this challenge, we present NeoNet, an

ArXiv AI

Research

An Isotropic Approach to Efficient Uncertainty Quantification with Gradient Norms

arXiv:2603.29466v1 Announce Type: cross Abstract: Existing methods for quantifying predictive uncertainty in neural networks are either computationally intractable for large language models or require access to training data that is typically unavailable. We derive a lightweight alternative through two approximations: a first-order Taylor expansion that expresses uncertainty in terms of the gradient of the prediction and the parameter covariance, and an isotropy assumption on the parameter covar
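The two approximations named in the abstract can be sketched numerically. Under a first-order Taylor expansion, the predictive variance is roughly grad^T Sigma grad; the isotropy assumption Sigma = sigma^2 * I collapses this to sigma^2 * ||grad||^2, so uncertainty scales with the squared gradient norm. The toy gradient and sigma value below are illustrative assumptions, not from the paper:

```python
import numpy as np

def isotropic_uncertainty(grad, sigma2):
    """Uncertainty proxy under an isotropic parameter covariance.

    First-order Taylor: Var[f] ~ grad^T Sigma grad; with Sigma = sigma2 * I
    this reduces to sigma2 * ||grad||^2.
    """
    grad = np.asarray(grad, dtype=float)
    return sigma2 * float(grad @ grad)

# Toy example: gradient of a single prediction w.r.t. the parameters.
g = np.array([0.5, -1.0, 2.0])
u = isotropic_uncertainty(g, sigma2=0.1)
print(u)  # 0.1 * (0.25 + 1.0 + 4.0) = 0.525
```

The appeal is that this needs only one backward pass and a scalar sigma2, avoiding both the intractable full covariance and any access to training data.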

ArXiv AI

Research

Target-Aligned Reinforcement Learning

arXiv:2603.29501v1 Announce Type: cross Abstract: Many reinforcement learning algorithms rely on target networks - lagged copies of the online network - to stabilize training. While effective, this mechanism introduces a fundamental stability-recency tradeoff: slower target updates improve stability but reduce the recency of learning signals, hindering convergence speed. We propose Target-Aligned Reinforcement Learning (TARL), a framework that emphasizes transitions for which the target and onli

ArXiv AI

Research

Quantum computers need vastly fewer resources than thought to break vital encryption

No, the sky isn't falling, but Q Day is coming, and it won't be as expensive as thought.

Ars Technica AI

Research

Extend3D: Town-Scale 3D Generation

arXiv:2603.29387v1 Announce Type: cross Abstract: In this paper, we propose Extend3D, a training-free pipeline for 3D scene generation from a single image, built upon an object-centric 3D generative model. To overcome the limitations of fixed-size latent spaces in object-centric models for representing wide scenes, we extend the latent space in the x and y directions. Then, by dividing the extended latent space into overlapping patches, we apply the object-centric 3D generative model to each

ArXiv AI

Trend

My son pleasured himself on Gemini Live. Entire family's Google accounts banned

Article URL: https://old.reddit.com/r/LegalAdviceUK/comments/1s92fql/my_son_pleasured_himself_in_front_of_gemini_live/ Comments URL: https://news.ycombinator.com/item?id=47595971 Points: 181 # Comments: 136

Hacker News

AI Agents

Show HN: I turned a sketch into a 3D-print pegboard for my kid with an AI agent

We have pegboards and plywood all over our apartment, and I had an idea to make a tiny pegboard for my kid, Oli. So I naturally cut the wood, drilled in the holes, sat down at the computer to open Fusion 360 and spend an hour or two drawing the pieces by hand. Then I looked at the rough sketch Oli and I had made together, took a photo of it, pasted it into Codex, and gave it just two dimensions: the holes are 40mm apart and the pegs are 8mm wide. To my surprise, 5 minutes later my 3D printer was h

Hacker News

Education

Learn Claude Code by doing, not reading

Article URL: https://claude.nagdy.me/ Comments URL: https://news.ycombinator.com/item?id=47579229 Points: 270 # Comments: 109

Hacker News

Research

SNEAKDOOR: Stealthy Backdoor Attacks against Distribution Matching-based Dataset Condensation

arXiv:2603.28824v1 Announce Type: cross Abstract: Dataset condensation aims to synthesize compact yet informative datasets that retain the training efficacy of full-scale data, offering substantial gains in efficiency. Recent studies reveal that the condensation process can be vulnerable to backdoor attacks, where malicious triggers are injected into the condensation dataset, manipulating model behavior during inference. While prior approaches have made progress in balancing attack success rate

ArXiv AI

Trend

Incentives, Equilibria, and the Limits of Healthcare AI: A Game-Theoretic Perspective

arXiv:2603.28825v1 Announce Type: cross Abstract: Artificial intelligence (AI) is widely promoted as a promising technological response to healthcare capacity and productivity pressures. Deployment of AI systems carries significant costs, including ongoing monitoring, and it is unclear whether optimism about a deus ex machina solution is well-placed. This paper proposes three archetypal AI technology types: AI for effort reduction, AI to increase observability, and mechanism-level incentive cha

ArXiv AI

Research

GMA-SAWGAN-GP: A Novel Data Generative Framework to Enhance IDS Detection Performance

arXiv:2603.28838v1 Announce Type: cross Abstract: Intrusion Detection Systems (IDS) are often calibrated to known attacks and generalize poorly to unknown threats. This paper proposes GMA-SAWGAN-GP, a novel generative augmentation framework built on a Self-Attention-enhanced Wasserstein GAN with Gradient Penalty (WGAN-GP). The generator employs Gumbel-Softmax regularization to model discrete fields, while a Multilayer Perceptron (MLP)-based AutoEncoder acts as a manifold regularizer. A lightweigh

ArXiv AI

Research

Differentiable Initialization-Accelerated CPU-GPU Hybrid Combinatorial Scheduling

arXiv:2603.28943v1 Announce Type: cross Abstract: This paper presents a hybrid CPU-GPU framework for solving combinatorial scheduling problems formulated as Integer Linear Programming (ILP). While scheduling underpins many optimization tasks in computing systems, solving these problems optimally at scale remains a long-standing challenge due to their NP-hard nature. We introduce a novel approach that combines differentiable optimization with classical ILP solving. Specifically, we utilize differ

ArXiv AI

Research

Evaluating a Data-Driven Redesign Process for Intelligent Tutoring Systems

arXiv:2603.29094v1 Announce Type: cross Abstract: Past research has defined a general process for the data-driven redesign of educational technologies and has shown that in carefully-selected instances, this process can help make systems more effective. In the current work, we test the generality of the approach by applying it to four units of a middle-school mathematics intelligent tutoring system that were selected not based on suitability for redesign, as in previous work, but on topic. We te

ArXiv AI

Research

Towards Explainable Stakeholder-Aware Requirements Prioritisation in Aged-Care Digital Health

arXiv:2603.29114v1 Announce Type: cross Abstract: Requirements engineering for aged-care digital health must account for human aspects, because requirement priorities are shaped not only by technical functionality but also by stakeholders' health conditions, socioeconomics, and lived experience. Knowing which human aspects matter most, and for whom, is critical for inclusive and evidence-based requirements prioritisation. Yet in practice, while some studies have examined human aspects in RE, the

ArXiv AI

Funding

Yupp shuts down after raising $33M from a16z crypto’s Chris Dixon

Less than a year after launching, with checks from some of the biggest names in Silicon Valley, crowdsourced AI model feedback startup Yupp is closing its business, the company said Tuesday.

TechCrunch AI

Product Launch

Alexa+ gets new food ordering experiences with Uber Eats and Grubhub

You can now order from Uber Eats and Grubhub using Alexa+, an experience Amazon says will be similar to chatting with a waiter at a restaurant or placing an order at a drive-thru.

TechCrunch AI

Trend

As more Americans adopt AI tools, fewer say they can trust the results

AI adoption is rising in the U.S., but trust remains low, with most Americans concerned about transparency, regulation, and the technology’s broader societal impact, according to a new Quinnipiac poll.

TechCrunch AI

AI Tools

You can order Grubhub and Uber Eats ‘conversationally’ with Alexa Plus

Amazon is giving you a new way to order food through Grubhub and Uber Eats with Alexa without having to endure an awkward exchange just to add fries. Amazon said the entire process is meant to be conversational, building your order in a similar manner to ordering in a restaurant. That means changing your order, […]

The Verge AI

Industry

The Download: gig workers training humanoids, and better AI benchmarks

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology. The gig workers who are training humanoid robots at home  When Zeus, a medical student in Nigeria, returns to his apartment from a long day at the hospital, he straps his…

MIT Technology Review

Industry

The gig workers who are training humanoid robots at home

When Zeus, a medical student living in a hilltop city in central Nigeria, returns to his studio apartment from a long day at the hospital, he turns on his ring light, straps his iPhone to his forehead, and starts recording himself. He raises his hands in front of him like a sleepwalker and puts a…

MIT Technology Review

Industry

The Download: AI health tools and the Pentagon’s Anthropic culture war

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology. There are more AI health tools than ever—but how well do they work?  In the last few months alone, Microsoft, Amazon, and OpenAI have all launched medical chatbots.  There’s a clear demand…

MIT Technology Review

Open Source

Falcon Perception

Hugging Face Blog

Research

GISTBench: Evaluating LLM User Understanding via Evidence-Based Interest Verification

arXiv:2603.29112v1 Announce Type: new Abstract: We introduce GISTBench, a benchmark for evaluating Large Language Models' (LLMs) ability to understand users from their interaction histories in recommendation systems. Unlike traditional RecSys benchmarks that focus on item prediction accuracy, our benchmark evaluates how well LLMs can extract and verify user interests from engagement data. We propose two novel metric families: Interest Groundedness (IG), decomposed into precision and recall compo

ArXiv AI

Research

Knowledge database development by large language models for countermeasures against viruses and marine toxins

arXiv:2603.29149v1 Announce Type: new Abstract: Access to the most up-to-date information on medical countermeasures is important for the research and development of effective treatments for viruses and marine toxins. However, there is a lack of comprehensive databases that curate data on viruses and marine toxins, making decisions on medical countermeasures slow and difficult. In this work, we employ two large language models (LLMs), ChatGPT and Grok, to design two comprehensive databases of t

ArXiv AI

Research

Grokking From Abstraction to Intelligence

arXiv:2603.29262v1 Announce Type: new Abstract: Grokking in modular arithmetic has established itself as the quintessential fruit fly experiment, serving as a critical domain for investigating the mechanistic origins of model generalization. Despite its significance, existing research remains narrowly focused on specific local circuits or optimization tuning, largely overlooking the global structural evolution that fundamentally drives this phenomenon. We propose that grokking originates from a

ArXiv AI

Research

BenchScope: How Many Independent Signals Does Your Benchmark Provide?

arXiv:2603.29357v1 Announce Type: new Abstract: AI evaluation suites often report many scores without checking whether those scores carry independent information. We introduce Effective Dimensionality (ED), the participation ratio of a centered benchmark-score spectrum, as a fast, population-conditional upper-bound diagnostic of measurement breadth. Applied at per-instance granularity to 22 benchmarks across 8 domains and more than 8,400 model evaluations, ED reveals substantial redundancy: the
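The participation ratio named in the abstract has a standard closed form: ED = (sum of eigenvalues)^2 / (sum of squared eigenvalues) of the centered score covariance spectrum. A minimal sketch on synthetic data, assuming per-benchmark centering (the paper's exact per-instance granularity may differ):

```python
import numpy as np

def effective_dimensionality(scores):
    """Participation ratio of the centered benchmark-score spectrum.

    ED = (sum of eigenvalues)^2 / (sum of squared eigenvalues), computed
    on the covariance of the column-centered score matrix.
    """
    X = scores - scores.mean(axis=0)              # center each benchmark
    eig = np.linalg.eigvalsh(np.cov(X, rowvar=False))
    eig = np.clip(eig, 0.0, None)                 # guard tiny negatives
    return eig.sum() ** 2 / (eig ** 2).sum()

rng = np.random.default_rng(0)
# Synthetic example: 100 models scored on 6 benchmarks, where 3 of the
# benchmarks are near-copies of the other 3.
base = rng.normal(size=(100, 3))
scores = np.hstack([base, base + 0.01 * rng.normal(size=(100, 3))])
ed = effective_dimensionality(scores)
print(round(ed, 2))  # well below 6: redundant benchmarks add little breadth
```

Six reported scores here carry roughly three independent signals, which is exactly the kind of redundancy the benchmark diagnostic is built to expose.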

ArXiv AI

Research

Rigorous Explanations for Tree Ensembles

arXiv:2603.29361v1 Announce Type: new Abstract: Tree ensembles (TEs) find a multitude of practical applications. They represent one of the most general and accurate classes of machine learning methods. While they are typically quite concise in representation, their operation remains inscrutable to human decision makers. One solution to build trust in the operation of TEs is to automatically identify explanations for the predictions made. Evidently, we can only achieve trust using explanations, i

ArXiv AI

Industry

AI-Generated Prior Authorization Letters: Strong Clinical Content, Weak Administrative Scaffolding

arXiv:2603.29366v1 Announce Type: new Abstract: Prior authorization remains one of the most burdensome administrative processes in U.S. healthcare, consuming billions of dollars and thousands of physician hours each year. While large language models have shown promise across clinical text tasks, their ability to produce submission-ready prior authorization letters has received only limited attention, with existing work confined to single-case demonstrations rather than structured multi-scenario

ArXiv AI

Research

Structural Compactness as a Complementary Criterion for Explanation Quality

arXiv:2603.29491v1 Announce Type: new Abstract: In the evaluation of attribution quality, the quantitative assessment of explanation legibility is particularly difficult, as it is influenced by varying shapes and internal organization of attributions not captured by simple statistics. To address this issue, we introduce Minimum Spanning Tree Compactness (MST-C), a graph-based structural metric that captures higher-order geometric properties of attributions, such as spread and cohesion. These com
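A graph-based compactness score in the spirit of MST-C can be sketched as follows: connect the top-attributed pixel coordinates with a minimum spanning tree and use its total edge length as a cohesion measure, since scattered attributions yield long trees and compact blobs yield short ones. The top-k selection and lack of normalization below are simplifying assumptions, not the paper's exact definition:

```python
import numpy as np

def mst_length(points):
    """Total Euclidean edge length of a minimum spanning tree (Prim's algorithm)."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    total = 0.0
    dist = np.linalg.norm(pts - pts[0], axis=1)   # distance to the tree
    for _ in range(n - 1):
        dist[in_tree] = np.inf                    # ignore nodes already added
        j = int(np.argmin(dist))                  # closest outside node
        total += dist[j]
        in_tree[j] = True
        dist = np.minimum(dist, np.linalg.norm(pts - pts[j], axis=1))
    return total

compact = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])    # tight attribution blob
scattered = np.array([[0, 0], [0, 9], [9, 0], [9, 9]])  # spread-out attributions
print(mst_length(compact) < mst_length(scattered))  # True
```

Unlike simple statistics such as mass or sparsity, the tree length is sensitive to the spatial organization of the attribution, which is the higher-order geometric property the metric targets.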

ArXiv AI

Research

FlowPIE: Test-Time Scientific Idea Evolution with Flow-Guided Literature Exploration

arXiv:2603.29557v1 Announce Type: new Abstract: Scientific idea generation (SIG) is critical to AI-driven autonomous research, yet existing approaches are often constrained by a static retrieval-then-generation paradigm, leading to homogeneous and insufficiently divergent ideas. In this work, we propose FlowPIE, a tightly coupled retrieval-generation framework that treats literature exploration and idea generation as a co-evolving process. FlowPIE expands literature trajectories via a flow-guide

ArXiv AI

AI Tools

View-oriented Conversation Compiler for Agent Trace Analysis

arXiv:2603.29678v1 Announce Type: new Abstract: Agent traces carry increasing analytical value in the era of context learning and harness-driven agentic cognition, yet most prior work treats conversation format as a trivial engineering detail. Modern agent conversations contain deeply structured content, including nested tool calls and results, chain-of-thought reasoning blocks, sub-agent invocations, context-window compaction boundaries, and harness-injected system directives, whose complexity

ArXiv AI

Research

A First Step Towards Even More Sparse Encodings of Probability Distributions

arXiv:2603.29691v1 Announce Type: new Abstract: Real world scenarios can be captured with lifted probability distributions. However, distributions are usually encoded in a table or list, requiring an exponential number of values. Hence, we propose a method for extracting first-order formulas from probability distributions that require significantly fewer values by reducing the number of values in a distribution and then extracting, for each value, a logical formula to be further minimized. This r

ArXiv AI

Research

Reinforced Reasoning for End-to-End Retrosynthetic Planning

arXiv:2603.29723v1 Announce Type: new Abstract: Retrosynthetic planning is a fundamental task in organic chemistry, yet remains challenging due to its combinatorial complexity. To address this, conventional approaches typically rely on hybrid frameworks that combine single-step predictions with external search heuristics, inevitably fracturing the logical coherence between local molecular transformations and global planning objectives. To bridge this gap and embed sophisticated strategic foresig

ArXiv AI

Research

Spatiotemporal Robustness of Temporal Logic Tasks using Multi-Objective Reasoning

arXiv:2603.29868v1 Announce Type: new Abstract: The reliability of autonomous systems depends on their robustness, i.e., their ability to meet their objectives under uncertainty. In this paper, we study spatiotemporal robustness of temporal logic specifications evaluated over discrete-time signals. Existing work has proposed robust semantics that capture not only Boolean satisfiability, but also the geometric distance from unsatisfiability, corresponding to admissible spatial perturbations of a

ArXiv AI

Research

Uncertainty Gating for Cost-Aware Explainable Artificial Intelligence

arXiv:2603.29915v1 Announce Type: new Abstract: Post-hoc explanation methods are widely used to interpret black-box predictions, but their generation is often computationally expensive and their reliability is not guaranteed. We propose epistemic uncertainty as a low-cost proxy for explanation reliability: high epistemic uncertainty identifies regions where the decision boundary is poorly defined and where explanations become unstable and unfaithful. This insight enables two complementary use ca

ArXiv AI

Research

Structured Intent as a Protocol-Like Communication Layer: Cross-Model Robustness, Framework Comparison, and the Weak-Model Compensation Effect

arXiv:2603.29953v1 Announce Type: new Abstract: How reliably can structured intent representations preserve user goals across different AI models, languages, and prompting frameworks? Prior work showed that PPS (Prompt Protocol Specification), a 5W3H-based structured intent framework, improves goal alignment in Chinese and generalizes to English and Japanese. This paper extends that line of inquiry in three directions: cross-model robustness across Claude, GPT-4o, and Gemini 2.5 Pro; controlled

ArXiv AI

Research

Extending MONA in Camera Dropbox: Reproduction, Learned Approval, and Design Implications for Reward-Hacking Mitigation

arXiv:2603.29993v1 Announce Type: new Abstract: Myopic Optimization with Non-myopic Approval (MONA) mitigates multi-step reward hacking by restricting the agent's planning horizon while supplying far-sighted approval as a training signal (Farquhar et al., 2025). The original paper identifies a critical open question: how the method of constructing approval -- particularly the degree to which approval depends on achieved outcomes -- affects whether MONA's safety guarantees hold. We present a repr

ArXiv AI

Research

DF-ACBlurGAN: Structure-Aware Conditional Generation of Internally Repeated Patterns for Biomaterial Microtopography Design

arXiv:2603.28776v1 Announce Type: cross Abstract: Learning to generate images with internally repeated and periodic structures poses a fundamental challenge for machine learning and computer vision models, which are typically optimised for local texture statistics and semantic realism rather than global structural consistency. This limitation is particularly pronounced in applications requiring strict control over repetition scale, spacing, and boundary coherence, such as microtopographical biom

ArXiv AI

Research

A Multi-Modal Dataset for Ground Reaction Force Estimation Using Consumer Wearable Sensors

arXiv:2603.28784v1 Announce Type: cross Abstract: This Data Descriptor presents a fully open, multi-modal dataset for estimating vertical ground reaction force (vGRF) from consumer-grade Apple Watch sensors with laboratory force plate ground truth. Ten healthy adults aged 26–41 years performed five activities: walking, jogging, running, heel drops, and step drops, while wearing two Apple Watches positioned at the left wrist and waist. The dataset contains 492 validated trials with time-aligned

ArXiv AI

Research

Design and Development of an ML/DL Attack Resistance of RC-Based PUF for IoT Security

arXiv:2603.28798v1 Announce Type: cross Abstract: Physically Unclonable Functions (PUFs) provide promising hardware security for IoT authentication, leveraging inherent randomness suitable for resource constrained environments. However, ML/DL modeling attacks threaten PUF security by learning challenge-response patterns. This work introduces a custom resistor-capacitor (RC) based dynamically reconfigurable PUF using 32-bit challenge-response pairs (CRPs) designed to resist such attacks. We syste

ArXiv AI

Research

CREST: Constraint-Release Execution for Multi-Robot Warehouse Shelf Rearrangement

arXiv:2603.28803v1 Announce Type: cross Abstract: Double-Deck Multi-Agent Pickup and Delivery (DD-MAPD) models the multi-robot shelf rearrangement problem in automated warehouses. MAPF-DECOMP is a recent framework that first computes collision-free shelf trajectories with a MAPF solver and then assigns agents to execute them. While efficient, it enforces strict trajectory dependencies, often leading to poor execution quality due to idle agents and unnecessary shelf switching. We introduce CREST,

ArXiv AI

AI Tools

WAter: A Workload-Adaptive Knob Tuning System based on Workload Compression

arXiv:2603.28809v1 Announce Type: cross Abstract: Selecting appropriate values for the configurable parameters of Database Management Systems (DBMS) to improve performance is a significant challenge. Recent machine learning (ML)-based tuning systems have shown strong potential, but their practical adoption is often limited by the high tuning cost. This cost arises from two main factors: (1) the system needs to evaluate a large number of configurations to identify a satisfactory one, and (2) for

ArXiv AI

Research

IMPACT: Influence Modeling for Open-Set Time Series Anomaly Detection

arXiv:2603.29183v1 Announce Type: cross Abstract: Open-set anomaly detection (OSAD) is an emerging paradigm designed to utilize limited labeled data from anomaly classes seen in training to identify both seen and unseen anomalies during testing. Current approaches rely on simple augmentation methods to generate pseudo anomalies that replicate unseen anomalies. Despite being promising in image data, these methods are found to be ineffective in time series data due to the failure to preserve its s

ArXiv AI

Research

SyriSign: A Parallel Corpus for Arabic Text to Syrian Arabic Sign Language Translation

arXiv:2603.29219v1 Announce Type: cross Abstract: Sign language is the primary means of communication for the Deaf and Hard-of-Hearing (DHH) community. While there are numerous benchmarks for high-resource sign languages, low-resource languages like Arabic remain underrepresented. Currently, there is no publicly available dataset for Syrian Arabic Sign Language (SyArSL). To overcome this gap, we introduce SyriSign, a dataset comprising 1500 video samples across 150 unique lexical signs, desig

ArXiv AI

Research

Derived Fields Preserve Fine-Scale Detail in Budgeted Neural Simulators

arXiv:2603.29224v1 Announce Type: cross Abstract: Fine-scale-faithful neural simulation under fixed storage budgets remains challenging. Many existing methods reduce high-frequency error by improving architectures, training objectives, or rollout strategies. However, under budgeted coarsen-quantize-decode pipelines, fine detail can already be lost when the carried state is constructed. In the canonical periodic incompressible Navier-Stokes setting, we show that primitive and derived fields under

ArXiv AI

Research

Monodense Deep Neural Model for Determining Item Price Elasticity

arXiv:2603.29261v1 Announce Type: cross Abstract: Item Price Elasticity is used to quantify the responsiveness of consumer demand to changes in item prices, enabling businesses to create pricing strategies and optimize revenue management. Sectors such as store retail, e-commerce, and consumer goods rely on elasticity information derived from historical sales and pricing data. This elasticity provides an understanding of purchasing behavior across different items, consumer discount sensitivity, a

ArXiv AI

Trend

Downsides of Smartness Across Edge-Cloud Continuum in Modern Industry

arXiv:2603.29289v1 Announce Type: cross Abstract: The fast pace of modern AI is rapidly transforming traditional industrial systems into vast, intelligent and potentially unmanned autonomous operational environments driven by AI-based solutions. These solutions leverage various forms of machine learning, reinforcement learning, and generative AI. The introduction of such smart capabilities has pushed the envelope in multiple industrial domains, enabling predictive maintenance, optimized performa

ArXiv AI

Research

MELT: Improve Composed Image Retrieval via the Modification Frequentation-Rarity Balance Network

arXiv:2603.29291v1 Announce Type: cross Abstract: Composed Image Retrieval (CIR) uses a reference image and a modification text as a query to retrieve a target image satisfying the requirement of "modifying the reference image according to the text instructions". However, existing CIR methods face two limitations: (1) frequency bias leading to "Rare Sample Neglect", and (2) susceptibility of similarity scores to interference from hard negative samples and noise. To address these limitations,

ArXiv AI

Research

Beyond Corner Patches: Semantics-Aware Backdoor Attack in Federated Learning

arXiv:2603.29328v1 Announce Type: cross Abstract: Backdoor attacks on federated learning (FL) are most often evaluated with synthetic corner patches or out-of-distribution (OOD) patterns that are unlikely to arise in practice. In this paper, we revisit the backdoor threat to standard FL (a single global model) under a more realistic setting where triggers must be semantically meaningful, in-distribution, and visually plausible. We propose SABLE, a Semantics-Aware Backdoor for LEarning in federat

ArXiv AI

Research

Deep Learning-Based Anomaly Detection in Spacecraft Telemetry on Edge Devices

arXiv:2603.29375v1 Announce Type: cross Abstract: Spacecraft anomaly detection is critical for mission safety, yet deploying sophisticated models on-board presents significant challenges due to hardware constraints. This paper investigates three approaches for spacecraft telemetry anomaly detection -- forecasting & threshold, direct classification, and image classification -- and optimizes them for edge deployment using multi-objective neural architecture optimization on the European Space Agenc

ArXiv AI

Research

RAAP: Retrieval-Augmented Affordance Prediction with Cross-Image Action Alignment

arXiv:2603.29419v1 Announce Type: cross Abstract: Understanding object affordances is essential for enabling robots to perform purposeful and fine-grained interactions in diverse and unstructured environments. However, existing approaches either rely on retrieval, which is fragile due to sparsity and coverage gaps, or on large-scale models, which frequently mislocalize contact points and mispredict post-contact actions when applied to unseen categories, thereby hindering robust generalization. W

ArXiv AI

Research

Few-shot Writer Adaptation via Multimodal In-Context Learning

arXiv:2603.29450v1 Announce Type: cross Abstract: While state-of-the-art Handwritten Text Recognition (HTR) models perform well on standard benchmarks, they frequently struggle with writers exhibiting highly specific styles that are underrepresented in the training data. To handle unseen and atypical writers, writer adaptation techniques personalize HTR models to individual handwriting styles. Leading writer adaptation methods require either offline fine-tuning or parameter updates at inference

ArXiv AI

Research

CIPHER: Counterfeit Image Pattern High-level Examination via Representation

arXiv:2603.29356v1 Announce Type: cross Abstract: The rapid progress of generative adversarial networks (GANs) and diffusion models has enabled the creation of synthetic faces that are increasingly difficult to distinguish from real images. This progress, however, has also amplified the risks of misinformation, fraud, and identity abuse, underscoring the urgent need for detectors that remain robust across diverse generative models. In this work, we introduce Counterfeit Image Pattern High-level

ArXiv AI

Research

iPoster: Content-Aware Layout Generation for Interactive Poster Design via Graph-Enhanced Diffusion Models

arXiv:2603.29469v1 Announce Type: cross Abstract: We present iPoster, an interactive layout generation framework that empowers users to guide content-aware poster layout design by specifying flexible constraints. iPoster enables users to specify partial intentions within the intention module, such as element categories, sizes, positions, or coarse initial drafts. Then, the generation module instantly generates refined, context-sensitive layouts that faithfully respect these constraints. iPoster

ArXiv AI

Trend

15% of Americans say they’d be willing to work for an AI boss, according to new poll

According to a Quinnipiac University poll, 15% of Americans say they'd be willing to have a job where their direct supervisor was an AI program that assigned tasks and set schedules.

TechCrunch AI

AI Tools

AI can push your Stream Deck buttons for you

If you're tired of controlling Stream Deck devices by manually pushing buttons, then good news: Elgato will now let you delegate that task to a chatbot instead. The Stream Deck 7.4 software update released today introduces Model Context Protocol (MCP) support, allowing AI assistants like Claude, ChatGPT, and Nvidia G-Assist to find and activate Stream […]

The Verge AI

Policy

The Pentagon’s culture war tactic against Anthropic has backfired

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here. Last Thursday, a California judge temporarily blocked the Pentagon from labeling Anthropic a supply chain risk and ordering government agencies to stop using its AI. It’s the latest development in the month-long…

MIT Technology Review

Industry

We’re creating a new satellite imagery map to help protect Brazil’s forests.

Google partnered with the Brazilian government on a satellite imagery map to help protect the country’s forests.

Google AI Blog

Trend

AI Has Flooded All the Weather Apps

Weather forecasting has gotten a big boost from machine learning. How that translates into what users see can vary.

Wired AI

Research

REFINE: Real-world Exploration of Interactive Feedback and Student Behaviour

arXiv:2603.29142v1 Announce Type: new Abstract: Formative feedback is central to effective learning, yet providing timely, individualised feedback at scale remains a persistent challenge. While recent work has explored the use of large language models (LLMs) to automate feedback, most existing systems still conceptualise feedback as a static, one-way artifact, offering limited support for interpretation, clarification, or follow-up. In this work, we introduce REFINE, a locally deployable, multi-

ArXiv AI

AI Tools

Optimizing Donor Outreach for Blood Collection Sessions: A Scalable Decision Support Framework

arXiv:2603.29643v1 Announce Type: new Abstract: Blood donation centers face challenges in matching supply with demand while managing donor availability. Although targeted outreach is important, it can cause donor fatigue via over-solicitation. Effective recruitment requires targeting the right donors at the right time, balancing constraints with donor convenience and eligibility. Despite extensive work on blood supply chain optimization and growing interest in algorithmic donor recruitment, the

ArXiv AI

Research

Beyond the Steeper Curve: AI-Mediated Metacognitive Decoupling and the Limits of the Dunning-Kruger Metaphor

arXiv:2603.29681v1 Announce Type: new Abstract: The common claim that generative AI simply amplifies the Dunning-Kruger effect is too coarse to capture the available evidence. The clearest findings instead suggest that large language model (LLM) use can improve observable output and short-term task performance while degrading metacognitive accuracy and flattening the classic competence-confidence gradient across skill groups. This paper synthesizes evidence from human-AI interaction, learning re

ArXiv AI

Research

Tracking vs. Deciding: The Dual-Capability Bottleneck in Searchless Chess Transformers

arXiv:2603.29761v1 Announce Type: new Abstract: A human-like chess engine should mimic the style, errors, and consistency of a strong human player rather than maximize playing strength. We show that training from move sequences alone forces a model to learn two capabilities: state tracking, which reconstructs the board from move history, and decision quality, which selects good moves from that reconstructed state. These impose contradictory data requirements: low-rated games provide the diversit

ArXiv AI

Research

A Rational Account of Categorization Based on Information Theory

arXiv:2603.29895v1 Announce Type: new Abstract: We present a new theory of categorization based on an information-theoretic rational analysis. To evaluate this theory, we investigate how well it can account for key findings from classic categorization experiments conducted by Hayes-Roth and Hayes-Roth (1977), Medin and Schaffer (1978), and Smith and Minda (1998). We find that it explains human categorization behavior at least as well as (or better than) the independent cue and context models (M

ArXiv AI

Research

Physiological and Semantic Patterns in Medical Teams Using an Intelligent Tutoring System

arXiv:2603.29950v1 Announce Type: new Abstract: Effective collaboration requires teams to manage complex cognitive and emotional states through Socially Shared Regulation of Learning (SSRL). Physiological synchrony (i.e., longitudinal alignment in physiological signals) can indicate these states, but is hard to interpret on its own. We investigate the physiological and conversational dynamics of four medical dyads diagnosing a virtual patient case using an intelligent tutoring system. Semantic s

ArXiv AI

Research

The Last Fingerprint: How Markdown Training Shapes LLM Prose

arXiv:2603.27006v1 Announce Type: cross Abstract: Large language models produce em dashes at varying rates, and the observation that some models "overuse" them has become one of the most widely discussed markers of AI-generated text. Yet no mechanistic account of this pattern exists, and the parallel observation that LLMs default to markdown-formatted output has never been connected to it. We propose that the em dash is markdown leaking into prose -- the smallest surviving unit of the structural

ArXiv AI
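The "rate" of em-dash use that the abstract refers to is easy to quantify. A minimal sketch (the per-1,000-word normalisation and the sample text are illustrative assumptions, not from the paper):

```python
import re

def em_dash_rate(text: str) -> float:
    """Return em dashes per 1,000 words -- a rough stylometric signal."""
    words = len(re.findall(r"\b\w+\b", text))
    dashes = text.count("\u2014")  # U+2014 EM DASH
    return 1000 * dashes / words if words else 0.0

sample = "The model paused\u2014briefly\u2014before answering in plain prose."
print(round(em_dash_rate(sample), 1))  # 222.2
```

Comparing this statistic across model outputs and human-written corpora is the kind of measurement behind the "overuse" claims the paper sets out to explain mechanistically.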

AI Tools

Focus360: Guiding User Attention in Immersive Videos for VR

arXiv:2603.28774v1 Announce Type: cross Abstract: This demo introduces Focus360, a system designed to enhance user engagement in 360° VR videos by guiding attention to key elements within the scene. Using natural language descriptions, the system identifies important elements and applies a combination of visual effects to guide attention seamlessly. At the demonstration venue, participants can experience a 360° Safari Tour, showcasing the system's ability to improve user focus while ma

ArXiv AI

Industry

AI in Work-Based Learning: Understanding the Purposes and Effects of Intelligent Tools Among Student Interns

arXiv:2603.28786v1 Announce Type: cross Abstract: This study examined how student interns in Philippine higher education use intelligent tools during their on-the-job training (OJT). Data were collected from 384 respondents using a structured questionnaire that asked about AI tool usage, task-specific applications, and perceptions of confidence, ethics, and support. Analysis of task-based usage identified four main purposes: productivity and report writing, communication and content drafting, technical assistance and

ArXiv AI

AI Tools

Smartphone-Based Identification of Unknown Liquids via Active Vibration Sensing

arXiv:2603.28787v1 Announce Type: cross Abstract: Traditional liquid identification instruments are often unavailable to the general public. This paper shows the feasibility of identifying unknown liquids with commercial lightweight devices, such as a smartphone. The key insight is that different liquid molecules have different viscosity coefficients and therefore must overcome different energy barriers during relative motion. With this intuition in mind, we introduce a novel model that measures

ArXiv AI

Research

The impact of multi-agent debate protocols on debate quality: a controlled case study

arXiv:2603.28813v1 Announce Type: cross Abstract: In multi-agent debate (MAD) systems, performance gains are often reported; however, because the debate protocol (e.g., number of agents, rounds, and aggregation rule) is typically held fixed while model-related factors vary, it is difficult to disentangle protocol effects from model effects. To isolate these effects, we compare three main protocols, Within-Round (WR; agents see only current-round contributions), Cross-Round (CR; full prior-round

ArXiv AI
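The two context-sharing rules the abstract names can be sketched as a toy skeleton. The `run_debate` function and the string stand-in for the model call are assumptions for illustration; the paper's actual protocols and aggregation rules are not fully specified in the teaser:

```python
# Within-Round (WR): each agent sees only messages produced so far in the
# current round. Cross-Round (CR): each agent sees the full prior history.

def run_debate(n_agents: int, n_rounds: int, protocol: str):
    history = []  # flat list of (round, agent, text) messages
    for r in range(n_rounds):
        for a in range(n_agents):
            if protocol == "WR":
                context = [m for m in history if m[0] == r]  # this round only
            elif protocol == "CR":
                context = list(history)                      # everything so far
            else:
                raise ValueError(protocol)
            history.append((r, a, f"agent{a} replies given {len(context)} msgs"))
    return history

print(run_debate(3, 3, "CR")[-1][2])  # agent2 replies given 8 msgs
```

Holding the model fixed and swapping only this context rule is the kind of controlled comparison the study uses to separate protocol effects from model effects.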

Research

3D Architect: An Automated Approach to Three-Dimensional Modeling

arXiv:2603.29191v1 Announce Type: cross Abstract: The aim of our paper is to render an object in 3-dimension using a set of its orthographic views. Corner detector (Harris Detector) is applied on the input views to obtain control points. These control points are projected perpendicular to respective views, in order to construct an envelope. A set of points describing the object in 3-dimension, are obtained from the intersection of these mutually perpendicular envelopes. These set of points are u

ArXiv AI
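The envelope-intersection step the abstract describes can be illustrated with a toy example. The hand-picked corner sets below stand in for Harris-detector output; the two-view setup and coordinate conventions are assumptions for illustration:

```python
# Corners detected in two orthographic views are back-projected along the
# viewing axes; 3D candidates arise where the projections agree.

top_view   = {(0, 0), (0, 2), (2, 0), (2, 2)}   # (x, y) corners, seen from above
front_view = {(0, 0), (0, 1), (2, 0), (2, 1)}   # (x, z) corners, seen from front

# A top-view corner sweeps a vertical line (z free); a front-view corner
# sweeps a depth line (y free). A shared x coordinate means the two
# perpendicular lines intersect at a single point (x, y, z).
candidates = {(x, y, z)
              for (x, y) in top_view
              for (fx, z) in front_view
              if fx == x}

print(len(candidates))  # 8 corners of a 2 x 2 x 1 box
```

A real pipeline would prune spurious intersections against a third view before fitting surfaces through the surviving points.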

Research

Improving Ensemble Forecasts of Abnormally Deflecting Tropical Cyclones with Fused Atmosphere-Ocean-Terrain Data

arXiv:2603.29200v1 Announce Type: cross Abstract: Deep learning-based tropical cyclone (TC) forecasting methods have demonstrated significant potential and application advantages, as they feature much lower computational cost and faster operation speed than numerical weather prediction models. However, existing deep learning methods still have key limitations: they can only process a single type of sequential trajectory data or homogeneous meteorological variables, and fail to achieve accurate f

ArXiv AI

Research

Sima AIunty: Caste Audit in LLM-Driven Matchmaking

arXiv:2603.29288v1 Announce Type: cross Abstract: Social and personal decisions in relational domains such as matchmaking are deeply entwined with cultural norms and historical hierarchies, and can potentially be shaped by algorithmic and AI-mediated assessments of compatibility, acceptance, and stability. In South Asian contexts, caste remains a central aspect of marital decision-making, yet little is known about how contemporary large language models (LLMs) reproduce or disrupt caste-based str

ArXiv AI

Research

Real-Time Band-Grouped Vocal Denoising Using Sigmoid-Driven Ideal Ratio Masking


arXiv:2603.29326v1 Announce Type: cross Abstract: Real-time, deep learning-based vocal denoising has seen significant progress over the past few years, demonstrating the capability of artificial intelligence in preserving the naturalness of the voice while increasing the signal-to-noise ratio (SNR). However, many deep learning approaches have high amounts of latency and require long frames of context, making them difficult to configure for live applications. To address these challenges, we propo

ArXiv AI
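The classic ideal ratio mask (IRM) the title builds on has a simple closed form per time-frequency bin. A minimal sketch; the sigmoid-of-log-SNR variant below is an assumed stand-in for the paper's "sigmoid-driven" masking, which the teaser does not fully specify:

```python
import math

def irm(speech_pow: float, noise_pow: float) -> float:
    """Classic ideal ratio mask: speech power over total power, in [0, 1]."""
    return speech_pow / (speech_pow + noise_pow)

def sigmoid_mask(speech_pow: float, noise_pow: float, k: float = 1.0) -> float:
    """Sigmoid of log-SNR; k (an assumed parameter) sets transition sharpness."""
    log_snr = math.log(speech_pow / noise_pow)
    return 1.0 / (1.0 + math.exp(-k * log_snr))

# With k = 1 the sigmoid of log-SNR reduces exactly to the IRM:
# 1 / (1 + N/S) = S / (S + N).
print(round(irm(4.0, 1.0), 6), round(sigmoid_mask(4.0, 1.0), 6))
```

The mask is applied multiplicatively to the noisy spectrogram; the appeal of a sigmoid parameterisation is that a network can predict an unbounded score per band while the output stays in [0, 1].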

Research

Scaling Whole-Body Human Musculoskeletal Behavior Emulation for Specificity and Diversity

arXiv:2603.29332v1 Announce Type: cross Abstract: The embodied learning of human motor control requires whole-body neuro-actuated musculoskeletal dynamics, while the internal muscle-driven processes underlying movement remain inaccessible to direct measurement. Computational modeling offers an alternative, but inverse dynamics methods have struggled to resolve redundant control from observed kinematics in the high-dimensional, over-actuated system. Forward imitation approaches based on deep reinforce

ArXiv AI

Research

Hybrid Quantum-Classical Spatiotemporal Forecasting for 3D Cloud Fields

arXiv:2603.29407v1 Announce Type: cross Abstract: Accurate forecasting of three-dimensional (3D) cloud fields is important for atmospheric analysis and short-range numerical weather prediction, yet it remains challenging because cloud evolution involves cross-layer interactions, nonlocal dependencies, and multiscale spatiotemporal dynamics. Existing spatiotemporal prediction models based on convolutions, recurrence, or attention often rely on locality-biased representations and therefore struggl

ArXiv AI
