AI Pulse

AI News: Apr 23, 2026

Today's 5 most important AI stories — product launches, research, funding, and more.

1. Product Launch
■■■■□ 4/5

Google Launches Two Specialized 8th-Gen TPUs for AI Agents

Google unveiled two 8th-generation TPUs at Cloud Next: the TPU v8t for training large models and TPU v8i for inference workloads. The chips are purpose-built for agentic AI applications requiring sustained, high-throughput compute. Both will be available via Google Cloud, targeting enterprises running complex, multi-step AI agent pipelines.

Why it matters

Purpose-built inference and training silicon signals Google is hardening its cloud infrastructure advantage as agentic AI workloads demand more specialized, cost-efficient compute.

2. Research
■■■□□ 3/5

FedProxy Tackles LLM Fine-Tuning Privacy-Performance Tradeoff

Researchers introduced FedProxy, a federated learning framework for fine-tuning LLMs that addresses three simultaneous challenges: protecting model IP, preserving client data privacy, and maintaining performance across heterogeneous data. The system uses proxy small language models and heterogeneity-aware fusion, closing the performance gap left by existing methods like Offsite-Tuning.

Why it matters

FedProxy could enable enterprises to collaboratively fine-tune powerful LLMs without exposing proprietary data or model weights—a critical unlock for regulated industries like healthcare and finance.

3. Research
■■■□□ 3/5

New Multi-Agent Framework Cuts False Positives in AI Code Review

Researchers introduced 'Refute-or-Promote,' a multi-agent system designed to reduce false positives in LLM-assisted code defect detection. The framework uses adversarial agents to challenge bug reports at each review stage before promotion, combined with a Cross-Model Critic. It targets a core credibility problem: AI tools generating plausible but incorrect vulnerability reports that overwhelm developers.

Why it matters

For security and engineering teams adopting AI code review, reducing false positives is the primary barrier to trust and adoption — a working solution here has direct workflow impact.

4. Funding
■■■□□ 3/5

SpaceX Offers $60B to Acquire AI Coding Tool Cursor

Cursor halted a $2B funding round this week after SpaceX intervened with a $10B 'collaboration fee' and a proposed $60B acquisition deal. The AI coding assistant, developed by Anysphere, had been in active fundraising discussions before SpaceX's offer prompted the company to pause and evaluate the buyout path instead.

Why it matters

A $60B acquisition of an AI coding tool by SpaceX signals that aerospace and defense players are aggressively entering the AI software market, reshaping who competes for developer-focused AI assets.

5. Product Launch
■■□□□ 2/5

Google Adds AI Automation Layer Across Entire Workspace Suite

Google updated Workspace on April 22 with new AI-driven automation features powered by Workspace Intelligence, its unified AI system. The rollout introduces automated functions across Workspace apps, aiming to handle routine office tasks. The move targets productivity gains for business users already embedded in Google's productivity ecosystem.

Why it matters

Professionals using Google Workspace should expect workflow changes as AI automation reshapes how routine tasks are handled across Docs, Sheets, Gmail, and Meet.

176 More Stories Today

Research

Nexusformer: Nonlinear Attention Expansion for Stable and Inheritable Transformer Scaling

arXiv:2604.19147v1 Announce Type: cross Abstract: Scaling Transformers typically necessitates training larger models from scratch, as standard architectures struggle to expand without discarding learned representations. We identify the primary bottleneck in the attention mechanism's linear projections, which strictly confine feature extraction to fixed-dimensional subspaces, limiting both expressivity and incremental capacity. To address this, we introduce Nexusformer, which replaces linear $Q/K

ArXiv AI

Research

LBLLM: Lightweight Binarization of Large Language Models via Three-Stage Distillation

arXiv:2604.19167v1 Announce Type: cross Abstract: Deploying large language models (LLMs) in resource-constrained environments is hindered by heavy computational and memory requirements. We present LBLLM, a lightweight binarization framework that achieves effective W(1+1)A4 quantization through a novel three-stage quantization strategy. The framework proceeds as follows: (1) initialize a high-quality quantized model via PTQ; (2) quantize binarized weights, group-wise bitmaps, and quantization par

ArXiv AI

Research

Sherpa.ai Privacy-Preserving Multi-Party Entity Alignment without Intersection Disclosure for Noisy Identifiers

arXiv:2604.19219v1 Announce Type: cross Abstract: Federated Learning (FL) enables collaborative model training among multiple parties without centralizing raw data. There are two main paradigms in FL: Horizontal FL (HFL), where all participants share the same feature space but hold different samples, and Vertical FL (VFL), where parties possess complementary features for the same set of samples. A prerequisite for VFL training is privacy-preserving entity alignment (PPEA), which establishes a co

ArXiv AI

Product Launch

Google Cloud launches two new AI chips to compete with Nvidia

Google's newest TPUs are faster and cheaper than the previous versions. But the company is still embracing Nvidia in its cloud — for now.

TechCrunch AI

Product Launch

Google turns Chrome into an AI co-worker for the workplace

Google brings Gemini-powered "auto browse" capabilities to Chrome for enterprise users, letting workers automate tasks like research, data entry, and more.

TechCrunch AI

AI Agents

OpenAI now lets teams make custom bots that can do work on their own

OpenAI is giving users of its Business, Enterprise, Edu, and Teachers plans access to cloud-based "workspace" agents available in ChatGPT that can perform business tasks. In its blog post, OpenAI gives examples of agents like one that finds product feedback on the web and sends a report in Slack and a sales agent that can […]

The Verge AI

AI Agents

Introducing workspace agents in ChatGPT

Workspace agents in ChatGPT are Codex-powered agents that automate complex workflows, run in the cloud, and help teams scale work across tools securely.

OpenAI Blog

Research

ARES: Adaptive Red-Teaming and End-to-End Repair of Policy-Reward System

arXiv:2604.18789v1 Announce Type: new Abstract: Reinforcement Learning from Human Feedback (RLHF) is central to aligning Large Language Models (LLMs), yet it introduces a critical vulnerability: an imperfect Reward Model (RM) can become a single point of failure when it fails to penalize unsafe behaviors. While existing red-teaming approaches primarily target policy-level weaknesses, they overlook what we term systemic weaknesses: cases where both the core LLM and the RM fail in tandem. We presen

ArXiv AI

AI Agents

Human-Guided Harm Recovery for Computer Use Agents

arXiv:2604.18847v1 Announce Type: new Abstract: As LM agents gain the ability to execute actions on real computer systems, we need ways to not only prevent harmful actions at scale but also effectively remediate harm when prevention fails. We formalize a solution to this neglected challenge in post-execution safeguards as harm recovery: the problem of optimally steering an agent from a harmful state back to a safe one in alignment with human preferences. We ground preference-aligned recovery thr

ArXiv AI

AI Agents

How Adversarial Environments Mislead Agentic AI?

arXiv:2604.18874v1 Announce Type: new Abstract: Tool-integrated agents are deployed on the premise that external tools ground their outputs in reality. Yet this very reliance creates a critical attack surface. Current evaluations benchmark capability in benign settings, asking "can the agent use tools correctly" but never "what if the tools lie". We identify this Trust Gap: agents are evaluated for performance, not for skepticism. We formalize this vulnerability as Adversarial Environmental Inje

ArXiv AI

Research

Reasoning Structure Matters for Safety Alignment of Reasoning Models

arXiv:2604.18946v1 Announce Type: new Abstract: Large reasoning models (LRMs) achieve strong performance on complex reasoning tasks but often generate harmful responses to malicious user queries. This paper investigates the underlying cause of these safety risks and shows that the issue lies in the reasoning structure itself. Based on this insight, we claim that effective safety alignment can be achieved by altering the reasoning structure. We propose AltTrain, a simple yet effective post traini

ArXiv AI

Research

SAVOIR: Learning Social Savoir-Faire via Shapley-based Reward Attribution

arXiv:2604.18982v1 Announce Type: new Abstract: Social intelligence, the ability to navigate complex interpersonal interactions, presents a fundamental challenge for language agents. Training such agents via reinforcement learning requires solving the credit assignment problem: determining how individual utterances contribute to multi-turn dialogue outcomes. Existing approaches directly employ language models to distribute episode-level rewards, yielding attributions that are retrospective and l

ArXiv AI

AI Agents

ClawNet: Human-Symbiotic Agent Network for Cross-User Autonomous Cooperation

arXiv:2604.19211v1 Announce Type: new Abstract: Current AI agent frameworks have made remarkable progress in automating individual tasks, yet all existing systems serve a single user. Human productivity rests on the social and organizational relationships through which people coordinate, negotiate, and delegate. When agents move beyond performing tasks for one person to representing that person in collaboration with others, the infrastructure for cross-user agent collaboration is entirely absent

ArXiv AI

Research

GRASPrune: Global Gating for Budgeted Structured Pruning of Large Language Models

arXiv:2604.19398v1 Announce Type: new Abstract: Large language models (LLMs) are expensive to serve because model parameters, attention computation, and KV caches impose substantial memory and latency costs. We present GRASPrune, a structured pruning framework applied after pretraining that jointly prunes FFN channels and KV head groups under a single global budget. Instead of learning importance scores without constraints and applying the budget only after training, GRASPrune learns lightweight

ArXiv AI

AI Agents

Four-Axis Decision Alignment for Long-Horizon Enterprise AI Agents

arXiv:2604.19457v1 Announce Type: new Abstract: Long-horizon enterprise agents make high-stakes decisions (loan underwriting, claims adjudication, clinical review, prior authorization) under lossy memory, multi-step reasoning, and binding regulatory constraints. Current evaluation reports a single task-success scalar that conflates distinct failure modes and hides whether an agent is aligned with the standards its deployment environment requires. We propose that long-horizon decision behavior de

ArXiv AI

Research

SimDiff: Depth Pruning via Similarity and Difference

arXiv:2604.19520v1 Announce Type: new Abstract: Depth pruning improves the deployment efficiency of large language models (LLMs) by identifying and removing redundant layers. A widely accepted standard for this identification process is to measure the similarity between layers using cosine distance. However, we find that methods relying solely on this one-dimensional heuristic can exhibit unpredictable performance and even catastrophic collapse across different architectures. To address this iss

ArXiv AI

Policy

Position: No Retroactive Cure for Infringement during Training

arXiv:2604.18649v1 Announce Type: cross Abstract: As generative AI faces intensifying legal challenges, the machine learning community has increasingly relied on post-hoc mitigation -- especially machine unlearning and inference-time guardrails -- to argue for compliance. This paper argues that such post-hoc mitigation methods cannot retroactively cure liability from unlawful acquisition and training, because compliance hinges on data lineage, not the outputs. Our argument has three parts. First

ArXiv AI

AI Tools

Unlocking the Edge deployment and on-device acceleration of multi-LoRA enabled one-for-all foundational LLM

arXiv:2604.18655v1 Announce Type: cross Abstract: Deploying large language models (LLMs) on smartphones poses significant engineering challenges due to stringent constraints on memory, latency, and runtime flexibility. In this work, we present a hardware-aware framework for efficient on-device inference of a LLaMA-based multilingual foundation model supporting multiple use cases on Samsung Galaxy S24 and S25 devices with SM8650 and SM8750 Qualcomm chipsets respectively. Our approach integrates a

ArXiv AI

Research

OmniMouse: Scaling properties of multi-modal, multi-task Brain Models on 150B Neural Tokens

arXiv:2604.18827v1 Announce Type: cross Abstract: Scaling data and artificial neural networks has transformed AI, driving breakthroughs in language and vision. Whether similar principles apply to modeling brain activity remains unclear. Here we leveraged a dataset of 3.1 million neurons from the visual cortex of 73 mice across 323 sessions, totaling more than 150 billion neural tokens recorded during natural movies, images and parametric stimuli, and behavior. We train multi-modal, multi-task mo

ArXiv AI

Policy

Regulating Artificial Intimacy: From Locks and Blocks to Relational Accountability

arXiv:2604.18893v1 Announce Type: cross Abstract: A series of high-profile tragedies involving companion chatbots has triggered an unusually rapid regulatory response. Several jurisdictions, including Australia, California, and New York, have introduced enforceable regulation, while regulators elsewhere have signaled growing concern about risks posed by companion chatbots, particularly to children. In parallel, leading providers, notably OpenAI, appear to have strengthened their self-regulatory

ArXiv AI

Funding

Exclusive: Google deepens Thinking Machines Lab ties with new multibillion-dollar deal

Mira Murati's Thinking Machines Lab has signed a multibillion-dollar deal with Google Cloud for AI infrastructure powered by Nvidia's latest GB300 chips, TechCrunch has exclusively learned.

TechCrunch AI

Funding

SpaceX cuts a deal to maybe buy Cursor for $60 billion

With an IPO looming for Elon Musk's SpaceX / xAI / X combo platter of companies, SpaceX has announced an odd arrangement to either acquire the automated programming platform Cursor for $60 billion or pay a fee of $10 billion. Buying this startup that's focused on AI coding could help xAI's tools compete with market […]

The Verge AI

Trend

5 AI Models Tried to Scam Me. Some of Them Were Scary Good

The cyber capabilities of AI models have experts rattled. AI’s social skills may be just as dangerous.

Wired AI

Trend

Join Our Livestream: Musk v. Altman and the Future of OpenAI

On May 8, we’re going live to answer your questions about the Musk v. Altman trial that could determine the fate of OpenAI.

Wired AI

Product Launch

OpenAI Beefs Up ChatGPT’s Image Generation Model

The ChatGPT Images 2.0 model is here. Our testing shows it’s better at creating more detailed images and rendering text, but it still struggles with languages other than English.

Wired AI

Research

Detecting Data Contamination in Large Language Models

arXiv:2604.19561v1 Announce Type: new Abstract: Large Language Models (LLMs) utilize large amounts of data for their training, some of which may come from copyrighted sources. Membership Inference Attacks (MIA) aim to detect those documents and whether they have been included in the training corpora of the LLMs. The black-box MIAs require a significant amount of data manipulation; therefore, their comparison is often challenging. We study state-of-the-art (SOTA) MIAs under the black-box assumpti

ArXiv AI

Research

Multi-modal Reasoning with LLMs for Visual Semantic Arithmetic

arXiv:2604.19567v1 Announce Type: new Abstract: Reinforcement learning (RL) as post-training is crucial for enhancing the reasoning ability of large language models (LLMs) in coding and math. However, their capacity for visual semantic arithmetic, inferring relationships from images, remains underexplored. The classic text analogy "king"-"man"+"woman" = "queen" illustrates relational reasoning, yet replacing text with images of "king" and "man" significantly reduces performance because it requir

ArXiv AI

Research

Time Series Augmented Generation for Financial Applications

arXiv:2604.19633v1 Announce Type: new Abstract: Evaluating the reasoning capabilities of Large Language Models (LLMs) for complex, quantitative financial tasks is a critical and unsolved challenge. Standard benchmarks often fail to isolate an agent's core ability to parse queries and orchestrate computations. To address this, we introduce a novel evaluation methodology and benchmark designed to rigorously measure an LLM agent's reasoning for financial time-series analysis. We apply this methodol

ArXiv AI

Research

SafetyALFRED: Evaluating Safety-Conscious Planning of Multimodal Large Language Models

arXiv:2604.19638v1 Announce Type: new Abstract: Multimodal Large Language Models are increasingly adopted as autonomous agents in interactive environments, yet their ability to proactively address safety hazards remains insufficient. We introduce SafetyALFRED, built upon the embodied agent benchmark ALFRED, augmented with six categories of real-world kitchen hazards. While existing safety evaluations focus on hazard recognition through disembodied question answering (QA) settings, we evaluate el

ArXiv AI

Research

Compile to Compress: Boosting Formal Theorem Provers by Compiler Outputs

arXiv:2604.18587v1 Announce Type: cross Abstract: Large language models (LLMs) have demonstrated significant potential in formal theorem proving, yet state-of-the-art performance often necessitates prohibitive test-time compute via massive roll-outs or extended context windows. In this work, we address this scalability bottleneck by exploiting an informative structure in formal verification: the observation that compilers map a vast space of diverse proof attempts to a compact set of structured

ArXiv AI

AI Tools

CentaurTA Studio: A Self-Improving Human-Agent Collaboration System for Thematic Analysis

arXiv:2604.18589v1 Announce Type: cross Abstract: Thematic analysis is difficult to scale: manual workflows are labor-intensive, while fully automated pipelines often lack controllability and transparent evaluation. We present CentaurTA Studio, a web-based system for self-improving human--agent collaboration in open coding and theme construction. The system integrates (1) a two-stage human feedback pipeline separating simulator drafting and expert validation, (2) persistent prompt optim

ArXiv AI

AI Tools

SPRITE: From Static Mockups to Engine-Ready Game UI

arXiv:2604.18591v1 Announce Type: cross Abstract: Game UI implementation requires translating stylized mockups into interactive engine entities. However, current "Screenshot-to-Code" tools often struggle with the irregular geometries and deep visual hierarchies typical of game interfaces. To bridge this gap, we introduce SPRITE, a pipeline that transforms static screenshots into editable engine assets. By integrating Vision-Language Models (VLMs) with a structured YAML intermediate representatio

ArXiv AI

Research

Two-dimensional early exit optimisation of LLM inference

arXiv:2604.18592v1 Announce Type: cross Abstract: We introduce a two-dimensional (2D) early exit strategy that coordinates layer-wise and sentence-wise exiting for classification tasks in large language models. By processing input incrementally sentence-by-sentence while progressively activating deeper layers, our method achieves multiplicative computational savings that exceed those from optimizing either dimension independently. Experimental evaluation across four state-of-the-art LLMs (Llama

ArXiv AI

Research

TurboEvolve: Towards Fast and Robust LLM-Driven Program Evolution

arXiv:2604.18607v1 Announce Type: cross Abstract: LLM-driven program evolution can discover high-quality programs, but its cost and run-to-run variance hinder reliable progress. We propose TurboEvolve, a multi-island evolutionary framework that improves sample efficiency and robustness under fixed evaluation budgets. Inspired by the multiple-offspring strategy in evolutionary algorithms, TurboEvolve introduces verbalized Sampling, prompting the LLM to emit K diverse candidates with explicit self

ArXiv AI

Research

SpikeMLLM: Spike-based Multimodal Large Language Models via Modality-Specific Temporal Scales and Temporal Compression

arXiv:2604.18610v1 Announce Type: cross Abstract: Multimodal Large Language Models (MLLMs) have achieved remarkable progress but incur substantial computational overhead and energy consumption during inference, limiting deployment in resource-constrained environments. Spiking Neural Networks (SNNs), with their sparse event-driven computation, offer inherent energy efficiency advantages on neuromorphic hardware, yet extending them to MLLMs faces two key challenges: heterogeneous modalities make u

ArXiv AI

AI Agents

Agent-GWO: Collaborative Agents for Dynamic Prompt Optimization in Large Language Models

arXiv:2604.18612v1 Announce Type: cross Abstract: Large Language Models (LLMs) have demonstrated strong capabilities in complex reasoning tasks, while recent prompting strategies such as Chain-of-Thought (CoT) have further elevated their performance in handling complex logical problems. Despite these advances, high-quality reasoning remains heavily reliant on manual static prompts and is sensitive to decoding configurations and task distributions, leading to performance fluctuations and limited

ArXiv AI

AI Agents

ARGUS: Agentic GPU Optimization Guided by Data-Flow Invariants

arXiv:2604.18616v1 Announce Type: cross Abstract: LLM-based coding agents can generate functionally correct GPU kernels, yet their performance remains far below hand-optimized libraries on critical computations such as matrix multiplication, attention, and Mixture-of-Experts (MoE). Peak GPU performance requires coordinated reasoning over tightly coupled optimizations, including tiling, shared-memory staging, software pipelining, and instruction scheduling, while existing agents rely on sparse pa

ArXiv AI

AI Agents

Temporal UI State Inconsistency in Desktop GUI Agents: Formalizing and Defending Against TOCTOU Attacks on Computer-Use Agents

arXiv:2604.18860v1 Announce Type: cross Abstract: GUI agents that control desktop computers via screenshot-and-click loops introduce a new class of vulnerability: the observation-to-action gap (mean 6.51 s on real OSWorld workloads) creates a Time-Of-Check, Time-Of-Use (TOCTOU) window during which an unprivileged attacker can manipulate the UI state. We formalize this as a Visual Atomicity Violation and characterize three concrete attack primitives: (A) Notification Overlay Hijack, (B) Window Fo

ArXiv AI

AI Tools

Choose Your Own Adventure: Non-Linear AI-Assisted Programming with EvoGraph

arXiv:2604.18883v1 Announce Type: cross Abstract: Current AI-assisted programming tools are predominantly linear and chat-based, which deviates from the iterative and branching nature of programming itself. Our preliminary study with developers using AI assistants suggested that they often struggle to explore alternatives, manage prompting sequences, and trace changes. Informed by these insights, we created EvoGraph, an IDE plugin that integrates AI interactions and code changes as a lightweight

ArXiv AI

Research

Assessing Capabilities of Large Language Models in Social Media Analytics: A Multi-task Quest

arXiv:2604.18955v1 Announce Type: cross Abstract: In this study, we present the first comprehensive evaluation of modern LLMs - including GPT-4, GPT-4o, GPT-3.5-Turbo, Gemini 1.5 Pro, DeepSeek-V3, Llama 3.2, and BERT - across three core social media analytics tasks on a Twitter (X) dataset: (I) Social Media Authorship Verification, (II) Social Media Post Generation, and (III) User Attribute Inference. For the authorship verification, we introduce a systematic sampling framework over diverse user

ArXiv AI

Research

Distillation Traps and Guards: A Calibration Knob for LLM Distillability

arXiv:2604.18963v1 Announce Type: cross Abstract: Knowledge distillation (KD) transfers capabilities from large language models (LLMs) to smaller students, yet it can fail unpredictably and also underpins model leakage risks. Our analysis revealed several distillation traps: tail noise, off-policy instability, and, most fundamentally, the teacher-student gap, that distort training signals. These traps manifest as overconfident hallucinations, self-correction collapse, and local decoding degradat

ArXiv AI

Research

AutoAWG: Adverse Weather Generation with Adaptive Multi-Controls for Automotive Videos

arXiv:2604.18993v1 Announce Type: cross Abstract: Perception robustness under adverse weather remains a critical challenge for autonomous driving, with the core bottleneck being the scarcity of real-world video data in adverse weather. Existing weather generation approaches struggle to balance visual quality and annotation reusability. We present AutoAWG, a controllable Adverse Weather video Generation framework for Autonomous driving. Our method employs a semantics-guided adaptive fusion of mul

ArXiv AI

Research

$R^2$-dLLM: Accelerating Diffusion Large Language Models via Spatio-Temporal Redundancy Reduction

arXiv:2604.18995v1 Announce Type: cross Abstract: Diffusion Large Language Models (dLLMs) have emerged as a promising alternative to autoregressive generation by enabling parallel token prediction. However, practical dLLM decoding still suffers from high inference latency, which limits deployment. In this work, we observe that a substantial part of this inefficiency comes from recurring redundancy in the decoding process, including spatial redundancy caused by confidence clusters and positional

ArXiv AI

Research

Local Linearity of LLMs Enables Activation Steering via Model-Based Linear Optimal Control

arXiv:2604.19018v1 Announce Type: cross Abstract: Inference-time LLM alignment methods, particularly activation steering, offer an alternative to fine-tuning by directly modifying activations during generation. Existing methods, however, often rely on non-anticipative interventions that ignore how perturbations propagate through transformer layers and lack online error feedback, resulting in suboptimal, open-loop control. To address this, we show empirically that, despite the nonlinear structure

ArXiv AI

Research

RARE: Redundancy-Aware Retrieval Evaluation Framework for High-Similarity Corpora

arXiv:2604.19047v1 Announce Type: cross Abstract: Existing QA benchmarks typically assume distinct documents with minimal overlap, yet real-world retrieval-augmented generation (RAG) systems operate on corpora such as financial reports, legal codes, and patents, where information is highly redundant and documents exhibit strong inter-document similarity. This mismatch undermines evaluation validity: retrievers can be unfairly undervalued even when they retrieve documents that provide sufficient

ArXiv AI

Research

SAMoRA: Semantic-Aware Mixture of LoRA Experts for Task-Adaptive Learning

arXiv:2604.19048v1 Announce Type: cross Abstract: The combination of Mixture-of-Experts (MoE) and Low-Rank Adaptation (LoRA) has shown significant potential for enhancing the multi-task learning capabilities of Large Language Models. However, existing methods face two primary challenges: (1)Imprecise Routing in the current MoE-LoRA method fails to explicitly match input semantics with expert capabilities, leading to weak expert specialization. (2)Uniform weight fusion strategies struggle to prov

ArXiv AI

Research

Reducing the Offline-Streaming Gap for Unified ASR Transducer with Consistency Regularization

arXiv:2604.19079v1 Announce Type: cross Abstract: Unification of automatic speech recognition (ASR) systems reduces development and maintenance costs, but training a single model to perform well in both offline and low-latency streaming settings remains challenging. We present a Unified ASR framework for Transducer (RNNT) training that supports both offline and streaming decoding within a single model, using chunk-limited attention with right context and dynamic chunked convolutions. To further

ArXiv AI

Research

ProjLens: Unveiling the Role of Projectors in Multimodal Model Safety

arXiv:2604.19083v1 Announce Type: cross Abstract: Multimodal Large Language Models (MLLMs) have achieved remarkable success in cross-modal understanding and generation, yet their deployment is threatened by critical safety vulnerabilities. While prior works have demonstrated the feasibility of backdoors in MLLMs via fine-tuning data poisoning to manipulate inference, the underlying mechanisms of backdoor attacks remain opaque, complicating the understanding and mitigation. To bridge this gap, we

ArXiv AI

Research

RoboWM-Bench: A Benchmark for Evaluating World Models in Robotic Manipulation

arXiv:2604.19092v1 Announce Type: cross Abstract: Recent advances in large-scale video world models have enabled increasingly realistic future prediction, raising the prospect of leveraging imagined videos for robot learning. However, visual realism does not imply physical plausibility, and behaviors inferred from generated videos may violate dynamics and fail when executed by embodied agents. Existing benchmarks begin to incorporate notions of physical plausibility, but they largely remain perc

ArXiv AI

Research

Design Rules for Extreme-Edge Scientific Computing on AI Engines

arXiv:2604.19106v1 Announce Type: cross Abstract: Extreme-edge scientific applications use machine learning models to analyze sensor data and make real-time decisions. Their stringent latency and throughput requirements demand small batch sizes and require that model weights remain fully on-chip. Spatial dataflow implementations are common for extreme-edge applications. Spatial dataflow works well for small networks, but it fails to scale to larger models due to inherent resource scaling limitat

ArXiv AI

Research

DP-FlogTinyLLM: Differentially private federated log anomaly detection using Tiny LLMs

arXiv:2604.19118v1 Announce Type: cross Abstract: Modern distributed systems generate massive volumes of log data that are critical for detecting anomalies and cyber threats. However, in real world settings, these logs are often distributed across multiple organizations and cannot be centralized due to privacy and security constraints. Existing log anomaly detection methods, including recent large language model (LLM) based approaches, largely rely on centralized training and are not suitable fo

ArXiv AI

Research

The Rise of Verbal Tics in Large Language Models: A Systematic Analysis Across Frontier Models

arXiv:2604.19139v1 Announce Type: cross Abstract: As Large Language Models (LLMs) continue to evolve through alignment techniques such as Reinforcement Learning from Human Feedback (RLHF) and Constitutional AI, a growing and increasingly conspicuous phenomenon has emerged: the proliferation of verbal tics -- repetitive, formulaic linguistic patterns that pervade model outputs. These range from sycophantic openers ("That's a great question!", "Awesome!") to pseudo-empathetic affirmations ("I comp

ArXiv AI

Research

ST-Prune: Training-Free Spatio-Temporal Token Pruning for Vision-Language Models in Autonomous Driving

arXiv:2604.19145v1 Announce Type: cross Abstract: Vision-Language Models (VLMs) have become central to autonomous driving systems, yet their deployment is severely bottlenecked by the massive computational overhead of multi-view camera and multi-frame video input. Existing token pruning methods, primarily designed for single-image inputs, treat each frame or view in isolation and thus fail to exploit the inherent spatio-temporal redundancies in driving scenarios. To bridge this gap, we propose S

ArXiv AI
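
As a rough illustration of the underlying idea (not ST-Prune's actual algorithm), temporal redundancy can be exploited by dropping tokens in a frame that are near-duplicates of the same-position token in the previous frame:

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

def prune_temporal(frames, thresh=0.95):
    """frames: list of frames, each a list of token vectors.
    Keep frame 0 whole; in later frames drop tokens whose similarity to the
    same-position token in the previous original frame exceeds thresh."""
    kept = [list(frames[0])]
    for prev, cur in zip(frames, frames[1:]):
        kept.append([tok for p, tok in zip(prev, cur) if cosine(p, tok) < thresh])
    return kept

frames = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.01], [1.0, 0.0]],  # token 0 barely moved; token 1 changed
]
pruned = prune_temporal(frames)
```

The training-free appeal is that this is a pure inference-time filter; the pruned token set shrinks the sequence the VLM actually attends over.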

Research

Inductive Subgraphs as Shortcuts: Causal Disentanglement for Heterophilic Graph Learning

arXiv:2604.19186v1 Announce Type: cross Abstract: Heterophily is a prevalent property of real-world graphs and is well known to impair the performance of homophilic Graph Neural Networks (GNNs). Prior work has attempted to adapt GNNs to heterophilic graphs through non-local neighbor extension or architecture refinement. However, the fundamental reasons behind misclassifications remain poorly understood. In this work, we take a novel perspective by examining recurring inductive subgraphs…

ArXiv AI

Research

Improved Anomaly Detection in Medical Images via Mean Shift Density Enhancement

arXiv:2604.19191v1 Announce Type: cross Abstract: Anomaly detection in medical imaging is essential for identifying rare pathological conditions, particularly when annotated abnormal samples are limited. We propose a hybrid anomaly detection framework that integrates self-supervised representation learning with manifold-based density estimation, a combination that remains largely unexplored in this domain. Medical images are first embedded into a latent feature space using pretrained, potentially…

ArXiv AI
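
The density-estimation half of such a pipeline can be sketched with a Gaussian kernel density score plus a single mean-shift step, which climbs toward denser regions. The self-supervised embedding stage is omitted and the data are toy 1-D points, so this is only the shape of the idea.

```python
import math

def kde_score(x, data, bandwidth=1.0):
    """Gaussian kernel density estimate at point x; lower -> more anomalous."""
    h2 = bandwidth * bandwidth
    total = sum(math.exp(-sum((a - b) ** 2 for a, b in zip(x, p)) / (2 * h2))
                for p in data)
    return total / (len(data) * bandwidth * math.sqrt(2 * math.pi))

def mean_shift_step(x, data, bandwidth=1.0):
    """One mean-shift update: move x toward the kernel-weighted mean of data."""
    h2 = bandwidth * bandwidth
    ws = [math.exp(-sum((a - b) ** 2 for a, b in zip(x, p)) / (2 * h2))
          for p in data]
    z = sum(ws)
    return [sum(w * p[i] for w, p in zip(ws, data)) / z for i in range(len(x))]

data = [[0.0], [0.1], [-0.1], [0.05]]   # normal embeddings cluster near 0
dense = kde_score([0.0], data)          # inside the cluster
sparse = kde_score([5.0], data)         # far from all points
```

Scoring a query by its (enhanced) density, rather than by reconstruction error, is what makes the approach usable when almost no abnormal examples exist.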

Funding

xAI is working with Cursor and has an option to buy the startup for $60B

The move could shore up weaknesses at each company, but it also reveals them. Neither Cursor nor xAI has proprietary models that can match the leading offerings from Anthropic and OpenAI — the same companies now competing directly with Cursor for the developer market.

TechCrunch AI

Funding

AI research lab NeoCognition lands $40M seed to build agents that learn like humans

Founded by an OSU researcher, the startup is developing AI agents that can become experts in any domain.

TechCrunch AI

Product Launch

ChatGPT’s new Images 2.0 model is surprisingly good at generating text

ChatGPT Images 2.0, the newest image-generation model from OpenAI, shows just how much AI capabilities have evolved over the last few years.

TechCrunch AI

Product Launch

Hands on with X’s new AI-powered custom feeds

X's AI-powered custom timelines are replacing Communities, with Grok-curated feeds… and new ad slots.

TechCrunch AI

Product Launch

Google makes an interesting choice with its new agent-building tool for enterprises

Gemini Enterprise Agent Platform takes an interesting approach: It is geared for IT and technical users.

TechCrunch AI

Product Launch

AI Overviews are coming to your Gmail at work

The AI Overviews will offer instant summaries pulled from across multiple emails.

TechCrunch AI

Industry

OpenAI teams up with Infosys to bring AI tools to more businesses

Infosys said the integration will be used to help its clients modernize software development, automate workflows, and deploy AI systems, initially focusing on software engineering, legacy modernization, and DevOps.

TechCrunch AI

Product Launch

Google Maps is about to get a big dose of AI

The new features, announced at Cloud Next in Las Vegas this week, add generative AI capabilities to Google's mapping platform, giving it enhanced visual and data analytics powers.

TechCrunch AI

AI Tools

Google Meet will take AI notes for in-person meetings too

Google's AI meeting notetaker is no longer limited to Google Meet: Gemini can now also generate summaries and transcripts of in-person meetings, as well as meetings on Zoom and Microsoft Teams, as first reported by 9to5Google. Support for in-person meetings was previously limited to alpha users and only available on Android. Google's support page […]

The Verge AI

Product Launch

OpenAI’s updated image generator can now pull information from the web

OpenAI is rolling out the latest version of its AI-powered image generator with new "thinking capabilities," allowing it to search the web to help it create multiple images from a single prompt. On Tuesday, OpenAI announced that ChatGPT Images 2.0 can now create more "sophisticated" images, with improvements to its ability to follow instructions, preserve […]

The Verge AI

Product Launch

Making ChatGPT better for clinicians

OpenAI makes ChatGPT for Clinicians free for verified U.S. physicians, nurse practitioners, and pharmacists, supporting clinical care, documentation, and research.

OpenAI Blog

AI Agents

Workspace agents

Learn how to build, use, and scale workspace agents in ChatGPT to automate repeatable workflows, connect tools, and streamline team operations.

OpenAI Blog

Product Launch

Introducing ChatGPT Images 2.0

ChatGPT Images 2.0 introduces a state-of-the-art image generation model with improved text rendering, multilingual support, and advanced visual reasoning.

OpenAI Blog

Trend

AI needs a strong data fabric to deliver business value

Artificial intelligence is moving quickly in the enterprise, from experimentation to everyday use. Organizations are deploying copilots, agents, and predictive systems across finance, supply chains, human resources, and customer operations. By the end of 2025, half of companies used AI in at least three business functions, according to a recent survey. But as AI becomes…

MIT Technology Review

AI Tools

3 new ways Ads Advisor is making Google Ads safer and faster

Three new agentic safety and policy features integrated into Ads Advisor will help protect and streamline your Google Ads account.

Google AI Blog

AI Tools

Mozilla Used Anthropic’s Mythos to Find and Fix 271 Bugs in Firefox

The Firefox team doesn’t think emerging AI capabilities will upend cybersecurity long term, but they warn that software developers are likely in for a rocky transition.

Wired AI

Research

Beyond One Output: Visualizing and Comparing Distributions of Language Model Generations

arXiv:2604.18724v1 Announce Type: new Abstract: Users typically interact with and evaluate language models via single outputs, but each output is just one sample from a broad distribution of possible completions. This interaction hides distributional structure such as modes, uncommon edge cases, and sensitivity to small prompt changes, leading users to over-generalize from anecdotes when iterating on prompts for open-ended tasks. Informed by a formative study with researchers who use LMs (n=13)…

ArXiv AI
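
The core move, treating the LM as a distribution and summarizing many samples rather than one, is easy to sketch. The `toy_lm` sampler below is a stand-in for a real model call.

```python
import random
from collections import Counter

def sample_completions(sampler, n=1000, seed=0):
    """Draw n completions and summarize the output distribution,
    surfacing modes and rare outputs that a single sample hides."""
    rng = random.Random(seed)
    counts = Counter(sampler(rng) for _ in range(n))
    return counts.most_common()

def toy_lm(rng):
    """Stand-in for an LM call: a skewed categorical over three completions."""
    return rng.choices(["yes", "no", "it depends"], weights=[0.6, 0.3, 0.1])[0]

ranked = sample_completions(toy_lm)
```

Even this crude tally exposes the long tail ("it depends") that a user sampling once would likely never see.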

Research

AI scientists produce results without reasoning scientifically

arXiv:2604.18805v1 Announce Type: new Abstract: Large language model (LLM)-based systems are increasingly deployed to conduct scientific research autonomously, yet whether their reasoning adheres to the epistemic norms that make scientific inquiry self-correcting is poorly understood. Here, we evaluate LLM-based scientific agents across eight domains, spanning workflow execution to hypothesis-driven inquiry, through more than 25,000 agent runs and two complementary lenses: (i) a systematic performance…

ArXiv AI

Research

From Natural Language to Executable Narsese: A Neuro-Symbolic Benchmark and Pipeline for Reasoning with NARS

arXiv:2604.18873v1 Announce Type: new Abstract: Large language models (LLMs) are highly capable at language generation, but they remain unreliable when reasoning requires explicit symbolic structure, multi-step inference, and interpretable uncertainty. This paper presents a neuro-symbolic framework for translating natural-language reasoning problems into executable formal representations using first-order logic (FOL) and Narsese, the language of the Non-Axiomatic Reasoning System (NARS). To support…

ArXiv AI

AI Tools

AutomationBench

arXiv:2604.18934v1 Announce Type: new Abstract: Existing AI benchmarks for software automation rarely combine cross-application coordination, autonomous API discovery, and policy adherence. Real business workflows demand all three: a single task may span a CRM, inbox, calendar, and messaging platform - requiring the agent to find the right endpoints, follow a policy document, and write correct data to each system. To address this gap, we introduce AutomationBench, a benchmark for evaluating AI agents…

ArXiv AI

Research

Personalized Benchmarking: Evaluating LLMs by Individual Preferences

arXiv:2604.18943v1 Announce Type: new Abstract: With the rise in capabilities of large language models (LLMs) and their deployment in real-world tasks, evaluating LLM alignment with human preferences has become an important challenge. Current benchmarks average preferences across all users to compute aggregate ratings, overlooking individual user preferences when establishing model rankings. Since users have varying preferences in different contexts, we call for personalized LLM benchmarks that…

ArXiv AI
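
A personalized leaderboard of the kind the abstract calls for can be as simple as re-weighting per-criterion scores by a user's preference vector. The criteria names and scores below are made up for illustration, not from the paper.

```python
def personalized_ranking(model_scores, user_weights):
    """Rank models by a user-specific weighted sum over criteria,
    instead of a single aggregate leaderboard."""
    def score(m):
        return sum(user_weights.get(c, 0.0) * v for c, v in model_scores[m].items())
    return sorted(model_scores, key=score, reverse=True)

# Illustrative criterion scores for two hypothetical models.
scores = {
    "model_a": {"helpfulness": 0.9, "brevity": 0.2},
    "model_b": {"helpfulness": 0.6, "brevity": 0.9},
}
terse_user = personalized_ranking(scores, {"helpfulness": 0.3, "brevity": 0.7})
helpful_user = personalized_ranking(scores, {"helpfulness": 0.9, "brevity": 0.1})
```

The same score matrix yields opposite rankings for the two users, which is exactly the information an averaged leaderboard destroys.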

Research

DW-Bench: Benchmarking LLMs on Data Warehouse Graph Topology Reasoning

arXiv:2604.18964v1 Announce Type: new Abstract: This paper introduces DW-Bench, a new benchmark that evaluates large language models (LLMs) on graph-topology reasoning over data warehouse schemas, explicitly integrating both foreign-key (FK) and data-lineage edges. The benchmark comprises 1,046 automatically generated, verifiably correct questions across five schemas. Experiments show that tool-augmented methods substantially outperform static approaches but plateau on hard compositional subtype…

ArXiv AI
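
A DW-Bench-style topology question reduces to reachability over a schema graph whose edges carry FK or lineage labels. The toy schema below is illustrative, not drawn from the benchmark.

```python
from collections import deque

# Toy warehouse schema; table names and edge labels are illustrative.
EDGES = {
    ("orders", "customers"): "fk",
    ("orders", "order_items"): "fk",
    ("daily_sales", "orders"): "lineage",  # daily_sales is derived from orders
}

def reachable(src, dst, kinds=("fk", "lineage")):
    """BFS: can dst be reached from src following edges of the given kinds?"""
    adj = {}
    for (a, b), k in EDGES.items():
        if k in kinds:
            adj.setdefault(a, []).append(b)
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False
```

The interesting cases are exactly the mixed-edge ones: `daily_sales` reaches `customers` only if the lineage hop is allowed, which is the kind of compositional question static prompting tends to miss.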

Research

Reinforcement Learning Improves LLM Accuracy and Reasoning in Disease Classification from Radiology Reports

arXiv:2604.19060v1 Announce Type: new Abstract: Accurate disease classification from radiology reports is essential for many applications. While supervised fine-tuning (SFT) of lightweight LLMs improves accuracy, it can degrade reasoning. We propose a two-stage approach: SFT on disease labels followed by Group Relative Policy Optimization (GRPO) to refine predictions by optimizing accuracy and format without reasoning supervision. Across three radiologist-annotated datasets, SFT outperformed…

ArXiv AI
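
GRPO's key trick is computing advantages relative to a group of sampled completions rather than a learned value baseline. A common formulation standardizes each reward against the group's mean and standard deviation:

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages as in GRPO: standardize each sampled
    completion's reward against its own group's statistics."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0  # guard degenerate all-equal groups
    return [(r - mu) / sigma for r in rewards]

# Four completions for one report: correct label + format earns 1.0.
adv = grpo_advantages([1.0, 0.0, 1.0, 0.0])
```

Because only relative quality within the group matters, a simple accuracy/format reward suffices and no reasoning-trace supervision is needed, which is what the two-stage recipe exploits.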

Research

OLLM: Options-based Large Language Models

arXiv:2604.19087v1 Announce Type: new Abstract: We introduce Options LLM (OLLM), a simple, general method that replaces the single next-token prediction of standard LLMs with a set of learned options for the next token, indexed by a discrete latent variable. Instead of relying on temperature or sampling heuristics to induce diversity, OLLM models variation explicitly: a small latent space parametrizes multiple plausible next-token options which can be selected or searched by a downstream…

ArXiv AI
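
A minimal sketch of the options idea, with hand-written distributions standing in for learned option heads: a discrete latent picks which head emits the next token.

```python
import random

# Illustrative: two "options" over a 3-token vocabulary (not learned weights).
OPTIONS = [
    {"cat": 0.8, "dog": 0.1, "eel": 0.1},  # option 0: committed to "cat"
    {"cat": 0.1, "dog": 0.8, "eel": 0.1},  # option 1: committed to "dog"
]

def next_token(latent, rng):
    """Sample the next token from the option indexed by the discrete latent,
    rather than flattening all variation into one temperature-scaled head."""
    dist = OPTIONS[latent]
    toks, probs = zip(*dist.items())
    return rng.choices(toks, weights=probs)[0]

rng = random.Random(0)
samples = [next_token(0, rng) for _ in range(200)]
```

Diversity then comes from switching the latent, not from raising temperature, so a downstream controller can search over a handful of coherent options instead of noisy token-level samples.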

Research

Towards Scalable Lifelong Knowledge Editing with Selective Knowledge Suppression

arXiv:2604.19089v1 Announce Type: new Abstract: Large language models (LLMs) require frequent knowledge updates to reflect changing facts and mitigate hallucinations. To meet this demand, lifelong knowledge editing has emerged as a continual approach to modify specific pieces of knowledge without retraining the entire model. Existing parameter editing methods struggle with stability during sequential edits due to catastrophic forgetting. While retrieval-based approaches are proposed to alleviate…

ArXiv AI

Research

UAF: A Unified Audio Front-end LLM for Full-Duplex Speech Interaction

arXiv:2604.19221v1 Announce Type: new Abstract: Full-duplex speech interaction, as the most natural and intuitive mode of human communication, is driving artificial intelligence toward more human-like conversational systems. Traditional cascaded speech processing pipelines suffer from critical limitations, including accumulated latency, information loss, and error propagation across modules. To address these issues, recent efforts focus on the end-to-end audio large language models (LLMs)…

ArXiv AI

AI Agents

Explicit Trait Inference for Multi-Agent Coordination

arXiv:2604.19278v1 Announce Type: new Abstract: LLM-based multi-agent systems (MAS) show promise on complex tasks but remain prone to coordination failures such as goal drift, error cascades, and misaligned behaviors. We propose Explicit Trait Inference (ETI), a psychologically grounded method for improving coordination. ETI enables agents to infer and track partner characteristics along two established psychological dimensions--warmth (e.g., trust) and competence (e.g., skill)--from interaction…

ArXiv AI

Research

Large Language Models Exhibit Normative Conformity

arXiv:2604.19301v1 Announce Type: new Abstract: The conformity bias exhibited by large language models (LLMs) can pose a significant challenge to decision-making in LLM-based multi-agent systems (LLM-MAS). While many prior studies have treated "conformity" simply as a matter of opinion change, this study introduces the social psychological distinction between informational conformity and normative conformity in order to understand LLM conformity at the mechanism level. Specifically, we design…

ArXiv AI

AI Agents

Do Agents Dream of Root Shells? Partial-Credit Evaluation of LLM Agents in Capture The Flag Challenges

arXiv:2604.19354v1 Announce Type: new Abstract: Large Language Model (LLM) agents are increasingly proposed for autonomous cybersecurity tasks, but their capabilities in realistic offensive settings remain poorly understood. We present DeepRed, an open-source benchmark for evaluating LLM-based agents on realistic Capture The Flag (CTF) challenges in isolated virtualized environments. DeepRed places an agent in a Kali attacker environment with terminal tools and optional web search, connected over…

ArXiv AI

Research

CoDA: Towards Effective Cross-domain Knowledge Transfer via CoT-guided Domain Adaptation

arXiv:2604.19488v1 Announce Type: new Abstract: Large language models (LLMs) have achieved substantial advances in logical reasoning, yet they continue to lag behind human-level performance. In-context learning provides a viable solution that boosts the model's performance via prompting its input with expert-curated, in-domain exemplars. However, in many real-world, expertise-scarce domains, such as low-resource scientific disciplines, emerging biomedical subfields, or niche legal jurisdictions…

ArXiv AI

Research

From Experience to Skill: Multi-Agent Generative Engine Optimization via Reusable Strategy Learning

arXiv:2604.19516v1 Announce Type: new Abstract: Generative engines (GEs) are reshaping information access by replacing ranked links with citation-grounded answers, yet current Generative Engine Optimization (GEO) methods optimize each instance in isolation, unable to accumulate or transfer effective strategies across tasks and engines. We reframe GEO as a strategy learning problem and propose MAGEO, a multi-agent framework in which coordinated planning, editing, and fidelity-aware evaluation…

ArXiv AI

AI Agents

Revac: A Social Deduction Reasoning Agent

arXiv:2604.19523v1 Announce Type: new Abstract: Social deduction games such as Mafia present a unique AI challenge: players must reason under uncertainty, interpret incomplete and intentionally misleading information, evaluate human-like communication, and make strategic elimination decisions. Unlike deterministic board games, success in Mafia depends not on perfect information or brute-force search, but on inference, memory, and adaptability in the presence of deception. This work presents the…

ArXiv AI

AI Agents

Integrating Anomaly Detection into Agentic AI for Proactive Risk Management in Human Activity

arXiv:2604.19538v1 Announce Type: new Abstract: Agentic AI, with goal-directed, proactive, and autonomous decision-making capabilities, offers a compelling opportunity to address movement-related risks in human activity, including the persistent hazard of falls among elderly populations. Despite numerous approaches to fall mitigation through fall prediction and detection, existing systems have not yet functioned as universal solutions across care pathways and safety-critical environments.

ArXiv AI

Research

DT2IT-MRM: Debiased Preference Construction and Iterative Training for Multimodal Reward Modeling

arXiv:2604.19544v1 Announce Type: new Abstract: Multimodal reward models (MRMs) play a crucial role in aligning Multimodal Large Language Models (MLLMs) with human preferences. Training a good MRM requires high-quality multimodal preference data. However, existing preference datasets face three key challenges: lack of granularity in preference strength, textual style bias, and unreliable preference signals. Besides, existing open-source multimodal preference datasets suffer from substantial noise…

ArXiv AI

Research

Easy Samples Are All You Need: Self-Evolving LLMs via Data-Efficient Reinforcement Learning

arXiv:2604.18639v1 Announce Type: cross Abstract: Previous LLMs-based RL studies typically follow either supervised learning with high annotation costs, or unsupervised paradigms using voting or entropy-based rewards. However, their performance remains far from satisfactory due to the substantial annotation cost and issues such as model collapse or reward hacking. To address these issues, we introduce a new perspective inspired by cognitive learning theory and propose a novel approach…

ArXiv AI

AI Agents

From Craft to Kernel: A Governance-First Execution Architecture and Semantic ISA for Agentic Computers

arXiv:2604.18652v1 Announce Type: cross Abstract: The transition of agentic AI from brittle prototypes to production systems is stalled by a pervasive crisis of craft. We suggest that the prevailing orchestration paradigm-delegating the system control loop to large language models and merely patching with heuristic guardrails-is the root cause of this fragility. Instead, we propose Arbiter-K, a Governance-First execution architecture that reconceptualizes the underlying model as a Probabilistic…

ArXiv AI

AI Agents

Owner-Harm: A Missing Threat Model for AI Agent Safety

arXiv:2604.18658v1 Announce Type: cross Abstract: Existing AI agent safety benchmarks focus on generic criminal harm (cybercrime, harassment, weapon synthesis), leaving a systematic blind spot for a distinct and commercially consequential threat category: agents harming their own deployers. Real-world incidents illustrate the gap: Slack AI credential exfiltration (Aug 2024), Microsoft 365 Copilot calendar-injection leaks (Jan 2024), and a Meta agent's unauthorized forum post exposing operational…

ArXiv AI

Research

Beyond Explicit Refusals: Soft-Failure Attacks on Retrieval-Augmented Generation

arXiv:2604.18663v1 Announce Type: cross Abstract: Existing jamming attacks on Retrieval-Augmented Generation (RAG) systems typically induce explicit refusals or denial-of-service behaviors, which are conspicuous and easy to detect. In this work, we formalize a subtler availability threat, termed soft failure, which degrades system utility by inducing fluent and coherent yet non-informative responses rather than overt failures. We propose Deceptive Evolutionary Jamming Attack (DEJA), an automated…

ArXiv AI

AI Agents

Towards Optimal Agentic Architectures for Offensive Security Tasks

arXiv:2604.18718v1 Announce Type: cross Abstract: Agentic security systems increasingly audit live targets with tool-using LLMs, but prior systems fix a single coordination topology, leaving unclear when additional agents help and when they only add cost. We treat topology choice as an empirical systems question. We introduce a controlled benchmark of 20 interactive targets (10 web/API and 10 binary), each exposing one endpoint-reachable ground-truth vulnerability, evaluated in whitebox and blackbox…

ArXiv AI

Research

Towards Understanding the Robustness of Sparse Autoencoders

arXiv:2604.18756v1 Announce Type: cross Abstract: Large Language Models (LLMs) remain vulnerable to optimization-based jailbreak attacks that exploit internal gradient structure. While Sparse Autoencoders (SAEs) are widely used for interpretability, their robustness implications remain underexplored. We present a study of integrating pretrained SAEs into transformer residual streams at inference time, without modifying model weights or blocking gradients. Across four model families (Gemma, LLaMA, …)

ArXiv AI
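
Inserting a pretrained SAE at inference amounts to encoding a residual-stream vector into sparse features and substituting the decoded reconstruction. A tiny hand-weighted ReLU autoencoder shows the mechanics; real SAEs are trained and have far more features than dimensions.

```python
def sae_reconstruct(x, W_enc, b_enc, W_dec):
    """ReLU sparse autoencoder applied to one residual-stream vector:
    z = relu(W_enc @ x + b_enc), x_hat = W_dec @ z.
    At inference, x_hat is swapped into the residual stream in place of x."""
    z = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W_enc, b_enc)]
    x_hat = [sum(row[j] * z[j] for j in range(len(z))) for row in W_dec]
    return z, x_hat

# Toy identity-like SAE: each feature copies one coordinate (illustrative weights).
W_enc = [[1.0, 0.0], [0.0, 1.0]]
b_enc = [0.0, 0.0]
W_dec = [[1.0, 0.0], [0.0, 1.0]]
z, x_hat = sae_reconstruct([0.5, -0.3], W_enc, b_enc, W_dec)
```

Note the ReLU already discards the negative component here; in the robustness setting studied, that lossy, sparse bottleneck is what reshapes the gradients attackers optimize against.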

Research

HELM: Harness-Enhanced Long-horizon Memory for Vision-Language-Action Manipulation

arXiv:2604.18791v1 Announce Type: cross Abstract: Vision-Language-Action (VLA) models fail systematically on long-horizon manipulation tasks despite strong short-horizon performance. We show that this failure is not resolved by extending context length alone in the current reactive execution setting; instead, it stems from three recurring execution-loop deficiencies: the memory gap, the verification gap, and the recovery gap. We present HELM, a model-agnostic framework that addresses these deficiencies…

ArXiv AI

Research

Geometric Decoupling: Diagnosing the Structural Instability of Latent…

arXiv:2604.18804v1 Announce Type: cross Abstract: Latent Diffusion Models (LDMs) achieve high-fidelity synthesis but suffer from latent space brittleness, causing discontinuous semantic jumps during editing. We introduce a Riemannian framework to diagnose this instability by analyzing the generative Jacobian, decomposing geometry into Local Scaling (capacity) and Local Complexity (curvature). Our study uncovers a "Geometric Decoupling": while curvature in normal…

ArXiv AI

Research

One Step Forward and K Steps Back: Better Reasoning with Denoising Recursion Models

arXiv:2604.18839v1 Announce Type: cross Abstract: Looped transformers scale computational depth without increasing parameter count by repeatedly applying a shared transformer block and can be used for iterative refinement, where each loop rewrites a full fixed-size prediction in parallel. On difficult problems, such as those that require search-like computation, reaching a highly structured solution starting from noise can require long refinement trajectories. Learning such trajectories is challenging…

ArXiv AI

Research

Hierarchically Robust Zero-shot Vision-language Models

arXiv:2604.18867v1 Announce Type: cross Abstract: Vision-Language Models (VLMs) can perform zero-shot classification but are susceptible to adversarial attacks. While robust fine-tuning improves their robustness, existing approaches align fixed text embeddings with an image embedding, sacrificing natural performance and robustness. A robustness degradation also occurs when a model faces adversarial attacks targeting superclasses (parent classes, e.g., mammal) in addition to their base (leaf) classes…

ArXiv AI

Research

Where Fake Citations Are Made: Tracing Field-Level Hallucination to Specific Neurons in LLMs

arXiv:2604.18880v1 Announce Type: cross Abstract: LLMs frequently generate fictitious yet convincing citations, often expressing high confidence even when the underlying reference is wrong. We study this failure across 9 models and 108,000 generated references, and find that author names fail far more often than other fields across all models and settings. Citation style has no measurable effect, while reasoning-oriented distillation degrades recall. Probes trained on one field transfer at…

ArXiv AI

Research

Harmful Intent as a Geometrically Recoverable Feature of LLM Residual Streams

arXiv:2604.18901v1 Announce Type: cross Abstract: Harmful intent is geometrically recoverable from large language model residual streams: as a linear direction in most layers, and as angular deviation in layers where projection methods fail. Across 12 models spanning four architectural families (Qwen2.5, Qwen3.5, Llama-3.2, Gemma-3) and three alignment variants (base, instruction-tuned, abliterated), under single-turn, English evaluation, we characterise this geometry through six direction-finding…

ArXiv AI
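
The simplest of the direction-finding methods such studies compare is the difference-of-means probe: subtract the mean activation on harmless prompts from the mean on harmful ones, then score new activations by projection. The vectors below are toy stand-ins for residual-stream activations.

```python
def diff_of_means_direction(harmful, harmless):
    """Candidate 'harmful intent' direction: mean(harmful) - mean(harmless)."""
    d = len(harmful[0])
    mh = [sum(v[i] for v in harmful) / len(harmful) for i in range(d)]
    mb = [sum(v[i] for v in harmless) / len(harmless) for i in range(d)]
    return [a - b for a, b in zip(mh, mb)]

def score(x, direction):
    """Projection onto the direction; higher -> more 'harmful-like'."""
    return sum(a * b for a, b in zip(x, direction))

harmful = [[2.0, 0.1], [1.8, -0.1]]    # toy activations, harmful prompts
harmless = [[0.0, 0.1], [0.2, -0.1]]   # toy activations, harmless prompts
direction = diff_of_means_direction(harmful, harmless)
```

The abstract's point is that this linear picture holds in most layers but breaks in some, where only angular (cosine-style) measures still separate the classes.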

Research

Gradient-Based Program Synthesis with Neurally Interpreted Languages

arXiv:2604.18907v1 Announce Type: cross Abstract: A central challenge in program induction has long been the trade-off between symbolic and neural approaches. Symbolic methods offer compositional generalisation and data efficiency, yet their scalability is constrained by formalisms such as domain-specific languages (DSLs), which are labour-intensive to create and may not transfer to new domains. In contrast, neural networks flexibly learn from data but tend to generalise poorly in compositional…

ArXiv AI

Trend

AI Tools Are Helping Mediocre North Korean Hackers Steal Millions

One group of hackers used AI for everything from vibe coding their malware to creating fake company websites—and stole as much as $12 million in three months.

Wired AI

Trend

This Scammer Used an AI-Generated MAGA Girl to Grift ‘Super Dumb’ Men

A med student says he’s made thousands of dollars selling photos and videos of a young conservative woman he created using generative tools. He’s not alone.

Wired AI

Industry

Now Meta will track what employees do on their computers to train its AI agents

Meta employees' activity at work is now being used to train the company's AI agents. As reported by Reuters, Meta is installing a tool it calls Model Capability Initiative (MCI) on US-based employees' computers that runs in work-related apps and websites, recording mouse movements, clicks, keystrokes, and occasional screenshots. The data from this tool will […]

The Verge AI

Industry

Anthropic’s most dangerous AI model just fell into the wrong hands

Anthropic's Mythos AI model, a powerful cybersecurity tool that the company said could be dangerous in the wrong hands, has been accessed by a "small group of unauthorized users," Bloomberg reports. An unnamed member of the group, identified only as "a third-party contractor for Anthropic," told the publication that members of a private online forum […]

The Verge AI

Open Source

Introducing OpenAI Privacy Filter

OpenAI Privacy Filter is an open-weight model for detecting and redacting personally identifiable information (PII) in text with state-of-the-art accuracy.

OpenAI Blog

Research

FASE: A Fairness-Aware Spatiotemporal Event Graph Framework for Predictive Policing

arXiv:2604.18644v1 Announce Type: cross Abstract: Predictive policing systems that allocate patrol resources based solely on predicted crime risk can unintentionally amplify racial disparities through feedback-driven data bias. We present FASE, a Fairness-Aware Spatiotemporal Event Graph framework, which integrates spatiotemporal crime prediction with fairness-constrained patrol allocation and a closed-loop deployment feedback simulator. We model Baltimore as a graph of 25 ZIP Code Tabulation Areas…

ArXiv AI

Research

REVEAL: Multimodal Vision-Language Alignment of Retinal Morphometry and Clinical Risks for Incident AD and Dementia Prediction

arXiv:2604.18757v1 Announce Type: cross Abstract: The retina provides a unique, noninvasive window into Alzheimer's disease (AD) and dementia, capturing early structural changes through morphometric features, while systemic and lifestyle risk factors reflect well-established contributors to disease susceptibility long before clinical symptom onset. However, current retinal analysis frameworks typically model imaging and risk factors separately, limiting their ability to capture joint multimodal…

ArXiv AI

Research

Enhancing Construction Worker Safety in Extreme Heat: A Machine Learning Approach Utilizing Wearable Technology for Predictive Health Analytics

arXiv:2604.19559v1 Announce Type: new Abstract: Construction workers are highly vulnerable to heat stress, yet tools that translate real-time physiological data into actionable safety intelligence remain scarce. This study addresses this gap by developing and evaluating deep learning models, specifically a baseline Long Short-Term Memory (LSTM) network and an attention-based LSTM, to predict heat stress among 19 workers in Saudi Arabia. Using Garmin Vivosmart 5 smartwatches to monitor metrics…

ArXiv AI

Research

AblateCell: A Reproduce-then-Ablate Agent for Virtual Cell Repositories

arXiv:2604.19606v1 Announce Type: new Abstract: Systematic ablations are essential to attribute performance gains in AI Virtual Cells, yet they are rarely performed because biological repositories are under-standardized and tightly coupled to domain-specific data and formats. While recent coding agents can translate ideas into implementations, they typically stop at producing code and lack a verifier that can reproduce strong baselines and rigorously test which components truly matter.

ArXiv AI

Research

A Dual Perspective on Synthetic Trajectory Generators: Utility Framework and Privacy Vulnerabilities

arXiv:2604.19653v1 Announce Type: new Abstract: Human mobility data are used in numerous applications, ranging from public health to urban planning. Human mobility is inherently sensitive, as it can contain information such as religious beliefs and political affiliations. Historically, it has been proposed to modify the information using techniques such as aggregation, obfuscation, or noise addition, to adequately protect privacy and eliminate concerns. As these methods come at a great cost in utility…

ArXiv AI

Research

A-MAR: Agent-based Multimodal Art Retrieval for Fine-Grained Artwork Understanding

arXiv:2604.19689v1 Announce Type: new Abstract: Understanding artworks requires multi-step reasoning over visual content and cultural, historical, and stylistic context. While recent multimodal large language models show promise in artwork explanation, they rely on implicit reasoning and internalized knowledge, limiting interpretability and explicit evidence grounding. We propose A-MAR, an Agent-based Multimodal Art Retrieval framework that explicitly conditions retrieval on structured reasoning…

ArXiv AI

Research

Thermal Anomaly Detection using Physics Aware Neuromorphic Networks: Comparison between Raw and L1C Sentinel-2 Data

arXiv:2604.18606v1 Announce Type: cross Abstract: Damage caused by bushfires and volcanic eruptions escalates rapidly when detection is delayed, making fast and reliable early warning capabilities essential. Recent Earth Observation (EO) approaches have shown that thermal anomaly detection can be performed directly on decompressed Level-0 (L0) sensor data, avoiding computationally expensive preprocessing chains. However, direct exploitation of raw data remains challenging due to domain shift…

ArXiv AI

Research

Neuromorphic Continual Learning for Sequential Deployment of Nuclear Plant Monitoring Systems

arXiv:2604.18611v1 Announce Type: cross Abstract: Anomaly detection in nuclear industrial control systems (ICS) requires continuous, energy-efficient monitoring across multiple subsystems that are often deployed at different stages of plant commissioning. When a conventional neural network is sequentially trained to monitor new subsystems, it catastrophically forgets previously learned anomaly patterns, a safety-critical failure mode. We present the first spiking neural network (SNN)-based anomaly…

ArXiv AI

Research

LLM-as-Judge Framework for Evaluating Tone-Induced Hallucination in Vision-Language Models

arXiv:2604.18803v1 Announce Type: cross Abstract: Vision-Language Models (VLMs) are increasingly deployed in settings where reliable visual grounding carries operational consequences, yet their behavior under progressively coercive prompt phrasing remains undercharacterized. Existing hallucination benchmarks predominantly rely on neutral prompts and binary detection, leaving open how both the incidence and the intensity of fabrication respond to graded linguistic pressure across structurally…

ArXiv AI

Research

Semantic Needles in Document Haystacks: Sensitivity Testing of LLM-as-a-Judge Similarity Scoring

arXiv:2604.18835v1 Announce Type: cross Abstract: We propose a scalable, multifactorial experimental framework that systematically probes LLM sensitivity to subtle semantic changes in pairwise document comparison. We analogize this as a needle-in-a-haystack problem: a single semantically altered sentence (the needle) is embedded within surrounding context (the hay), and we vary the perturbation type (negation, conjunction swap, named entity replacement), context type (original vs. topically unrelated)…

ArXiv AI
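The needle-in-a-haystack setup above lends itself to a tiny sketch: a single perturbed sentence (the needle) is embedded at a fixed position in filler context (the hay), and the original and perturbed documents are handed to a judge for comparison. The perturbation rule and every name below are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of the needle-in-a-haystack construction described
# above. The toy negation rule and function names are our assumptions.

def negate(sentence: str) -> str:
    """Toy negation perturbation: first ' is ' becomes ' is not '."""
    return sentence.replace(" is ", " is not ", 1)

def build_pair(needle: str, hay: list[str], position: int) -> tuple[str, str]:
    """Embed the original and the perturbed needle at the same position."""
    original = hay[:position] + [needle] + hay[position:]
    perturbed = hay[:position] + [negate(needle)] + hay[position:]
    return " ".join(original), " ".join(perturbed)

hay = [f"Filler sentence number {i}." for i in range(5)]
doc_a, doc_b = build_pair("The drug is effective.", hay, position=2)
```

Varying the perturbation type (negation, conjunction swap, entity replacement), the context type, and the needle position then yields the multifactorial grid the abstract describes.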

AI Tools

Human-Machine Co-Boosted Bug Report Identification with Mutualistic Neural Active Learning

arXiv:2604.18862v1 Announce Type: cross Abstract: Bug reports, encompassing a wide range of bug types, are crucial for maintaining software quality. However, the increasing complexity and volume of bug reports pose a significant challenge for manual identification and assignment to the appropriate teams for resolution, as dealing with all the reports is time-consuming and resource-intensive. In this paper, we introduce a cross-project framework, dubbed Mutualistic Neural Active Learning (MNA

ArXiv AI

Research

Gated Memory Policy

arXiv:2604.18933v1 Announce Type: cross Abstract: Robotic manipulation tasks exhibit varying memory requirements, ranging from Markovian tasks that require no memory to non-Markovian tasks that depend on historical information spanning single or multiple interaction trials. Surprisingly, simply extending observation histories of a visuomotor policy often leads to a significant performance drop due to distribution shift and overfitting. To address these issues, we propose Gated Memory Policy (GMP

ArXiv AI

Research

Self-Improving Tabular Language Models via Iterative Group Alignment

arXiv:2604.18966v1 Announce Type: cross Abstract: While language models have been adapted for tabular data generation, two fundamental limitations remain: (1) static fine-tuning produces models that cannot learn from their own generated samples and adapt to self-correct, and (2) autoregressive objectives preserve local token coherence but neglect global statistical properties, degrading tabular quality. Reinforcement learning offers a potential solution but requires designing reward functions th

ArXiv AI

Research

Low-Rank Adaptation for Critic Learning in Off-Policy Reinforcement Learning

arXiv:2604.18978v1 Announce Type: cross Abstract: Scaling critic capacity is a promising direction for enhancing off-policy reinforcement learning (RL). However, larger critics are prone to overfitting and unstable in replay-buffer-based bootstrap training. This paper leverages Low-Rank Adaptation (LoRA) as a structural-sparsity regularizer for off-policy critics. Our approach freezes randomly initialized base matrices and solely optimizes low-rank adapters, thereby constraining critic updates t

ArXiv AI

Research

Decompose, Structure, and Repair: A Neuro-Symbolic Framework for Autoformalization via Operator Trees

arXiv:2604.19000v1 Announce Type: cross Abstract: Statement autoformalization acts as a critical bridge between human mathematics and formal mathematics by translating natural language problems into formal language. While prior works have focused on data synthesis and diverse training paradigms to optimize end-to-end Large Language Models (LLMs), they typically treat formal code as flat sequences, neglecting the hierarchical logic inherent in mathematical statements. In this work, we introduce D

ArXiv AI

Research

Intentional Updates for Streaming Reinforcement Learning

arXiv:2604.19033v1 Announce Type: cross Abstract: In gradient-based learning, a step size chosen in parameter units does not produce a predictable per-step change in function output. This often leads to instability in the streaming setting (i.e., batch size=1), where stochasticity is not averaged out and update magnitudes can momentarily become arbitrarily big or small. Instead, we propose intentional updates: first specify the intended outcome of an update and then solve for the step size that

ArXiv AI
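Taking the abstract at face value, a first-order version of "specify the intended outcome, then solve for the step size" can be sketched. Assuming the intended outcome is a target decrease delta in the function value, the expansion of a gradient step gives delta_f ≈ -eta * ||g||^2, so eta = delta / ||g||^2. This is our reconstruction under that assumption, not the paper's exact update rule.

```python
import numpy as np

def intentional_step(grad: np.ndarray, target_decrease: float,
                     eps: float = 1e-8) -> float:
    """Solve for the step size that yields the intended first-order
    change in output: delta_f ~= -eta * ||g||^2  =>  eta = delta/||g||^2.
    (Illustrative reconstruction, not the paper's method.)"""
    return target_decrease / (float(grad @ grad) + eps)

# Quadratic f(w) = 0.5 * ||w||^2, so the gradient is g = w.
w = np.array([3.0, 4.0])                      # ||g||^2 = 25, f(w) = 12.5
eta = intentional_step(w, target_decrease=0.5)
w_new = w - eta * w
```

Because eta scales inversely with the squared gradient norm, per-step changes stay near the intended magnitude even when streaming gradients momentarily become very large or very small.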

Research

Product-of-Experts Training Reduces Dataset Artifacts in Natural Language Inference

arXiv:2604.19069v1 Announce Type: cross Abstract: Neural NLI models overfit dataset artifacts instead of truly reasoning. A hypothesis-only model achieves 57.7% on SNLI, showing strong spurious correlations, and 38.6% of the baseline errors are the result of these artifacts. We propose Product-of-Experts (PoE) training, which downweights examples where biased models are overconfident. PoE nearly preserves accuracy (89.10% vs. 89.30%) while cutting bias reliance by 4.71% (bias agreement 49.85% to 45%

ArXiv AI
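The PoE objective named above, in its standard debiasing form, sums the log-probabilities of the main model and a frozen biased model; examples the bias model already answers confidently contribute little loss, and hence little gradient, to the main model. Below is a minimal numpy sketch of that standard formulation (the paper's exact variant may differ).

```python
import numpy as np

def log_softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max()                       # stabilize before exponentiating
    return z - np.log(np.exp(z).sum())

def poe_loss(main_logits: np.ndarray, bias_logits: np.ndarray,
             label: int) -> float:
    """Product-of-experts cross-entropy: combined log-probs are the sum
    of per-model logits. When the frozen bias model is already confident
    in the gold label, the combined loss (and gradient) is near zero."""
    combined = log_softmax(main_logits + bias_logits)
    return float(-combined[label])

main = np.array([1.0, 0.5, -0.2])
bias_confident = np.array([8.0, -4.0, -4.0])   # bias model sure of class 0
bias_uncertain = np.array([0.0, 0.0, 0.0])     # bias model has no opinion

loss_down = poe_loss(main, bias_confident, label=0)  # downweighted example
loss_norm = poe_loss(main, bias_uncertain, label=0)  # full-weight example
```

In training, gradients would flow only through `main_logits`, steering the main model toward examples the bias model cannot solve on its own.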

Research

Multi-modal Test-time Adaptation via Adaptive Probabilistic Gaussian Calibration

arXiv:2604.19093v1 Announce Type: cross Abstract: Multi-modal test-time adaptation (TTA) enhances the resilience of benchmark multi-modal models against distribution shifts by leveraging the unlabeled target data during inference. Despite the documented success, the advancement of multi-modal TTA methodologies has been impeded by a persistent limitation, i.e., the lack of explicit modeling of category-conditional distributions, which is crucial for yielding accurate predictions and reliable deci

ArXiv AI

Research

Multi-Gait Learning for Humanoid Robots Using Reinforcement Learning with Selective Adversarial Motion Prior

arXiv:2604.19102v1 Announce Type: cross Abstract: Learning diverse locomotion skills for humanoid robots in a unified reinforcement learning framework remains challenging due to the conflicting requirements of stability and dynamic expressiveness across different gaits. We present a multi-gait learning approach that enables a humanoid robot to master five distinct gaits -- walking, goose-stepping, running, stair climbing, and jumping -- using a consistent policy structure, action space, and rewa

ArXiv AI

Research

Reinforcement Learning Enabled Adaptive Multi-Task Control for Bipedal Soccer Robots

arXiv:2604.19104v1 Announce Type: cross Abstract: Developing bipedal football robots in dynamic combat environments presents challenges related to motion stability and deep coupling of multiple tasks, as well as control switching issues between different states such as upright walking and fall recovery. To address these problems, this paper proposes a modular reinforcement learning (RL) framework for achieving adaptive multi-task control. Firstly, this framework combines an open-loop feedforward osci

ArXiv AI

Research

How Do Answer Tokens Read Reasoning Traces? Self-Reading Patterns in Thinking LLMs for Quantitative Reasoning

arXiv:2604.19149v1 Announce Type: cross Abstract: Thinking LLMs produce reasoning traces before answering. Prior activation steering work mainly targets on shaping these traces. It remains less understood how answer tokens actually read and integrate the reasoning to produce reliable outcomes. Focusing on quantitative reasoning, we analyze the answer-to-reasoning attention and observe a benign self-reading pattern aligned with correctness, characterized by a forward drift of the reading focus al

ArXiv AI

Research

SCURank: Ranking Multiple Candidate Summaries with Summary Content Units for Enhanced Summarization

arXiv:2604.19185v1 Announce Type: cross Abstract: Small language models (SLMs), such as BART, can achieve summarization performance comparable to large language models (LLMs) via distillation. However, existing LLM-based ranking strategies for summary candidates suffer from instability, while classical metrics (e.g., ROUGE) are insufficient to rank high-quality summaries. To address these issues, we introduce \textbf{SCURank}, a framework that enhances summarization by leveraging \textbf{Summary

ArXiv AI

Research

Attention-based Multi-modal Deep Learning Model of Spatio-temporal Crop Yield Prediction with Satellite, Soil and Climate Data

arXiv:2604.19217v1 Announce Type: cross Abstract: Crop yield prediction is one of the most important challenges, crucial to world food security and policy-making decisions. Conventional forecasting techniques are limited in their accuracy because they utilize static data sources that do not reflect the dynamic and intricate relationships that exist between environmental variables over time [5,13]. This paper presents Attention-Based Multi-Modal Deep L

ArXiv AI

Research

Talking to a Know-It-All GPT or a Second-Guesser Claude? How Repair reveals unreliable Multi-Turn Behavior in LLMs

arXiv:2604.19245v1 Announce Type: cross Abstract: Repair, an important resource for resolving trouble in human-human conversation, remains underexplored in human-LLM interaction. In this study, we investigate how LLMs engage in the interactive process of repair in multi-turn dialogues around solvable and unsolvable math questions. We examine whether models initiate repair themselves and how they respond to user-initiated repair. Our results show strong differences across models: reactions range

ArXiv AI

Trend

Meta will record employees’ keystrokes and use it to train its AI models

Meta says that it has a new internal tool that is converting mouse movements and button clicks into data that can train its AI models.

TechCrunch AI

Policy

AI failure could trigger the next financial crisis, warns Elizabeth Warren

"I know a bubble when I see one." That's what Sen. Elizabeth Warren (D-MA), who led the push to create a new consumer financial regulator in the wake of the 2008 recession, told a crowd at a Vanderbilt Policy Accelerator event in Washington, DC on Wednesday. Warren warned of what she called "striking" parallels to […]

The Verge AI

AI Tools

The Pope’s Warnings About AI Were AI-Generated, a Detection Tool Claims

Pangram Labs’ updated Chrome extension puts warning labels on AI slop as you scroll your social feeds.

Wired AI

Open Source

Gemma 4 VLA Demo on Jetson Orin Nano Super

Hugging Face Blog

Research

QIMMA قِمّة ⛰: A Quality-First Arabic LLM Leaderboard

Hugging Face Blog

Trend

Tesla just increased its capex to $25B. Here’s where the money is going.

Tesla's planned capex for 2026 is three times higher than what the company has historically spent. Its CFO said, as a result, Tesla will have a negative free cash flow the rest of the year.

TechCrunch AI

Funding

AI is spitting out more potential drugs than ever. This startup wants to figure out which ones matter.

10x Science has raised a $4.8 million seed round to help pharmaceutical researchers understand complex molecules.

TechCrunch AI

Trend

The most interesting startups showcased at Google Cloud Next 2026

Google wants AI startups on its cloud and has showcased a long list of them at its annual conference.

TechCrunch AI

Industry

Anthropic’s Mythos rollout has missed America’s cybersecurity agency

Several US federal agencies are taking up Anthropic's new cybersecurity model to find vulnerabilities, but one is reportedly not getting in on the action: the nation's central cybersecurity coordinator. On Tuesday, Axios reported that the Cybersecurity and Infrastructure Security Agency (CISA) didn't have access to Mythos Preview, which Anthropic has touted as a powerful tool […]

The Verge AI

AI Tools

Speeding up agentic workflows with WebSockets in the Responses API

A deep dive into the Codex agent loop, showing how WebSockets and connection-scoped caching reduced API overhead and improved model latency.

OpenAI Blog

AI Tools

Microsoft issues emergency update for macOS and Linux ASP.NET threat

When authentication fails, things can go very, very wrong.

Ars Technica AI

Research

Formally Verified Patent Analysis via Dependent Type Theory: Machine-Checkable Certificates from a Hybrid AI + Lean 4 Pipeline

arXiv:2604.18882v1 Announce Type: new Abstract: We present a formally verified framework for patent analysis as a hybrid AI + Lean 4 pipeline. The DAG-coverage core (Algorithm 1b) is fully machine-verified once bounded match scores are fixed. Freedom-to-operate, claim-construction sensitivity, cross-claim consistency, and doctrine-of-equivalents analyses are formalized at the specification level with kernel-checked candidate certificates. Existing patent-analysis approaches rely on manual expert

ArXiv AI

Research

On Accelerating Grounded Code Development for Research

arXiv:2604.19022v1 Announce Type: new Abstract: A major challenge for niche scientific and technical domains in leveraging coding agents is the lack of access to up-to-date, domain- specific knowledge. Foundational models often demonstrate limited reasoning capabilities in specialized fields and cannot inherently incorporate knowledge that evolves through ongoing research and experimentation. Materials scientists exploring novel compounds, communication engineers designing and evaluating new pro

ArXiv AI

Research

Learning Lifted Action Models from Unsupervised Visual Traces

arXiv:2604.19043v1 Announce Type: new Abstract: Efficient construction of models capturing the preconditions and effects of actions is essential for applying AI planning in real-world domains. Extensive prior work has explored learning such models from high-level descriptions of state and/or action sequences. In this paper, we tackle a more challenging setting: learning lifted action models from sequences of state images, without action observation. We propose a deep learning framework that join

ArXiv AI

Research

Reasoning-Aware AIGC Detection via Alignment and Reinforcement

arXiv:2604.19172v1 Announce Type: new Abstract: The rapid advancement and widespread adoption of Large Language Models (LLMs) have elevated the need for reliable AI-generated content (AIGC) detection, which remains challenging as models evolve. We introduce AIGC-text-bank, a comprehensive multi-domain dataset with diverse LLM sources and authorship scenarios, and propose REVEAL, a detection framework that generates interpretable reasoning chains before classification. Our approach uses a two-sta

ArXiv AI

Research

Industrial Surface Defect Detection via Diffusion Generation and Asymmetric Student-Teacher Network

arXiv:2604.19240v1 Announce Type: new Abstract: Industrial surface defect detection often suffers from limited defect samples, severe long-tailed distributions, and difficulties in accurately localizing subtle defects under complex backgrounds. To address these challenges, this paper proposes an unsupervised defect detection method that integrates a Denoising Diffusion Probabilistic Model (DDPM) with an asymmetric teacher-student architecture. First, at the data level, the DDPM is trained solely

ArXiv AI

Trend

Towards Energy Impact on AI-Powered 6G IoT Networks: Centralized vs. Decentralized

arXiv:2604.19377v1 Announce Type: new Abstract: The emergence of sixth-generation (6G) technologies has introduced new challenges and opportunities for machine learning (ML) applications in Internet of Things (IoT) networks, particularly concerning energy efficiency. As model training and data transmission contribute significantly to energy consumption, optimizing these processes has become critical for sustainable system design. This study first conducts an analysis of the energy consumption model

ArXiv AI

Research

Do LLMs Game Formalization? Evaluating Faithfulness in Logical Reasoning

arXiv:2604.19459v1 Announce Type: new Abstract: Formal verification guarantees proof validity but not formalization faithfulness. For natural-language logical reasoning, where models construct axiom systems from scratch without library constraints, this gap between valid proofs and faithful translations is especially acute. We investigate whether frontier models exploit this gap when generating Lean 4 proofs, a behavior we term formalization gaming. We evaluate GPT-5 and DeepSeek-R1 on 303 first

ArXiv AI

Research

Evaluating Answer Leakage Robustness of LLM Tutors against Adversarial Student Attacks

arXiv:2604.18660v1 Announce Type: cross Abstract: Large Language Models (LLMs) are increasingly used in education, yet their default helpfulness often conflicts with pedagogical principles. Prior work evaluates pedagogical quality via answer leakage-the disclosure of complete solutions instead of scaffolding-but typically assumes well-intentioned learners, leaving tutor robustness under student misuse largely unexplored. In this paper, we study scenarios where students behave adversarially and a

ArXiv AI

Research

A Proxy Consistency Loss for Grounded Fusion of Earth Observation and Location Encoders

arXiv:2604.18881v1 Announce Type: cross Abstract: Supervised learning with Earth observation inputs is often limited by the sparsity of high-quality labeled or in-situ measured data to use as training labels. With the abundance of geographic data products, in many cases there are variables correlated with - but different from - the variable of interest that can be leveraged. We integrate such proxy variables within a geographic prior via a trainable location encoder and introduce a proxy consist

ArXiv AI

Research

MORPHOGEN: A Multilingual Benchmark for Evaluating Gender-Aware Morphological Generation

arXiv:2604.18914v1 Announce Type: cross Abstract: While multilingual large language models (LLMs) perform well on high-level tasks like translation and question answering, their ability to handle grammatical gender and morphological agreement remains underexplored. In morphologically rich languages, gender influences verb conjugation, pronouns, and even first-person constructions with explicit and implicit mentions of gender. We introduce MORPHOGEN, a morphologically grounded large-scale benchma

ArXiv AI

Research

NeuroAI and Beyond: Bridging Between Advances in Neuroscience and Artificial Intelligence

arXiv:2604.18637v1 Announce Type: cross Abstract: Neuroscience and Artificial Intelligence (AI) have made impressive progress in recent years but remain only loosely interconnected. Based on a workshop convened by the National Science Foundation in August 2025, we identify three fundamental capability gaps in current AI: the inability to interact with the physical world, inadequate learning that produces brittle systems, and unsustainable energy and data inefficiency. We describe the neuroscienc

ArXiv AI

Research

Curiosity-Critic: Cumulative Prediction Error Improvement as a Tractable Intrinsic Reward for World Model Training

arXiv:2604.18701v1 Announce Type: cross Abstract: Local prediction-error-based curiosity rewards focus on the current transition without considering the world model's cumulative prediction error across all visited transitions. We introduce Curiosity-Critic, which grounds its intrinsic reward in the improvement of this cumulative objective, and show that it reduces to a tractable per-step form: the difference between the current prediction error and the asymptotic error baseline of the current st

ArXiv AI

Research

Characterizing AlphaEarth Embedding Geometry for Agentic Environmental Reasoning

arXiv:2604.18715v1 Announce Type: cross Abstract: Earth observation foundation models encode land surface information into dense embedding vectors, yet the geometric structure of these representations and its implications for downstream reasoning remain underexplored. We characterize the manifold geometry of Google AlphaEarth's 64-dimensional embeddings across 12.1 million Continental United States samples (2017--2023) and develop an agentic system that leverages this geometric understanding for

ArXiv AI

Research

Skillful Global Ocean Emulation and the Role of Correlation-Aware Loss

arXiv:2604.18727v1 Announce Type: cross Abstract: Machine learning emulators have shown extraordinary skill in forecasting atmospheric states, and their application to global ocean dynamics offers similar promise. Here, we adapt the GraphCast architecture into a dedicated ocean-only emulator, driven by prescribed atmospheric conditions, for medium-range predictions. The emulator is trained on NOAA's UFS-Replay dataset. Using a 24 hour time step, single initial condition, and without using autore

ArXiv AI

Research

The Cost of Relaxation: Evaluating the Error in Convex Neural Network Verification

arXiv:2604.18728v1 Announce Type: cross Abstract: Many neural network (NN) verification systems represent the network's input-output relation as a constraint program. Sound and complete representations involve integer constraints for simulating the activations. Recent works convexly relax the integer constraints, improving performance at the cost of soundness. Convex relaxations consider outputs that are unreachable by the original network. We study the worst case divergence between the origi

ArXiv AI
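The divergence being studied can be made concrete with the standard "triangle" relaxation of ReLU, which on an interval [l, u] with l < 0 < u admits outputs the exact activation can never produce. The sketch below uses that common relaxation as an example; the paper may analyze others.

```python
def relu_triangle_upper(x: float, l: float, u: float) -> float:
    """Upper envelope of the standard triangle convex relaxation of
    ReLU on [l, u] with l < 0 < u: the chord y = u * (x - l) / (u - l)."""
    return u * (x - l) / (u - l)

def relaxation_gap(x: float, l: float, u: float) -> float:
    """Height of the region admitted by the relaxation but unreachable
    by the exact activation max(0, x)."""
    return relu_triangle_upper(x, l, u) - max(0.0, x)

# The gap is largest at x = 0, where the relaxation admits outputs up
# to -u * l / (u - l) above the true value of 0.
g = relaxation_gap(0.0, l=-1.0, u=1.0)
```

Stacking such relaxed activations compounds these gaps layer by layer, which is exactly the worst-case divergence the abstract sets out to quantify.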

Research

Beyond Coefficients: Forecast-Necessity Testing for Interpretable Causal Discovery in Nonlinear Time-Series Models

arXiv:2604.18751v1 Announce Type: cross Abstract: Nonlinear machine-learning models are increasingly used to discover causal relationships in time-series data, yet the interpretation of their outputs remains poorly understood. In particular, causal scores produced by regularized neural autoregressive models are often treated as analogues of regression coefficients, leading to misleading claims of statistical significance. In this paper, we argue that causal relevance in nonlinear time-series mod

ArXiv AI

Research

Handling and Interpreting Missing Modalities in Patient Clinical Trajectories via Autoregressive Sequence Modeling

arXiv:2604.18753v1 Announce Type: cross Abstract: An active challenge in developing multimodal machine learning (ML) models for healthcare is handling missing modalities during training and deployment. As clinical datasets are inherently temporal and sparse in terms of modality presence, capturing the underlying predictive signal via diagnostic multimodal ML models while retaining model explainability remains an ongoing challenge. In this work, we address this by re-framing clinical diagnosis as

ArXiv AI

Research

Multi-Level Temporal Graph Networks with Local-Global Fusion for Industrial Fault Diagnosis

arXiv:2604.18765v1 Announce Type: cross Abstract: Fault detection and diagnosis are critical for the optimal and safe operation of industrial processes. The correlations among sensors often display non-Euclidean structures where graph neural networks (GNNs) are widely used therein. However, for large-scale systems, local, global, and dynamic relations extensively exist among sensors, and traditional GNNs often overlook such complex and multi-level structures for various problems including the fa

ArXiv AI

Research

Who Shapes Brazil's Vaccine Debate? Semi-Supervised Modeling of Stance and Polarization in YouTube's Media Ecosystem

arXiv:2604.18586v1 Announce Type: cross Abstract: Vaccination remains a cornerstone of global public health, yet the COVID-19 pandemic exposed how online misinformation, political polarization, and declining institutional trust can undermine immunization efforts. Most of the prior computational studies that analyzed vaccine discourse on social platforms focus on English-language data, specific vaccines, or short time windows, impairing our understanding of long-term dynamics in high-impact, non-

ArXiv AI

Research

Fine-Tuning Small Reasoning Models for Quantum Field Theory

arXiv:2604.18936v1 Announce Type: cross Abstract: Despite the growing application of Large Language Models (LLMs) to theoretical physics, there is little academic exploration into how domain-specific physics reasoning ability develops while training these models. To investigate this, we perform the first academic fine-tuning study of small (7B-parameter) reasoning models dedicated specifically to theoretical physics. Because open-source verifiable training data required to train such capabilitie

ArXiv AI

Research

S2MAM: Semi-supervised Meta Additive Model for Robust Estimation and Variable Selection

arXiv:2604.19072v1 Announce Type: cross Abstract: Semi-supervised learning with manifold regularization is a classical framework for jointly learning from both labeled and unlabeled data, where the key requirement is that the support of the unknown marginal distribution has the geometric structure of a Riemannian manifold. Typically, the Laplace-Beltrami operator-based manifold regularization can be approximated empirically by the Laplacian regularization associated with the entire training data

ArXiv AI

Research

SAHM: A Benchmark for Arabic Financial and Shari'ah-Compliant Reasoning

arXiv:2604.19098v1 Announce Type: cross Abstract: English financial NLP has progressed rapidly through benchmarks for sentiment, document understanding, and financial question answering, while Arabic financial NLP remains comparatively under-explored despite strong practical demand for trustworthy finance and Islamic-finance assistants. We introduce SAHM, a document-grounded benchmark and instruction-tuning dataset for Arabic financial NLP and Shari'ah-compliant reasoning. SAHM contains 14,380 e

ArXiv AI

Research

Think Before Writing: Feature-Level Multi-Objective Optimization for Generative Citation Visibility

arXiv:2604.19113v1 Announce Type: cross Abstract: Generative answer engines expose content through selective citation rather than ranked retrieval, fundamentally altering how visibility is determined. This shift calls for new optimization methods beyond traditional search engine optimization. Existing generative engine optimization (GEO) approaches primarily rely on token-level text rewriting, offering limited interpretability and weak control over the trade-off between citation visibility and c

ArXiv AI

Research

Streamliners for Answer Set Programming

arXiv:2604.19251v1 Announce Type: cross Abstract: Streamliner constraints reduce the search space of combinatorial problems by ruling out portions of the solution space. We adapt the StreamLLM approach, which uses Large Language Models (LLMs) to generate streamliners for Constraint Programming, to Answer Set Programming (ASP). Given an ASP encoding and a few small training instances, we prompt multiple LLMs to propose candidate constraints. Candidates that cause syntax errors, render satisfiable

ArXiv AI

Trend

Unauthorized group has gained access to Anthropic’s exclusive cyber tool Mythos, report claims

Anthropic told TechCrunch it is investigating the claims, but maintains that there is no evidence that its systems have been impacted.

TechCrunch AI

Trend

AI backlash is coming for elections

Ask Americans how they feel about AI and most say they have concerns. Communities have mounted resistance to data center projects, stalling them across the US. On social media, anger at AI companies and executives is unrestrained - sometimes to the point of condoning violence. But look at the issues that most campaigns are focused […]

The Verge AI

Research

Contrary to popular superstition, AES 128 is just fine in a post-quantum world

A stubborn misconception is hampering the already hard work of quantum readiness.

Ars Technica AI

Research

Quantum inspired qubit qutrit neural networks for real time financial forecasting

arXiv:2604.18838v1 Announce Type: new Abstract: This research investigates the performance and efficacy of machine learning models in stock prediction, comparing Artificial Neural Networks (ANNs), Quantum Qubit-based Neural Networks (QQBNs), and Quantum Qutrit-based Neural Networks (QQTNs). By outlining methodologies, architectures, and training procedures, the study highlights significant differences in training times and performance metrics across models. While all models demonstrate robust ac

ArXiv AI

Research

Error-free Training for MedMNIST Datasets

arXiv:2604.18916v1 Announce Type: new Abstract: In this paper, we introduce a new concept called Artificial Special Intelligence by which Machine Learning models for the classification problem can be trained error-free, thus acquiring the capability of not making repeated mistakes. The method is applied to 18 MedMNIST biomedical datasets. Except for three datasets, which suffer from the double-labeling problem, all are trained to perfection.

ArXiv AI

Research

Plausible Reasoning and First-Order Plausible Logic

arXiv:2604.19036v1 Announce Type: new Abstract: Defeasible statements are statements that are likely, or probable, or usually true, but may occasionally be false. Plausible reasoning makes conclusions from statements that are either facts or defeasible statements without using numbers. So there are no probabilities or suchlike involved. Seventeen principles of logics that do plausible reasoning are suggested and several important plausible reasoning examples are considered. There are 14 necessar

ArXiv AI

Research

Has Automated Essay Scoring Reached Sufficient Accuracy? Deriving Achievable QWK Ceilings from Classical Test Theory

arXiv:2604.19131v1 Announce Type: new Abstract: Automated essay scoring (AES) is commonly evaluated on public benchmarks using quadratic weighted kappa (QWK). However, because benchmark labels are assigned by human raters and inevitably contain scoring errors, it remains unclear both what QWK is theoretically attainable and what level is practically sufficient for deployment. We therefore derive two dataset-specific QWK ceilings based on the reliability concept in classical test theory, which ca

ArXiv AI
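For reference, QWK, the metric whose attainable ceiling the paper derives, has a standard definition that is short to state in code. The sketch below implements that standard definition only; the reliability-based ceiling derivation itself is not reproduced here.

```python
import numpy as np

def qwk(a, b, n_classes: int) -> float:
    """Quadratic weighted kappa between two integer score vectors:
    1 - (weighted observed disagreement) / (weighted expected disagreement),
    with weights ((i - j) / (n_classes - 1)) ** 2."""
    a, b = np.asarray(a), np.asarray(b)
    observed = np.zeros((n_classes, n_classes))
    for i, j in zip(a, b):
        observed[i, j] += 1
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / len(a)
    w = np.array([[(i - j) ** 2 for j in range(n_classes)]
                  for i in range(n_classes)]) / (n_classes - 1) ** 2
    return 1.0 - (w * observed).sum() / (w * expected).sum()
```

Perfect agreement yields 1.0 and perfectly inverted scores yield -1.0; because human labels carry rater noise, the paper's point is that the best reachable value on a benchmark sits strictly below 1.0.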

Research

DanceCrafter: Fine-Grained Text-Driven Controllable Dance Generation via Choreographic Syntax

arXiv:2604.18648v1 Announce Type: cross Abstract: Text-driven controllable dance generation remains under-explored, primarily due to the severe scarcity of high-quality datasets and the inherent difficulty of articulating complex choreographies. Characterizing dance is particularly challenging owing to its intricate spatial dynamics, strong directionality, and the highly decoupled movements of distinct body parts. To overcome these bottlenecks, we bridge principles from dance studies, human anat

ArXiv AI

Research

On Solving the Multiple Variable Gapped Longest Common Subsequence Problem

arXiv:2604.18645v1 Announce Type: new Abstract: This paper addresses the Variable Gapped Longest Common Subsequence (VGLCS) problem, a generalization of the classical LCS problem involving flexible gap constraints between consecutive characters of a solution. The problem arises in molecular sequence comparison, where structural distance constraints between residues must be respected, and in time-series analysis where events are required to occur within specified temporal delays. We propose a search

ArXiv AI

Research

Relational AI in Education: Reciprocity, Participatory Design, and Indigenous Worldviews

arXiv:2604.19099v1 Announce Type: cross Abstract: Education is not merely the transmission of information or the optimisation of individual performance; it is a fundamentally social, constructive, and relational practice. However, recent advances in generative artificial intelligence (GenAI) increasingly emphasise efficiency, automation, and individualised assistance, risking the weakening of relational learning processes. Despite growing adoption, AI in education (AIED) research has yet to full

ArXiv AI
