AI Pulse

AI News: Apr 24, 2026

Today's 5 most important AI stories — product launches, research, funding, and more.

1. AI Agents
■■■□□ 3/5

OpenAI Launches Workspace Agents for Team Workflow Automation

OpenAI published a guide on building and scaling workspace agents within ChatGPT, enabling teams to automate repeatable workflows and connect tools. The Academy resource targets professionals looking to streamline operations using AI agents integrated directly into existing team environments.

Why it matters

Workspace agents signal ChatGPT's shift from individual productivity tool to enterprise workflow platform, directly competing with Microsoft Copilot and Google Workspace AI.

2. Policy
■■■□□ 3/5

Post-Training Fixes Won't Erase AI Copyright Liability, Paper Argues

A new arXiv paper argues that machine unlearning and inference-time guardrails cannot retroactively fix copyright infringement during AI training. Authors contend legal liability stems from how data was acquired, not what models output. The position challenges a common industry defense strategy as generative AI faces mounting legal scrutiny.

Why it matters

If courts adopt this reasoning, AI companies cannot rely on post-deployment technical fixes to escape liability for training data violations, forcing compliance decisions upstream, before training begins.

3. Product Launch
■■■□□ 3/5

Google Launches Two Specialized 8th-Gen TPUs for AI Agents

Google unveiled two 8th-generation TPUs at Cloud Next: the TPU v8t for training large AI models and TPU v8i for inference workloads. The chips are purpose-built for agentic AI applications. Both will be available via Google Cloud, giving enterprises dedicated hardware to run and deploy next-generation AI agents at scale.

Why it matters

Purpose-built inference and training chips signal Google is hardening its cloud infrastructure to compete directly with Nvidia and AWS for enterprise AI workloads.

4. Product Launch
■■□□□ 2/5

OpenAI Launches GPT-5.5, Advancing Its Super App Ambitions

OpenAI released GPT-5.5 on April 23, 2026, touting improved capabilities across multiple categories. The model is part of the company's broader strategy to consolidate AI tools into a single platform, moving toward what OpenAI envisions as an AI super app. Specific benchmark details were not disclosed in the announcement.

Why it matters

A unified AI super app from OpenAI could reshape how professionals access and pay for AI tools, threatening specialized competitors.

5. AI Agents
■■■□□ 3/5

OpenAI WebSockets Cut Latency in Codex Agent Loops

OpenAI detailed how its Codex agent uses WebSockets and connection-scoped caching in the Responses API to reduce overhead and improve model latency. The persistent connections eliminate repeated handshake costs across multi-step agent loops, making agentic workflows faster and more efficient for developers building on the platform.

Why it matters

Developers building multi-step AI agents can now achieve meaningfully lower latency and infrastructure costs using WebSocket connections in OpenAI's Responses API.
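The mechanism behind the latency win is ordinary connection reuse. A minimal sketch below (plain TCP sockets standing in for WebSockets, with a local echo server; this is illustrative only, not OpenAI's actual Responses API) shows how a single persistent connection amortizes handshake cost across a five-step agent loop, versus paying for a new connection on every step:

```python
# Illustrative sketch: persistent connections amortize handshake cost.
# All names and numbers here are hypothetical, not OpenAI's API.
import socket
import threading

def echo_server(sock):
    """Accept connections in turn and echo each line back."""
    while True:
        try:
            conn, _ = sock.accept()
        except OSError:
            return  # listening socket closed; shut down
        with conn:
            f = conn.makefile("rwb")
            for line in f:  # ends at client disconnect (EOF)
                f.write(line)
                f.flush()

# Start a local echo server on an ephemeral port.
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen()
port = server.getsockname()[1]
threading.Thread(target=echo_server, args=(server,), daemon=True).start()

STEPS = 5

# Per-request connections: one TCP handshake per agent step.
handshakes_per_request = 0
for i in range(STEPS):
    with socket.create_connection(("127.0.0.1", port)) as s:
        handshakes_per_request += 1  # new connection = new handshake
        f = s.makefile("rwb")
        f.write(b"step %d\n" % i)
        f.flush()
        assert f.readline() == b"step %d\n" % i

# Persistent connection: one handshake amortized over all steps.
handshakes_persistent = 0
with socket.create_connection(("127.0.0.1", port)) as s:
    handshakes_persistent += 1
    f = s.makefile("rwb")
    for i in range(STEPS):
        f.write(b"step %d\n" % i)
        f.flush()
        assert f.readline() == b"step %d\n" % i

server.close()
print(handshakes_per_request, handshakes_persistent)
```

The same arithmetic drives the real-world claim: a WebSocket (or any long-lived connection) pays the TCP and TLS setup once, while per-request HTTP pays it on every step of a multi-step agent loop.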

176 More Stories Today

AI Agents

ClawNet: Human-Symbiotic Agent Network for Cross-User Autonomous Cooperation

arXiv:2604.19211v1 Announce Type: new Abstract: Current AI agent frameworks have made remarkable progress in automating individual tasks, yet all existing systems serve a single user. Human productivity rests on the social and organizational relationships through which people coordinate, negotiate, and delegate. When agents move beyond performing tasks for one person to representing that person in collaboration with others, the infrastructure for cross-user agent collaboration is entirely absent

ArXiv AI

AI Tools

UAF: A Unified Audio Front-end LLM for Full-Duplex Speech Interaction

arXiv:2604.19221v1 Announce Type: new Abstract: Full-duplex speech interaction, as the most natural and intuitive mode of human communication, is driving artificial intelligence toward more human-like conversational systems. Traditional cascaded speech processing pipelines suffer from critical limitations, including accumulated latency, information loss, and error propagation across modules. To address these issues, recent efforts focus on the end-to-end audio large language models (LLMs) like G

ArXiv AI

Research

GRASPrune: Global Gating for Budgeted Structured Pruning of Large Language Models

arXiv:2604.19398v1 Announce Type: new Abstract: Large language models (LLMs) are expensive to serve because model parameters, attention computation, and KV caches impose substantial memory and latency costs. We present GRASPrune, a structured pruning framework applied after pretraining that jointly prunes FFN channels and KV head groups under a single global budget. Instead of learning importance scores without constraints and applying the budget only after training, GRASPrune learns lightweight

ArXiv AI

Research

Four-Axis Decision Alignment for Long-Horizon Enterprise AI Agents

arXiv:2604.19457v1 Announce Type: new Abstract: Long-horizon enterprise agents make high-stakes decisions (loan underwriting, claims adjudication, clinical review, prior authorization) under lossy memory, multi-step reasoning, and binding regulatory constraints. Current evaluation reports a single task-success scalar that conflates distinct failure modes and hides whether an agent is aligned with the standards its deployment environment requires. We propose that long-horizon decision behavior de

ArXiv AI

Research

SimDiff: Depth Pruning via Similarity and Difference

arXiv:2604.19520v1 Announce Type: new Abstract: Depth pruning improves the deployment efficiency of large language models (LLMs) by identifying and removing redundant layers. A widely accepted standard for this identification process is to measure the similarity between layers using cosine distance. However, we find that methods relying solely on this one-dimensional heuristic can exhibit unpredictable performance and even catastrophic collapse across different architectures. To address this iss

ArXiv AI
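The one-dimensional heuristic this abstract critiques is easy to state concretely. A hedged toy sketch (made-up vectors, not the paper's method): score each layer by the cosine distance between its input and output hidden states, and treat near-zero-distance layers as pruning candidates.

```python
# Toy sketch of the cosine-distance heuristic for depth pruning.
# Vectors and labels are invented for illustration.
import math

def cosine_distance(u, v):
    """1 - cos(angle between u and v); near 0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)

# Hypothetical hidden states: input to, and output of, two layers.
layer_in = [1.0, 2.0, 3.0]
redundant_out = [1.01, 2.02, 2.97]  # barely changes the input
useful_out = [3.0, -1.0, 0.5]       # transforms it substantially

d_redundant = cosine_distance(layer_in, redundant_out)
d_useful = cosine_distance(layer_in, useful_out)
assert d_redundant < d_useful  # the redundant layer scores as prunable
print(round(d_redundant, 4), round(d_useful, 4))
```

The abstract's point is that this single scalar can mislead: two layers at the same cosine distance can differ wildly in how much the network needs them, which is why relying on it alone can collapse some architectures.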

Research

DT2IT-MRM: Debiased Preference Construction and Iterative Training for Multimodal Reward Modeling

arXiv:2604.19544v1 Announce Type: new Abstract: Multimodal reward models (MRMs) play a crucial role in aligning Multimodal Large Language Models (MLLMs) with human preferences. Training a good MRM requires high-quality multimodal preference data. However, existing preference datasets face three key challenges: lack of granularity in preference strength, textual style bias, and unreliable preference signals. Besides, existing open-source multimodal preference datasets suffer from substantial nois

ArXiv AI

Research

Detecting Data Contamination in Large Language Models

arXiv:2604.19561v1 Announce Type: new Abstract: Large Language Models (LLMs) utilize large amounts of data for their training, some of which may come from copyrighted sources. Membership Inference Attacks (MIA) aim to detect those documents and whether they have been included in the training corpora of the LLMs. The black-box MIAs require a significant amount of data manipulation; therefore, their comparison is often challenging. We study state-of-the-art (SOTA) MIAs under the black-box assumpti

ArXiv AI
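The simplest black-box MIA the abstract alludes to can be sketched in a few lines. This is a hedged toy (invented losses and threshold, not the paper's attacks): flag a document as a likely training-set member when the model assigns it unusually low loss.

```python
# Toy loss-thresholding membership inference attack.
# Losses and threshold are fabricated for illustration; in practice the
# threshold is calibrated on documents of known membership status.
losses = {
    "memorized_novel.txt": 1.2,   # seen in training -> low loss
    "fresh_news_2026.txt": 3.8,   # unseen -> high loss
    "public_domain.txt": 1.5,
}

THRESHOLD = 2.0  # hypothetical calibrated cutoff

def predict_member(loss, threshold=THRESHOLD):
    """Predict training-set membership from per-document loss."""
    return loss < threshold

flags = {doc: predict_member(l) for doc, l in losses.items()}
print(flags)
```

Real black-box MIAs refine this idea with reference models, perturbed inputs, or calibration per document, which is exactly the data-manipulation burden the abstract says makes them hard to compare.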

Research

SafetyALFRED: Evaluating Safety-Conscious Planning of Multimodal Large Language Models

arXiv:2604.19638v1 Announce Type: new Abstract: Multimodal Large Language Models are increasingly adopted as autonomous agents in interactive environments, yet their ability to proactively address safety hazards remains insufficient. We introduce SafetyALFRED, built upon the embodied agent benchmark ALFRED, augmented with six categories of real-world kitchen hazards. While existing safety evaluations focus on hazard recognition through disembodied question answering (QA) settings, we evaluate el

ArXiv AI

AI Agents

CentaurTA Studio: A Self-Improving Human-Agent Collaboration System for Thematic Analysis

arXiv:2604.18589v1 Announce Type: cross Abstract: Thematic analysis is difficult to scale: manual workflows are labor-intensive, while fully automated pipelines often lack controllability and transparent evaluation. We present CentaurTA Studio, a web-based system for self-improving human-agent collaboration in open coding and theme construction. The system integrates (1) a two-stage human feedback pipeline separating simulator drafting and expert validation, (2) persistent prompt optim

ArXiv AI

Research

Two-dimensional early exit optimisation of LLM inference

arXiv:2604.18592v1 Announce Type: cross Abstract: We introduce a two-dimensional (2D) early exit strategy that coordinates layer-wise and sentence-wise exiting for classification tasks in large language models. By processing input incrementally sentence-by-sentence while progressively activating deeper layers, our method achieves multiplicative computational savings that exceed those from optimizing either dimension independently. Experimental evaluation across four state-of-the-art LLMs (Llama

ArXiv AI

Research

TurboEvolve: Towards Fast and Robust LLM-Driven Program Evolution

arXiv:2604.18607v1 Announce Type: cross Abstract: LLM-driven program evolution can discover high-quality programs, but its cost and run-to-run variance hinder reliable progress. We propose TurboEvolve, a multi-island evolutionary framework that improves sample efficiency and robustness under fixed evaluation budgets. Inspired by the multiple-offspring strategy in evolutionary algorithms, TurboEvolve introduces verbalized Sampling, prompting the LLM to emit K diverse candidates with explicit self

ArXiv AI

Research

SpikeMLLM: Spike-based Multimodal Large Language Models via Modality-Specific Temporal Scales and Temporal Compression

arXiv:2604.18610v1 Announce Type: cross Abstract: Multimodal Large Language Models (MLLMs) have achieved remarkable progress but incur substantial computational overhead and energy consumption during inference, limiting deployment in resource-constrained environments. Spiking Neural Networks (SNNs), with their sparse event-driven computation, offer inherent energy efficiency advantages on neuromorphic hardware, yet extending them to MLLMs faces two key challenges: heterogeneous modalities make u

ArXiv AI

AI Agents

Agent-GWO: Collaborative Agents for Dynamic Prompt Optimization in Large Language Models

arXiv:2604.18612v1 Announce Type: cross Abstract: Large Language Models (LLMs) have demonstrated strong capabilities in complex reasoning tasks, while recent prompting strategies such as Chain-of-Thought (CoT) have further elevated their performance in handling complex logical problems. Despite these advances, high-quality reasoning remains heavily reliant on manual static prompts and is sensitive to decoding configurations and task distributions, leading to performance fluctuations and limited

ArXiv AI

AI Tools

ARGUS: Agentic GPU Optimization Guided by Data-Flow Invariants

arXiv:2604.18616v1 Announce Type: cross Abstract: LLM-based coding agents can generate functionally correct GPU kernels, yet their performance remains far below hand-optimized libraries on critical computations such as matrix multiplication, attention, and Mixture-of-Experts (MoE). Peak GPU performance requires coordinated reasoning over tightly coupled optimizations, including tiling, shared-memory staging, software pipelining, and instruction scheduling, while existing agents rely on sparse pa

ArXiv AI

Research

Easy Samples Are All You Need: Self-Evolving LLMs via Data-Efficient Reinforcement Learning

arXiv:2604.18639v1 Announce Type: cross Abstract: Previous LLMs-based RL studies typically follow either supervised learning with high annotation costs, or unsupervised paradigms using voting or entropy-based rewards. However, their performance remains far from satisfactory due to the substantial annotation cost and issues such as model collapse or reward hacking. To address these issues, we introduce a new perspective inspired by cognitive learning theory and propose a novel approach called Eas

ArXiv AI

AI Agents

From Craft to Kernel: A Governance-First Execution Architecture and Semantic ISA for Agentic Computers

arXiv:2604.18652v1 Announce Type: cross Abstract: The transition of agentic AI from brittle prototypes to production systems is stalled by a pervasive crisis of craft. We suggest that the prevailing orchestration paradigm, delegating the system control loop to large language models and merely patching with heuristic guardrails, is the root cause of this fragility. Instead, we propose Arbiter-K, a Governance-First execution architecture that reconceptualizes the underlying model as a Probabilistic

ArXiv AI

AI Agents

Owner-Harm: A Missing Threat Model for AI Agent Safety

arXiv:2604.18658v1 Announce Type: cross Abstract: Existing AI agent safety benchmarks focus on generic criminal harm (cybercrime, harassment, weapon synthesis), leaving a systematic blind spot for a distinct and commercially consequential threat category: agents harming their own deployers. Real-world incidents illustrate the gap: Slack AI credential exfiltration (Aug 2024), Microsoft 365 Copilot calendar-injection leaks (Jan 2024), and a Meta agent unauthorized forum post exposing operational d

ArXiv AI

AI Agents

Towards Optimal Agentic Architectures for Offensive Security Tasks

arXiv:2604.18718v1 Announce Type: cross Abstract: Agentic security systems increasingly audit live targets with tool-using LLMs, but prior systems fix a single coordination topology, leaving unclear when additional agents help and when they only add cost. We treat topology choice as an empirical systems question. We introduce a controlled benchmark of 20 interactive targets (10 web/API and 10 binary), each exposing one endpoint-reachable ground-truth vulnerability, evaluated in whitebox and blac

ArXiv AI

Research

Where Fake Citations Are Made: Tracing Field-Level Hallucination to Specific Neurons in LLMs

arXiv:2604.18880v1 Announce Type: cross Abstract: LLMs frequently generate fictitious yet convincing citations, often expressing high confidence even when the underlying reference is wrong. We study this failure across 9 models and 108,000 generated references, and find that author names fail far more often than other fields across all models and settings. Citation style has no measurable effect, while reasoning-oriented distillation degrades recall. Probes trained on one field transfer at nea

ArXiv AI

Research

Harmful Intent as a Geometrically Recoverable Feature of LLM Residual Streams

arXiv:2604.18901v1 Announce Type: cross Abstract: Harmful intent is geometrically recoverable from large language model residual streams: as a linear direction in most layers, and as angular deviation in layers where projection methods fail. Across 12 models spanning four architectural families (Qwen2.5, Qwen3.5, Llama-3.2, Gemma-3) and three alignment variants (base, instruction-tuned, abliterated), under single-turn, English evaluation, we characterise this geometry through six direction-findi

ArXiv AI

Research

FedProxy: Federated Fine-Tuning of LLMs via Proxy SLMs and Heterogeneity-Aware Fusion

arXiv:2604.19015v1 Announce Type: cross Abstract: Federated fine-tuning of Large Language Models (LLMs) is obstructed by a trilemma of challenges: protecting LLMs intellectual property (IP), ensuring client privacy, and mitigating performance loss on heterogeneous data. Existing methods like Offsite-Tuning (OT) secure the LLMs IP by having clients train only lightweight adapters, yet our analysis reveals they suffer from a fundamental performance bottleneck, leaving a significant gap compared to

ArXiv AI

Industry

Kernel code removals driven by LLM-created security reports

Article URL: https://lwn.net/Articles/1068928/ Comments URL: https://news.ycombinator.com/item?id=47862230 Points: 120 # Comments: 115

Hacker News

Product Launch

Google updates Workspace to make AI your new office intern

Google has introduced a host of new automated functions into Workspace, all of which are driven by Workspace Intelligence, its new AI system.

TechCrunch AI

Product Launch

Google Cloud launches two new AI chips to compete with Nvidia

Google's newest TPUs are faster and cheaper than the previous versions. But the company is still embracing Nvidia in its cloud — for now.

TechCrunch AI

Product Launch

Google turns Chrome into an AI co-worker for the workplace

Google brings Gemini-powered "auto browse" capabilities to Chrome for enterprise users, letting workers automate tasks like research, data entry, and more.

TechCrunch AI

Product Launch

OpenAI says its new GPT-5.5 model is more efficient and better at coding

OpenAI just announced its new GPT-5.5 model, which the company calls its "smartest and most intuitive to use model yet, and the next step toward a new way of getting work done on a computer." OpenAI just released GPT-5.4 last month, but says that the new GPT-5.5 "excels" at tasks like writing and debugging code, […]

The Verge AI

Product Launch

OpenAI now lets teams make custom bots that can do work on their own

OpenAI is giving users of its Business, Enterprise, Edu, and Teachers plans access to cloud-based "workspace" agents available in ChatGPT that can perform business tasks. In its blog post, OpenAI gives examples of agents like one that finds product feedback on the web and sends a report in Slack and a sales agent that can […]

The Verge AI

Product Launch

Introducing GPT-5.5

Introducing GPT-5.5, our smartest model yet—faster, more capable, and built for complex tasks like coding, research, and data analysis across tools.

OpenAI Blog

Product Launch

Introducing workspace agents in ChatGPT

Workspace agents in ChatGPT are Codex-powered agents that automate complex workflows, run in the cloud, and help teams scale work across tools securely.

OpenAI Blog

Research

An update on recent Claude Code quality reports

Article URL: https://www.anthropic.com/engineering/april-23-postmortem Comments URL: https://news.ycombinator.com/item?id=47878905 Points: 524 # Comments: 395

Hacker News

Product Launch

Anker made its own chip to bring AI to all its products

Article URL: https://www.theverge.com/tech/916463/anker-thus-chip-announcement Comments URL: https://news.ycombinator.com/item?id=47866368 Points: 67 # Comments: 47

Hacker News

Research

ARES: Adaptive Red-Teaming and End-to-End Repair of Policy-Reward System

arXiv:2604.18789v1 Announce Type: new Abstract: Reinforcement Learning from Human Feedback (RLHF) is central to aligning Large Language Models (LLMs), yet it introduces a critical vulnerability: an imperfect Reward Model (RM) can become a single point of failure when it fails to penalize unsafe behaviors. While existing red-teaming approaches primarily target policy-level weaknesses, they overlook what we term systemic weaknesses: cases where both the core LLM and the RM fail in tandem. We presen

ArXiv AI

Research

OmniMouse: Scaling properties of multi-modal, multi-task Brain Models on 150B Neural Tokens

arXiv:2604.18827v1 Announce Type: cross Abstract: Scaling data and artificial neural networks has transformed AI, driving breakthroughs in language and vision. Whether similar principles apply to modeling brain activity remains unclear. Here we leveraged a dataset of 3.1 million neurons from the visual cortex of 73 mice across 323 sessions, totaling more than 150 billion neural tokens recorded during natural movies, images and parametric stimuli, and behavior. We train multi-modal, multi-task mo

ArXiv AI

Funding

How SpaceX preempted a $2B fundraise with a $60B buyout offer

Cursor was on track to close a $2 billion funding round this week but chose to halt discussions after SpaceX offered a $10 billion "collaboration fee" and a path to a $60 billion acquisition.

TechCrunch AI

Trend

AI needs a strong data fabric to deliver business value

Artificial intelligence is moving quickly in the enterprise, from experimentation to everyday use. Organizations are deploying copilots, agents, and predictive systems across finance, supply chains, human resources, and customer operations. By the end of 2025, half of companies used AI in at least three business functions, according to a recent survey. But as AI becomes…

MIT Technology Review

AI Tools

Here’s how our TPUs power increasingly demanding AI workloads.

Learn how Google’s TPUs power increasingly demanding AI workloads with this new video.

Google AI Blog

AI Tools

5 AI Models Tried to Scam Me. Some of Them Were Scary Good

The cyber capabilities of AI models have experts rattled. AI’s social skills may be just as dangerous.

Wired AI

AI Tools

AI Tools Are Helping Mediocre North Korean Hackers Steal Millions

One group of hackers used AI for everything from vibe coding their malware to creating fake company websites—and stole as much as $12 million in three months.

Wired AI

AI Tools

Human-Guided Harm Recovery for Computer Use Agents

arXiv:2604.18847v1 Announce Type: new Abstract: As LM agents gain the ability to execute actions on real computer systems, we need ways to not only prevent harmful actions at scale but also effectively remediate harm when prevention fails. We formalize a solution to this neglected challenge in post-execution safeguards as harm recovery: the problem of optimally steering an agent from a harmful state back to a safe one in alignment with human preferences. We ground preference-aligned recovery thr

ArXiv AI

Research

How Adversarial Environments Mislead Agentic AI?

arXiv:2604.18874v1 Announce Type: new Abstract: Tool-integrated agents are deployed on the premise that external tools ground their outputs in reality. Yet this very reliance creates a critical attack surface. Current evaluations benchmark capability in benign settings, asking "can the agent use tools correctly" but never "what if the tools lie". We identify this Trust Gap: agents are evaluated for performance, not for skepticism. We formalize this vulnerability as Adversarial Environmental Inje

ArXiv AI

AI Tools

AutomationBench

arXiv:2604.18934v1 Announce Type: new Abstract: Existing AI benchmarks for software automation rarely combine cross-application coordination, autonomous API discovery, and policy adherence. Real business workflows demand all three: a single task may span a CRM, inbox, calendar, and messaging platform - requiring the agent to find the right endpoints, follow a policy document, and write correct data to each system. To address this gap, we introduce AutomationBench, a benchmark for evaluating AI a

ArXiv AI

AI Tools

Personalized Benchmarking: Evaluating LLMs by Individual Preferences

arXiv:2604.18943v1 Announce Type: new Abstract: With the rise in capabilities of large language models (LLMs) and their deployment in real-world tasks, evaluating LLM alignment with human preferences has become an important challenge. Current benchmarks average preferences across all users to compute aggregate ratings, overlooking individual user preferences when establishing model rankings. Since users have varying preferences in different contexts, we call for personalized LLM benchmarks that

ArXiv AI

Research

Reasoning Structure Matters for Safety Alignment of Reasoning Models

arXiv:2604.18946v1 Announce Type: new Abstract: Large reasoning models (LRMs) achieve strong performance on complex reasoning tasks but often generate harmful responses to malicious user queries. This paper investigates the underlying cause of these safety risks and shows that the issue lies in the reasoning structure itself. Based on this insight, we claim that effective safety alignment can be achieved by altering the reasoning structure. We propose AltTrain, a simple yet effective post traini

ArXiv AI

AI Tools

DW-Bench: Benchmarking LLMs on Data Warehouse Graph Topology Reasoning

arXiv:2604.18964v1 Announce Type: new Abstract: This paper introduces DW-Bench, a new benchmark that evaluates large language models (LLMs) on graph-topology reasoning over data warehouse schemas, explicitly integrating both foreign-key (FK) and data-lineage edges. The benchmark comprises 1,046 automatically generated, verifiably correct questions across five schemas. Experiments show that tool-augmented methods substantially outperform static approaches but plateau on hard compositional subtype

ArXiv AI

Research

SAVOIR: Learning Social Savoir-Faire via Shapley-based Reward Attribution

arXiv:2604.18982v1 Announce Type: new Abstract: Social intelligence, the ability to navigate complex interpersonal interactions, presents a fundamental challenge for language agents. Training such agents via reinforcement learning requires solving the credit assignment problem: determining how individual utterances contribute to multi-turn dialogue outcomes. Existing approaches directly employ language models to distribute episode-level rewards, yielding attributions that are retrospective and l

ArXiv AI

Industry

Reinforcement Learning Improves LLM Accuracy and Reasoning in Disease Classification from Radiology Reports

arXiv:2604.19060v1 Announce Type: new Abstract: Accurate disease classification from radiology reports is essential for many applications. While supervised fine-tuning (SFT) of lightweight LLMs improves accuracy, it can degrade reasoning. We propose a two-stage approach: SFT on disease labels followed by Group Relative Policy Optimization (GRPO) to refine predictions by optimizing accuracy and format without reasoning supervision. Across three radiologist-annotated datasets, SFT outperformed bas

ArXiv AI

Research

OLLM: Options-based Large Language Models

arXiv:2604.19087v1 Announce Type: new Abstract: We introduce Options LLM (OLLM), a simple, general method that replaces the single next-token prediction of standard LLMs with a set of learned options for the next token, indexed by a discrete latent variable. Instead of relying on temperature or sampling heuristics to induce diversity, OLLM models variation explicitly: a small latent space parametrizes multiple plausible next-token options which can be selected or searched by a downstrea

ArXiv AI

Research

Towards Scalable Lifelong Knowledge Editing with Selective Knowledge Suppression

arXiv:2604.19089v1 Announce Type: new Abstract: Large language models (LLMs) require frequent knowledge updates to reflect changing facts and mitigate hallucinations. To meet this demand, lifelong knowledge editing has emerged as a continual approach to modify specific pieces of knowledge without retraining the entire model. Existing parameter editing methods struggle with stability during sequential edits due to catastrophic forgetting. While retrieval-based approaches are proposed to alleviate

ArXiv AI

AI Tools

Reasoning-Aware AIGC Detection via Alignment and Reinforcement

arXiv:2604.19172v1 Announce Type: new Abstract: The rapid advancement and widespread adoption of Large Language Models (LLMs) have elevated the need for reliable AI-generated content (AIGC) detection, which remains challenging as models evolve. We introduce AIGC-text-bank, a comprehensive multi-domain dataset with diverse LLM sources and authorship scenarios, and propose REVEAL, a detection framework that generates interpretable reasoning chains before classification. Our approach uses a two-sta

ArXiv AI

Research

Explicit Trait Inference for Multi-Agent Coordination

arXiv:2604.19278v2 Announce Type: new Abstract: LLM-based multi-agent systems (MAS) show promise on complex tasks but remain prone to coordination failures such as goal drift, error cascades, and misaligned behaviors. We propose Explicit Trait Inference (ETI), a psychologically grounded method for improving coordination. ETI enables agents to infer and track partner characteristics along two established psychological dimensions, warmth (e.g., trust) and competence (e.g., skill), from interaction

ArXiv AI

Research

Large Language Models Exhibit Normative Conformity

arXiv:2604.19301v1 Announce Type: new Abstract: The conformity bias exhibited by large language models (LLMs) can pose a significant challenge to decision-making in LLM-based multi-agent systems (LLM-MAS). While many prior studies have treated "conformity" simply as a matter of opinion change, this study introduces the social psychological distinction between informational conformity and normative conformity in order to understand LLM conformity at the mechanism level. Specifically, we design ne

ArXiv AI

Research

Do Agents Dream of Root Shells? Partial-Credit Evaluation of LLM Agents in Capture The Flag Challenges

arXiv:2604.19354v1 Announce Type: new Abstract: Large Language Model (LLM) agents are increasingly proposed for autonomous cybersecurity tasks, but their capabilities in realistic offensive settings remain poorly understood. We present DeepRed, an open-source benchmark for evaluating LLM-based agents on realistic Capture The Flag (CTF) challenges in isolated virtualized environments. DeepRed places an agent in a Kali attacker environment with terminal tools and optional web search, connected ove

ArXiv AI

Research

Do LLMs Game Formalization? Evaluating Faithfulness in Logical Reasoning

arXiv:2604.19459v1 Announce Type: new Abstract: Formal verification guarantees proof validity but not formalization faithfulness. For natural-language logical reasoning, where models construct axiom systems from scratch without library constraints, this gap between valid proofs and faithful translations is especially acute. We investigate whether frontier models exploit this gap when generating Lean 4 proofs, a behavior we term formalization gaming. We evaluate GPT-5 and DeepSeek-R1 on 303 first

ArXiv AI

Research

CoDA: Towards Effective Cross-domain Knowledge Transfer via CoT-guided Domain Adaptation

arXiv:2604.19488v1 Announce Type: new Abstract: Large language models (LLMs) have achieved substantial advances in logical reasoning, yet they continue to lag behind human-level performance. In-context learning provides a viable solution that boosts the model's performance via prompting its input with expert-curated, in-domain exemplars. However, in many real-world, expertise-scarce domains, such as low-resource scientific disciplines, emerging biomedical subfields, or niche legal jurisdictions,

ArXiv AI

Research

From Experience to Skill: Multi-Agent Generative Engine Optimization via Reusable Strategy Learning

arXiv:2604.19516v1 Announce Type: new Abstract: Generative engines (GEs) are reshaping information access by replacing ranked links with citation-grounded answers, yet current Generative Engine Optimization (GEO) methods optimize each instance in isolation, unable to accumulate or transfer effective strategies across tasks and engines. We reframe GEO as a strategy learning problem and propose MAGEO, a multi-agent framework in which coordinated planning, editing, and fidelity-aware evaluation ser

ArXiv AI

Research

Integrating Anomaly Detection into Agentic AI for Proactive Risk Management in Human Activity

arXiv:2604.19538v1 Announce Type: new Abstract: Agentic AI, with goal-directed, proactive, and autonomous decision-making capabilities, offers a compelling opportunity to address movement-related risks in human activity, including the persistent hazard of falls among elderly populations. Despite numerous approaches to fall mitigation through fall prediction and detection, existing systems have not yet functioned as universal solutions across care pathways and safety-critical environments. This i

ArXiv AI

Research

Multi-modal Reasoning with LLMs for Visual Semantic Arithmetic

arXiv:2604.19567v1 Announce Type: new Abstract: Reinforcement learning (RL) as post-training is crucial for enhancing the reasoning ability of large language models (LLMs) in coding and math. However, their capacity for visual semantic arithmetic, inferring relationships from images, remains underexplored. The classic text analogy "king"-"man"+"woman" = "queen" illustrates relational reasoning, yet replacing text with images of "king" and "man" significantly reduces performance because it requir

ArXiv AI
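
The classic text analogy the abstract references can be sketched with toy vectors; the 3-d embeddings below (dimensions: royal, male, female) are hand-crafted stand-ins for illustration, not real model embeddings:

```python
# Toy illustration of the text-analogy baseline the paper starts from:
# "king" - "man" + "woman" ~ "queen" in embedding space.
# Vectors are hand-crafted 3-d stand-ins, not outputs of any real model.

vocab = {
    "king":  (1.0, 1.0, 0.0),
    "queen": (1.0, 0.0, 1.0),
    "man":   (0.0, 1.0, 0.0),
    "woman": (0.0, 0.0, 1.0),
    "boy":   (0.0, 1.0, 0.3),
}

def analogy(a, b, c):
    """Return the vocab word nearest to vec(a) - vec(b) + vec(c)."""
    va, vb, vc = vocab[a], vocab[b], vocab[c]
    target = tuple(x - y + z for x, y, z in zip(va, vb, vc))
    # Exclude the query words themselves, as is standard for analogies.
    candidates = {w: v for w, v in vocab.items() if w not in (a, b, c)}
    return min(candidates,
               key=lambda w: sum((t - v) ** 2
                                 for t, v in zip(target, candidates[w])))

print(analogy("king", "man", "woman"))  # → queen
```

The paper's point is that this relational step degrades sharply once "king" and "man" are given as images rather than tokens.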

Research

Time Series Augmented Generation for Financial Applications

arXiv:2604.19633v1 Announce Type: new Abstract: Evaluating the reasoning capabilities of Large Language Models (LLMs) for complex, quantitative financial tasks is a critical and unsolved challenge. Standard benchmarks often fail to isolate an agent's core ability to parse queries and orchestrate computations. To address this, we introduce a novel evaluation methodology and benchmark designed to rigorously measure an LLM agent's reasoning for financial time-series analysis. We apply this methodol

ArXiv AI

Research

A-MAR: Agent-based Multimodal Art Retrieval for Fine-Grained Artwork Understanding

arXiv:2604.19689v1 Announce Type: new Abstract: Understanding artworks requires multi-step reasoning over visual content and cultural, historical, and stylistic context. While recent multimodal large language models show promise in artwork explanation, they rely on implicit reasoning and internalized knowledge, limiting interpretability and explicit evidence grounding. We propose A-MAR, an Agent-based Multimodal Art Retrieval framework that explicitly conditions retrieval on structured reasoni

ArXiv AI

Research

Compile to Compress: Boosting Formal Theorem Provers by Compiler Outputs

arXiv:2604.18587v1 Announce Type: cross Abstract: Large language models (LLMs) have demonstrated significant potential in formal theorem proving, yet state-of-the-art performance often necessitates prohibitive test-time compute via massive roll-outs or extended context windows. In this work, we address this scalability bottleneck by exploiting an informative structure in formal verification: the observation that compilers map a vast space of diverse proof attempts to a compact set of structured

ArXiv AI
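
The structural observation in the abstract can be sketched in a few lines: many distinct proof attempts collapse onto a small set of compiler outputs, so attempts can be grouped by that feedback. `compile_attempt` below is a hypothetical stand-in for invoking a real proof checker such as Lean:

```python
# Toy sketch: group diverse proof attempts by the compiler's structured
# output. compile_attempt is a made-up stand-in, not a real checker.
from collections import defaultdict

def compile_attempt(proof: str) -> str:
    # Made-up rule: any attempt not ending in "qed" fails identically.
    return "ok" if proof.endswith("qed") else "error: incomplete proof"

attempts = ["intro h; exact h qed", "simp qed", "intro h", "ring_nf", "omega"]
by_output = defaultdict(list)
for attempt in attempts:
    by_output[compile_attempt(attempt)].append(attempt)

# Five diverse attempts compress to two feedback classes.
print({output: len(group) for output, group in by_output.items()})
```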

AI Tools

SPRITE: From Static Mockups to Engine-Ready Game UI

arXiv:2604.18591v1 Announce Type: cross Abstract: Game UI implementation requires translating stylized mockups into interactive engine entities. However, current "Screenshot-to-Code" tools often struggle with the irregular geometries and deep visual hierarchies typical of game interfaces. To bridge this gap, we introduce SPRITE, a pipeline that transforms static screenshots into editable engine assets. By integrating Vision-Language Models (VLMs) with a structured YAML intermediate representatio

ArXiv AI

Research

Neuromorphic Continual Learning for Sequential Deployment of Nuclear Plant Monitoring Systems

arXiv:2604.18611v1 Announce Type: cross Abstract: Anomaly detection in nuclear industrial control systems (ICS) requires continuous, energy-efficient monitoring across multiple subsystems that are often deployed at different stages of plant commissioning. When a conventional neural network is sequentially trained to monitor new subsystems, it catastrophically forgets previously learned anomaly patterns, a safety-critical failure mode. We present the first spiking neural network (SNN)-based anoma

ArXiv AI

Research

NeuroAI and Beyond: Bridging Between Advances in Neuroscience and Artificial Intelligence

arXiv:2604.18637v1 Announce Type: cross Abstract: Neuroscience and Artificial Intelligence (AI) have made impressive progress in recent years but remain only loosely interconnected. Based on a workshop convened by the National Science Foundation in August 2025, we identify three fundamental capability gaps in current AI: the inability to interact with the physical world, inadequate learning that produces brittle systems, and unsustainable energy and data inefficiency. We describe the neuroscienc

ArXiv AI

Research

FASE : A Fairness-Aware Spatiotemporal Event Graph Framework for Predictive Policing

arXiv:2604.18644v2 Announce Type: cross Abstract: Predictive policing systems that allocate patrol resources based solely on predicted crime risk can unintentionally amplify racial disparities through feedback-driven data bias. We present FASE, a Fairness-Aware Spatiotemporal Event Graph framework, which integrates spatiotemporal crime prediction with fairness-constrained patrol allocation and a closed-loop deployment feedback simulator. We model Baltimore as a graph of 25 ZIP Code Tabulation Ar

ArXiv AI

Research

DanceCrafter: Fine-Grained Text-Driven Controllable Dance Generation via Choreographic Syntax

arXiv:2604.18648v1 Announce Type: cross Abstract: Text-driven controllable dance generation remains under-explored, primarily due to the severe scarcity of high-quality datasets and the inherent difficulty of articulating complex choreographies. Characterizing dance is particularly challenging owing to its intricate spatial dynamics, strong directionality, and the highly decoupled movements of distinct body parts. To overcome these bottlenecks, we bridge principles from dance studies, human anat

ArXiv AI

Research

Unlocking the Edge deployment and on-device acceleration of multi-LoRA enabled one-for-all foundational LLM

arXiv:2604.18655v1 Announce Type: cross Abstract: Deploying large language models (LLMs) on smartphones poses significant engineering challenges due to stringent constraints on memory, latency, and runtime flexibility. In this work, we present a hardware-aware framework for efficient on-device inference of a LLaMA-based multilingual foundation model supporting multiple use cases on Samsung Galaxy S24 and S25 devices with SM8650 and SM8750 Qualcomm chipsets respectively. Our approach integrates a

ArXiv AI

Research

Beyond Explicit Refusals: Soft-Failure Attacks on Retrieval-Augmented Generation

arXiv:2604.18663v1 Announce Type: cross Abstract: Existing jamming attacks on Retrieval-Augmented Generation (RAG) systems typically induce explicit refusals or denial-of-service behaviors, which are conspicuous and easy to detect. In this work, we formalize a subtler availability threat, termed soft failure, which degrades system utility by inducing fluent and coherent yet non-informative responses rather than overt failures. We propose Deceptive Evolutionary Jamming Attack (DEJA), an automated

ArXiv AI

Research

Handling and Interpreting Missing Modalities in Patient Clinical Trajectories via Autoregressive Sequence Modeling

arXiv:2604.18753v1 Announce Type: cross Abstract: An active challenge in developing multimodal machine learning (ML) models for healthcare is handling missing modalities during training and deployment. As clinical datasets are inherently temporal and sparse in terms of modality presence, capturing the underlying predictive signal via diagnostic multimodal ML models while retaining model explainability remains an ongoing challenge. In this work, we address this by re-framing clinical diagnosis as

ArXiv AI

Research

Towards Understanding the Robustness of Sparse Autoencoders

arXiv:2604.18756v1 Announce Type: cross Abstract: Large Language Models (LLMs) remain vulnerable to optimization-based jailbreak attacks that exploit internal gradient structure. While Sparse Autoencoders (SAEs) are widely used for interpretability, their robustness implications remain underexplored. We present a study of integrating pretrained SAEs into transformer residual streams at inference time, without modifying model weights or blocking gradients. Across four model families (Gemma, LLaMA

ArXiv AI

Research

HELM: Harness-Enhanced Long-horizon Memory for Vision-Language-Action Manipulation

arXiv:2604.18791v1 Announce Type: cross Abstract: Vision-Language-Action (VLA) models fail systematically on long-horizon manipulation tasks despite strong short-horizon performance. We show that this failure is not resolved by extending context length alone in the current reactive execution setting; instead, it stems from three recurring execution-loop deficiencies: the memory gap, the verification gap, and the recovery gap. We present HELM, a model-agnostic framework that addresses these defic

ArXiv AI

Research

LLM-as-Judge Framework for Evaluating Tone-Induced Hallucination in Vision-Language Models

arXiv:2604.18803v2 Announce Type: cross Abstract: Vision-Language Models (VLMs) are increasingly deployed in settings where reliable visual grounding carries operational consequences, yet their behavior under progressively coercive prompt phrasing remains undercharacterized. Existing hallucination benchmarks predominantly rely on neutral prompts and binary detection, leaving open how both the incidence and the intensity of fabrication respond to graded linguistic pressure across structurally dis

ArXiv AI

Research

One Step Forward and K Steps Back: Better Reasoning with Denoising Recursion Models

arXiv:2604.18839v1 Announce Type: cross Abstract: Looped transformers scale computational depth without increasing parameter count by repeatedly applying a shared transformer block and can be used for iterative refinement, where each loop rewrites a full fixed-size prediction in parallel. On difficult problems, such as those that require search-like computation, reaching a highly structured solution starting from noise can require long refinement trajectories. Learning such trajectories is chall

ArXiv AI

Research

Temporal UI State Inconsistency in Desktop GUI Agents: Formalizing and Defending Against TOCTOU Attacks on Computer-Use Agents

arXiv:2604.18860v1 Announce Type: cross Abstract: GUI agents that control desktop computers via screenshot-and-click loops introduce a new class of vulnerability: the observation-to-action gap (mean 6.51 s on real OSWorld workloads) creates a Time-Of-Check, Time-Of-Use (TOCTOU) window during which an unprivileged attacker can manipulate the UI state. We formalize this as a Visual Atomicity Violation and characterize three concrete attack primitives: (A) Notification Overlay Hijack, (B) Window Fo

ArXiv AI
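
A defense against this check-to-use window can be sketched as a "visual atomicity" guard: fingerprint the observed UI region at time-of-check and re-verify it immediately before the click is dispatched. `capture_region` and `dispatch_click` below are hypothetical stand-ins for real screenshot and input APIs, not part of any named agent framework:

```python
# Hypothetical TOCTOU guard for a GUI agent: abort the action if the
# observed UI region changed between observation and dispatch.
import hashlib

def fingerprint(pixels: bytes) -> str:
    return hashlib.sha256(pixels).hexdigest()

def guarded_click(capture_region, dispatch_click):
    checked = fingerprint(capture_region())        # time-of-check
    # ... the agent deliberates here; an attacker could swap the UI now ...
    if fingerprint(capture_region()) != checked:   # time-of-use re-check
        raise RuntimeError("UI changed between check and use; aborting click")
    dispatch_click()

# Simulated overlay attack: the observed region mutates between captures.
frames = [b"login-button", b"malicious-overlay"]
try:
    guarded_click(lambda: frames.pop(0), lambda: None)
except RuntimeError as err:
    print(err)  # → UI changed between check and use; aborting click
```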

Research

Hierarchically Robust Zero-shot Vision-language Models

arXiv:2604.18867v1 Announce Type: cross Abstract: Vision-Language Models (VLMs) can perform zero-shot classification but are susceptible to adversarial attacks. While robust fine-tuning improves their robustness, existing approaches align fixed text embeddings with an image embedding, sacrificing natural performance and robustness. A robustness degradation also occurs when a model faces adversarial attacks targeting superclasses (parent classes, e.g., mammal) in addition to their base (leaf) cla

ArXiv AI

AI Tools

Choose Your Own Adventure: Non-Linear AI-Assisted Programming with EvoGraph

arXiv:2604.18883v1 Announce Type: cross Abstract: Current AI-assisted programming tools are predominantly linear and chat-based, which deviates from the iterative and branching nature of programming itself. Our preliminary study with developers using AI assistants suggested that they often struggle to explore alternatives, manage prompting sequences, and trace changes. Informed by these insights, we created EvoGraph, an IDE plugin that integrates AI interactions and code changes as a lightweight

ArXiv AI

Policy

Regulating Artificial Intimacy: From Locks and Blocks to Relational Accountability

arXiv:2604.18893v1 Announce Type: cross Abstract: A series of high-profile tragedies involving companion chatbots has triggered an unusually rapid regulatory response. Several jurisdictions, including Australia, California, and New York, have introduced enforceable regulation, while regulators elsewhere have signaled growing concern about risks posed by companion chatbots, particularly to children. In parallel, leading providers, notably OpenAI, appear to have strengthened their self-regulatory

ArXiv AI

Research

Gradient-Based Program Synthesis with Neurally Interpreted Languages

arXiv:2604.18907v1 Announce Type: cross Abstract: A central challenge in program induction has long been the trade-off between symbolic and neural approaches. Symbolic methods offer compositional generalisation and data efficiency, yet their scalability is constrained by formalisms such as domain-specific languages (DSLs), which are labour-intensive to create and may not transfer to new domains. In contrast, neural networks flexibly learn from data but tend to generalise poorly in compositional

ArXiv AI

Research

Gated Memory Policy

arXiv:2604.18933v1 Announce Type: cross Abstract: Robotic manipulation tasks exhibit varying memory requirements, ranging from Markovian tasks that require no memory to non-Markovian tasks that depend on historical information spanning single or multiple interaction trials. Surprisingly, simply extending observation histories of a visuomotor policy often leads to a significant performance drop due to distribution shift and overfitting. To address these issues, we propose Gated Memory Policy (GMP

ArXiv AI

Research

Assessing Capabilities of Large Language Models in Social Media Analytics: A Multi-task Quest

arXiv:2604.18955v1 Announce Type: cross Abstract: In this study, we present the first comprehensive evaluation of modern LLMs - including GPT-4, GPT-4o, GPT-3.5-Turbo, Gemini 1.5 Pro, DeepSeek-V3, Llama 3.2, and BERT - across three core social media analytics tasks on a Twitter (X) dataset: (I) Social Media Authorship Verification, (II) Social Media Post Generation, and (III) User Attribute Inference. For the authorship verification, we introduce a systematic sampling framework over diverse user

ArXiv AI

Research

Distillation Traps and Guards: A Calibration Knob for LLM Distillability

arXiv:2604.18963v1 Announce Type: cross Abstract: Knowledge distillation (KD) transfers capabilities from large language models (LLMs) to smaller students, yet it can fail unpredictably and also underpins model leakage risks. Our analysis revealed several distillation traps: tail noise, off-policy instability, and, most fundamentally, the teacher-student gap, that distort training signals. These traps manifest as overconfident hallucinations, self-correction collapse, and local decoding degradat

ArXiv AI

Research

Self-Improving Tabular Language Models via Iterative Group Alignment

arXiv:2604.18966v1 Announce Type: cross Abstract: While language models have been adapted for tabular data generation, two fundamental limitations remain: (1) static fine-tuning produces models that cannot learn from their own generated samples and adapt to self-correct, and (2) autoregressive objectives preserve local token coherence but neglect global statistical properties, degrading tabular quality. Reinforcement learning offers a potential solution but requires designing reward functions th

ArXiv AI

Research

Low-Rank Adaptation for Critic Learning in Off-Policy Reinforcement Learning

arXiv:2604.18978v1 Announce Type: cross Abstract: Scaling critic capacity is a promising direction for enhancing off-policy reinforcement learning (RL). However, larger critics are prone to overfitting and unstable in replay-buffer-based bootstrap training. This paper leverages Low-Rank Adaptation (LoRA) as a structural-sparsity regularizer for off-policy critics. Our approach freezes randomly initialized base matrices and solely optimizes low-rank adapters, thereby constraining critic updates t

ArXiv AI
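
The LoRA-as-regularizer setup the abstract describes can be sketched minimally: freeze a randomly initialized base matrix W0 and train only low-rank factors B (d×r) and A (r×d), so the effective weight is W0 + B·A. The width, rank, and initialization below are illustrative choices, not the paper's:

```python
# Minimal LoRA sketch: frozen random base, trainable low-rank adapters.
import random

d, r = 64, 2                 # layer width and LoRA rank (illustrative)
random.seed(0)

W0 = [[random.gauss(0, 1) for _ in range(d)] for _ in range(d)]    # frozen
B = [[0.0] * r for _ in range(d)]                                  # trainable, zero-init
A = [[random.gauss(0, 0.1) for _ in range(d)] for _ in range(r)]   # trainable

def effective_weight():
    """W0 + B @ A; only B and A would receive gradient updates."""
    return [[W0[i][j] + sum(B[i][k] * A[k][j] for k in range(r))
             for j in range(d)] for i in range(d)]

# Zero-initializing B means training starts exactly at the frozen base,
# and only 2*d*r of the d*d effective parameters are ever updated.
print(f"trainable fraction: {2 * d * r / (d * d):.1%}")
```

Constraining critic updates to this low-rank subspace is what the paper leverages as structural-sparsity regularization.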

Research

AutoAWG: Adverse Weather Generation with Adaptive Multi-Controls for Automotive Videos

arXiv:2604.18993v1 Announce Type: cross Abstract: Perception robustness under adverse weather remains a critical challenge for autonomous driving, with the core bottleneck being the scarcity of real-world video data in adverse weather. Existing weather generation approaches struggle to balance visual quality and annotation reusability. We present AutoAWG, a controllable Adverse Weather video Generation framework for Autonomous driving. Our method employs a semantics-guided adaptive fusion of mul

ArXiv AI

Research

$R^2$-dLLM: Accelerating Diffusion Large Language Models via Spatio-Temporal Redundancy Reduction

arXiv:2604.18995v1 Announce Type: cross Abstract: Diffusion Large Language Models (dLLMs) have emerged as a promising alternative to autoregressive generation by enabling parallel token prediction. However, practical dLLM decoding still suffers from high inference latency, which limits deployment. In this work, we observe that a substantial part of this inefficiency comes from recurring redundancy in the decoding process, including spatial redundancy caused by confidence clusters and positional

ArXiv AI

Research

Local Linearity of LLMs Enables Activation Steering via Model-Based Linear Optimal Control

arXiv:2604.19018v1 Announce Type: cross Abstract: Inference-time LLM alignment methods, particularly activation steering, offer an alternative to fine-tuning by directly modifying activations during generation. Existing methods, however, often rely on non-anticipative interventions that ignore how perturbations propagate through transformer layers and lack online error feedback, resulting in suboptimal, open-loop control. To address this, we show empirically that, despite the nonlinear structure

ArXiv AI

Research

RARE: Redundancy-Aware Retrieval Evaluation Framework for High-Similarity Corpora

arXiv:2604.19047v1 Announce Type: cross Abstract: Existing QA benchmarks typically assume distinct documents with minimal overlap, yet real-world retrieval-augmented generation (RAG) systems operate on corpora such as financial reports, legal codes, and patents, where information is highly redundant and documents exhibit strong inter-document similarity. This mismatch undermines evaluation validity: retrievers can be unfairly undervalued even when they retrieve documents that provide sufficient

ArXiv AI

Research

SAMoRA: Semantic-Aware Mixture of LoRA Experts for Task-Adaptive Learning

arXiv:2604.19048v1 Announce Type: cross Abstract: The combination of Mixture-of-Experts (MoE) and Low-Rank Adaptation (LoRA) has shown significant potential for enhancing the multi-task learning capabilities of Large Language Models. However, existing methods face two primary challenges: (1) Imprecise Routing in the current MoE-LoRA method fails to explicitly match input semantics with expert capabilities, leading to weak expert specialization. (2) Uniform weight fusion strategies struggle to prov

ArXiv AI

Research

Refute-or-Promote: An Adversarial Stage-Gated Multi-Agent Review Methodology for High-Precision LLM-Assisted Defect Discovery

arXiv:2604.19049v1 Announce Type: cross Abstract: LLM-assisted defect discovery has a precision crisis: plausible-but-wrong reports overwhelm maintainers and degrade credibility for real findings. We present Refute-or-Promote, an inference-time reliability pattern combining Stratified Context Hunting (SCH) for candidate generation, adversarial kill mandates, context asymmetry, and a Cross-Model Critic (CMC). Adversarial agents attempt to disprove candidates at each promotion gate; cold-start rev

ArXiv AI

Research

Reducing the Offline-Streaming Gap for Unified ASR Transducer with Consistency Regularization

arXiv:2604.19079v1 Announce Type: cross Abstract: Unification of automatic speech recognition (ASR) systems reduces development and maintenance costs, but training a single model to perform well in both offline and low-latency streaming settings remains challenging. We present a Unified ASR framework for Transducer (RNNT) training that supports both offline and streaming decoding within a single model, using chunk-limited attention with right context and dynamic chunked convolutions. To further

ArXiv AI

Research

ProjLens: Unveiling the Role of Projectors in Multimodal Model Safety

arXiv:2604.19083v1 Announce Type: cross Abstract: Multimodal Large Language Models (MLLMs) have achieved remarkable success in cross-modal understanding and generation, yet their deployment is threatened by critical safety vulnerabilities. While prior works have demonstrated the feasibility of backdoors in MLLMs via fine-tuning data poisoning to manipulate inference, the underlying mechanisms of backdoor attacks remain opaque, complicating the understanding and mitigation. To bridge this gap, we

ArXiv AI

Research

RoboWM-Bench: A Benchmark for Evaluating World Models in Robotic Manipulation

arXiv:2604.19092v1 Announce Type: cross Abstract: Recent advances in large-scale video world models have enabled increasingly realistic future prediction, raising the prospect of leveraging imagined videos for robot learning. However, visual realism does not imply physical plausibility, and behaviors inferred from generated videos may violate dynamics and fail when executed by embodied agents. Existing benchmarks begin to incorporate notions of physical plausibility, but they largely remain perc

ArXiv AI

Research

Multi-modal Test-time Adaptation via Adaptive Probabilistic Gaussian Calibration

arXiv:2604.19093v1 Announce Type: cross Abstract: Multi-modal test-time adaptation (TTA) enhances the resilience of benchmark multi-modal models against distribution shifts by leveraging the unlabeled target data during inference. Despite the documented success, the advancement of multi-modal TTA methodologies has been impeded by a persistent limitation, i.e., the lack of explicit modeling of category-conditional distributions, which is crucial for yielding accurate predictions and reliable deci

ArXiv AI

Research

Multi-Gait Learning for Humanoid Robots Using Reinforcement Learning with Selective Adversarial Motion Prior

arXiv:2604.19102v1 Announce Type: cross Abstract: Learning diverse locomotion skills for humanoid robots in a unified reinforcement learning framework remains challenging due to the conflicting requirements of stability and dynamic expressiveness across different gaits. We present a multi-gait learning approach that enables a humanoid robot to master five distinct gaits -- walking, goose-stepping, running, stair climbing, and jumping -- using a consistent policy structure, action space, and rewa

ArXiv AI

Research

Reinforcement Learning Enabled Adaptive Multi-Task Control for Bipedal Soccer Robots

arXiv:2604.19104v1 Announce Type: cross Abstract: Developing bipedal football robots in dynamic combat environments presents challenges related to motion stability and deep coupling of multiple tasks, as well as control switching issues between different states such as upright walking and fall recovery. To address these problems, this paper proposes a modular reinforcement learning (RL) framework for achieving adaptive multi-task control. Firstly, this framework combines an open-loop feedforward osci

ArXiv AI

Research

Design Rules for Extreme-Edge Scientific Computing on AI Engines

arXiv:2604.19106v1 Announce Type: cross Abstract: Extreme-edge scientific applications use machine learning models to analyze sensor data and make real-time decisions. Their stringent latency and throughput requirements demand small batch sizes and require that model weights remain fully on-chip. Spatial dataflow implementations are common for extreme-edge applications. Spatial dataflow works well for small networks, but it fails to scale to larger models due to inherent resource scaling limitat

ArXiv AI

Funding

Bret Taylor’s Sierra buys YC-backed AI startup Fragment

Sierra, the AI customer service agent startup founded by technologist Bret Taylor, announced today that it has acquired the YC-backed French startup Fragment.

TechCrunch AI

Funding

Era raises $11M to build a software platform for AI gadgets

Era thinks that we will see many form factors of AI hardware, including glasses, rings, and pendants.

TechCrunch AI

Trend

Tesla just increased its spending plan to $25B — here’s where the money is going

Tesla's planned capex for 2026 is three times higher than what the company has historically spent. Its CFO said that, as a result, Tesla will have negative free cash flow for the rest of the year.

TechCrunch AI

Product Launch

Hands on with X’s new AI-powered custom feeds

X's AI-powered custom timelines are replacing Communities, with Grok-curated feeds...and new ad slots.

TechCrunch AI

AI Tools

Google makes an interesting choice with its new agent-building tool for enterprises

Gemini Enterprise Agent Platform takes an interesting approach: It is geared for IT and technical users.

TechCrunch AI

Product Launch

AI Overviews are coming to your Gmail at work

The AI Overviews will offer instant summaries pulled from across multiple emails.

TechCrunch AI

Industry

OpenAI teams up with Infosys to bring AI tools to more businesses

Infosys said the integration will be used to help its clients modernize software development, automate workflows, and deploy AI systems, initially focusing on software engineering, legacy modernization, and DevOps.

TechCrunch AI

AI Tools

Claude is connecting directly to your personal apps like Spotify, Uber Eats, and TurboTax

Claude users can access more apps with Anthropic's AI now thanks to new connectors for everything from hiking to grocery shopping. Anthropic already supported connecting numerous work-related apps to Claude, like Microsoft apps, but this expansion focuses on personal apps like Audible, Spotify, Uber, AllTrails, TripAdvisor, Instacart, TurboTax, and others. Some of these apps, such […]

The Verge AI

Trend

You’re about to feel the AI money squeeze

Earlier this month, millions of OpenClaw users woke up to a sweeping mandate: The viral AI agent tool, which this year took the worldwide tech industry by storm, had been severely restricted by Anthropic. Anthropic, like other leading AI labs, was under immense pressure to lessen the strain on its systems and start turning a […]

The Verge AI

Product Launch

Microsoft launches ‘vibe working’ in Word, Excel, and PowerPoint

Microsoft is rolling out a new Agent Mode inside Office apps like Word, Excel, and PowerPoint this week. Previously described by Microsoft as "vibe working," the Agent Mode is a more powerful version of the Copilot experience in Office that Microsoft has been trying to sell to businesses. "When we first shipped Copilot, foundation models […]

The Verge AI

Industry

Making ChatGPT better for clinicians

OpenAI makes ChatGPT for Clinicians free for verified U.S. physicians, nurse practitioners, and pharmacists, supporting clinical care, documentation, and research.

OpenAI Blog

AI Tools

The Pope’s Warnings About AI Were AI-Generated, a Detection Tool Claims

Pangram Labs’ updated Chrome extension puts warning labels on AI slop as you scroll your social feeds.

Wired AI

Product Launch

Anthropic's Claude Desktop App Installs Undisclosed Native Messaging Bridge

Article URL: https://letsdatascience.com/news/claude-desktop-installs-preauthorized-browser-extension-mani-4064fb1a Comments URL: https://news.ycombinator.com/item?id=47880697 Points: 82 # Comments: 16

Hacker News

Policy

Our newsroom AI policy

Article URL: https://arstechnica.com/staff/2026/04/our-newsroom-ai-policy/ Comments URL: https://news.ycombinator.com/item?id=47872452 Points: 181 # Comments: 124

Hacker News

Trend

Startups brag they spend more money on AI than human employees

Article URL: https://www.404media.co/startups-brag-they-spend-more-money-on-ai-than-human-employees/ Comments URL: https://news.ycombinator.com/item?id=47865923 Points: 53 # Comments: 47

Hacker News

Industry

Meta employees are up in arms over a mandatory program to train AI on their

Article URL: https://www.businessinsider.com/meta-new-ai-tool-tracks-staff-activity-sparks-concern-2026-4 Comments URL: https://news.ycombinator.com/item?id=47860961 Points: 115 # Comments: 89

Hacker News

Research

Beyond One Output: Visualizing and Comparing Distributions of Language Model Generations

arXiv:2604.18724v1 Announce Type: new Abstract: Users typically interact with and evaluate language models via single outputs, but each output is just one sample from a broad distribution of possible completions. This interaction hides distributional structure such as modes, uncommon edge cases, and sensitivity to small prompt changes, leading users to over-generalize from anecdotes when iterating on prompts for open-ended tasks. Informed by a formative study with researchers who use LMs (n=13)

ArXiv AI

Research

AI scientists produce results without reasoning scientifically

arXiv:2604.18805v1 Announce Type: new Abstract: Large language model (LLM)-based systems are increasingly deployed to conduct scientific research autonomously, yet whether their reasoning adheres to the epistemic norms that make scientific inquiry self-correcting is poorly understood. Here, we evaluate LLM-based scientific agents across eight domains, spanning workflow execution to hypothesis-driven inquiry, through more than 25,000 agent runs and two complementary lenses: (i) a systematic perfo

ArXiv AI

Research

Quantum inspired qubit qutrit neural networks for real time financial forecasting

arXiv:2604.18838v1 Announce Type: new Abstract: This research investigates the performance and efficacy of machine learning models in stock prediction, comparing Artificial Neural Networks (ANNs), Quantum Qubit-based Neural Networks (QQBNs), and Quantum Qutrit-based Neural Networks (QQTNs). By outlining methodologies, architectures, and training procedures, the study highlights significant differences in training times and performance metrics across models. While all models demonstrate robust ac

ArXiv AI

Research

Curiosity-Critic: Cumulative Prediction Error Improvement as a Tractable Intrinsic Reward for World Model Training

arXiv:2604.18701v1 Announce Type: cross Abstract: Local prediction-error-based curiosity rewards focus on the current transition without considering the world model's cumulative prediction error across all visited transitions. We introduce Curiosity-Critic, which grounds its intrinsic reward in the improvement of this cumulative objective, and show that it reduces to a tractable per-step form: the difference between the current prediction error and the asymptotic error baseline of the current st

ArXiv AI

Research

REVEAL: Multimodal Vision-Language Alignment of Retinal Morphometry and Clinical Risks for Incident AD and Dementia Prediction

arXiv:2604.18757v1 Announce Type: cross Abstract: The retina provides a unique, noninvasive window into Alzheimer's disease (AD) and dementia, capturing early structural changes through morphometric features, while systemic and lifestyle risk factors reflect well-established contributors to disease susceptibility long before clinical symptom onset. However, current retinal analysis frameworks typically model imaging and risk factors separately, limiting their ability to capture joint multimodal

ArXiv AI

Policy

AI failure could trigger the next financial crisis, warns Elizabeth Warren

"I know a bubble when I see one." That's what Sen. Elizabeth Warren (D-MA), who led the push to create a new consumer financial regulator in the wake of the 2008 recession, told a crowd at a Vanderbilt Policy Accelerator event in Washington, DC, on Wednesday. Warren warned of what she called "striking" parallels to […]

The Verge AI

Research

GPT-5.5 Bio Bug Bounty

Explore the GPT-5.5 Bio Bug Bounty: a red-teaming challenge to find universal jailbreaks for bio safety risks, with rewards up to $25,000.

OpenAI Blog

Trend

In a first, a ransomware family is confirmed to be quantum-safe

Technically speaking, there's no practical benefit to using PQC. So why is it being used?

Ars Technica AI

AI Tools

Microsoft issues emergency update for macOS and Linux ASP.NET threat

When authentication fails, things can go very, very wrong.

Ars Technica AI

Research

From Natural Language to Executable Narsese: A Neuro-Symbolic Benchmark and Pipeline for Reasoning with NARS

arXiv:2604.18873v1 Announce Type: new Abstract: Large language models (LLMs) are highly capable at language generation, but they remain unreliable when reasoning requires explicit symbolic structure, multi-step inference, and interpretable uncertainty. This paper presents a neuro-symbolic framework for translating natural-language reasoning problems into executable formal representations using first-order logic (FOL) and Narsese, the language of the Non-Axiomatic Reasoning System (NARS). To supp

ArXiv AI

Research

Formally Verified Patent Analysis via Dependent Type Theory: Machine-Checkable Certificates from a Hybrid AI + Lean 4 Pipeline

arXiv:2604.18882v1 Announce Type: new Abstract: We present a formally verified framework for patent analysis as a hybrid AI + Lean 4 pipeline. The DAG-coverage core (Algorithm 1b) is fully machine-verified once bounded match scores are fixed. Freedom-to-operate, claim-construction sensitivity, cross-claim consistency, and doctrine-of-equivalents analyses are formalized at the specification level with kernel-checked candidate certificates. Existing patent-analysis approaches rely on manual expert

ArXiv AI

AI Tools

On Accelerating Grounded Code Development for Research

arXiv:2604.19022v1 Announce Type: new Abstract: A major challenge for niche scientific and technical domains in leveraging coding agents is the lack of access to up-to-date, domain- specific knowledge. Foundational models often demonstrate limited reasoning capabilities in specialized fields and cannot inherently incorporate knowledge that evolves through ongoing research and experimentation. Materials scientists exploring novel compounds, communication engineers designing and evaluating new pro

ArXiv AI

Research

Learning Lifted Action Models from Unsupervised Visual Traces

arXiv:2604.19043v1 Announce Type: new Abstract: Efficient construction of models capturing the preconditions and effects of actions is essential for applying AI planning in real-world domains. Extensive prior work has explored learning such models from high-level descriptions of state and/or action sequences. In this paper, we tackle a more challenging setting: learning lifted action models from sequences of state images, without action observation. We propose a deep learning framework that join

ArXiv AI

AI Tools

Has Automated Essay Scoring Reached Sufficient Accuracy? Deriving Achievable QWK Ceilings from Classical Test Theory

arXiv:2604.19131v1 Announce Type: new Abstract: Automated essay scoring (AES) is commonly evaluated on public benchmarks using quadratic weighted kappa (QWK). However, because benchmark labels are assigned by human raters and inevitably contain scoring errors, it remains unclear both what QWK is theoretically attainable and what level is practically sufficient for deployment. We therefore derive two dataset-specific QWK ceilings based on the reliability concept in classical test theory, which ca

ArXiv AI

Research

Industrial Surface Defect Detection via Diffusion Generation and Asymmetric Student-Teacher Network

arXiv:2604.19240v1 Announce Type: new Abstract: Industrial surface defect detection often suffers from limited defect samples, severe long-tailed distributions, and difficulties in accurately localizing subtle defects under complex backgrounds. To address these challenges, this paper proposes an unsupervised defect detection method that integrates a Denoising Diffusion Probabilistic Model (DDPM) with an asymmetric teacher-student architecture. First, at the data level, the DDPM is trained solely

ArXiv AI

Research

Revac: A Social Deduction Reasoning Agent

arXiv:2604.19523v1 Announce Type: new Abstract: Social deduction games such as Mafia present a unique AI challenge: players must reason under uncertainty, interpret incomplete and intentionally misleading information, evaluate human-like communication, and make strategic elimination decisions. Unlike deterministic board games, success in Mafia depends not on perfect information or brute-force search, but on inference, memory, and adaptability in the presence of deception. This work presents the

ArXiv AI

Research

Enhancing Construction Worker Safety in Extreme Heat: A Machine Learning Approach Utilizing Wearable Technology for Predictive Health Analytics

arXiv:2604.19559v1 Announce Type: new Abstract: Construction workers are highly vulnerable to heat stress, yet tools that translate real-time physiological data into actionable safety intelligence remain scarce. This study addresses this gap by developing and evaluating deep learning models, specifically a baseline Long Short-Term Memory (LSTM) network and an attention-based LSTM, to predict heat stress among 19 workers in Saudi Arabia. Using Garmin Vivosmart 5 smartwatches to monitor metrics su

ArXiv AI

Research

AblateCell: A Reproduce-then-Ablate Agent for Virtual Cell Repositories

arXiv:2604.19606v1 Announce Type: new Abstract: Systematic ablations are essential to attribute performance gains in AI Virtual Cells, yet they are rarely performed because biological repositories are under-standardized and tightly coupled to domain-specific data and formats. While recent coding agents can translate ideas into implementations, they typically stop at producing code and lack a verifier that can reproduce strong baselines and rigorously test which components truly matter. We introd

ArXiv AI

Research

A Dual Perspective on Synthetic Trajectory Generators: Utility Framework and Privacy Vulnerabilities

arXiv:2604.19653v1 Announce Type: new Abstract: Human mobility data are used in numerous applications, ranging from public health to urban planning. Human mobility is inherently sensitive, as it can contain information such as religious beliefs and political affiliations. Historically, it has been proposed to modify the information using techniques such as aggregation, obfuscation, or noise addition, to adequately protect privacy and eliminate concerns. As these methods come at a great cost in u

ArXiv AI

Research

Who Shapes Brazil's Vaccine Debate? Semi-Supervised Modeling of Stance and Polarization in YouTube's Media Ecosystem

arXiv:2604.18586v1 Announce Type: cross Abstract: Vaccination remains a cornerstone of global public health, yet the COVID-19 pandemic exposed how online misinformation, political polarization, and declining institutional trust can undermine immunization efforts. Most of the prior computational studies that analyzed vaccine discourse on social platforms focus on English-language data, specific vaccines, or short time windows, impairing our understanding of long-term dynamics in high-impact, non-

ArXiv AI

Research

Thermal Anomaly Detection using Physics Aware Neuromorphic Networks: Comparison between Raw and L1C Sentinel-2 Data

arXiv:2604.18606v1 Announce Type: cross Abstract: Damage caused by bushfires and volcanic eruptions escalates rapidly when detection is delayed, making fast and reliable early warning capabilities essential. Recent Earth Observation (EO) approaches have shown that thermal anomaly detection can be performed directly on decompressed Level-0 (L0) sensor data, avoiding computationally expensive preprocessing chains. However, direct exploitation of raw data remains challenging due to domain shift, se

ArXiv AI

Research

Evaluating Answer Leakage Robustness of LLM Tutors against Adversarial Student Attacks

arXiv:2604.18660v1 Announce Type: cross Abstract: Large Language Models (LLMs) are increasingly used in education, yet their default helpfulness often conflicts with pedagogical principles. Prior work evaluates pedagogical quality via answer leakage-the disclosure of complete solutions instead of scaffolding-but typically assumes well-intentioned learners, leaving tutor robustness under student misuse largely unexplored. In this paper, we study scenarios where students behave adversarially and a

ArXiv AI

Research

The Cost of Relaxation: Evaluating the Error in Convex Neural Network Verification

arXiv:2604.18728v1 Announce Type: cross Abstract: Many neural network (NN) verification systems represent the network's input-output relation as a constraint program. Sound and complete representations involve integer constraints for simulating the activations. Recent works convexly relax the integer constraints, improving performance at the cost of soundness. Convex relaxations consider outputs that are unreachable by the original network. We study the worst-case divergence between the origi

ArXiv AI

Research

Beyond Coefficients: Forecast-Necessity Testing for Interpretable Causal Discovery in Nonlinear Time-Series Models

arXiv:2604.18751v1 Announce Type: cross Abstract: Nonlinear machine-learning models are increasingly used to discover causal relationships in time-series data, yet the interpretation of their outputs remains poorly understood. In particular, causal scores produced by regularized neural autoregressive models are often treated as analogues of regression coefficients, leading to misleading claims of statistical significance. In this paper, we argue that causal relevance in nonlinear time-series mod

ArXiv AI

Research

Multi-Level Temporal Graph Networks with Local-Global Fusion for Industrial Fault Diagnosis

arXiv:2604.18765v1 Announce Type: cross Abstract: Fault detection and diagnosis are critical for the optimal and safe operation of industrial processes. The correlations among sensors often display non-Euclidean structures where graph neural networks (GNNs) are widely used therein. However, for large-scale systems, local, global, and dynamic relations extensively exist among sensors, and traditional GNNs often overlook such complex and multi-level structures for various problems including the fa

ArXiv AI

Research

Experiments or Outcomes? Probing Scientific Feasibility in Large Language Models

arXiv:2604.18786v1 Announce Type: cross Abstract: Scientific feasibility assessment asks whether a claim is consistent with established knowledge and whether experimental evidence could support or refute it. We frame feasibility assessment as a diagnostic reasoning task in which, given a hypothesis, a model predicts feasible or infeasible and justifies its decision. We evaluate large language models (LLMs) under controlled knowledge conditions (hypothesis-only, with experiments, with outcomes, o

ArXiv AI

Research

Geometric Decoupling: Diagnosing the Structural Instability of Latent

arXiv:2604.18804v1 Announce Type: cross Abstract: Latent Diffusion Models (LDMs) achieve high-fidelity synthesis but suffer from latent space brittleness, causing discontinuous semantic jumps during editing. We introduce a Riemannian framework to diagnose this instability by analyzing the generative Jacobian, decomposing geometry into Local Scaling (capacity) and Local Complexity (curvature). Our study uncovers a "Geometric Decoupling": while curvature in normal gener

ArXiv AI

Research

Curvature-Aware PCA with Geodesic Tangent Space Aggregation for Semi-Supervised Learning

arXiv:2604.18816v1 Announce Type: cross Abstract: Principal Component Analysis (PCA) is a fundamental tool for representation learning, but its global linear formulation fails to capture the structure of data supported on curved manifolds. In contrast, manifold learning methods model nonlinearity but often sacrifice the spectral structure and stability of PCA. We propose Geodesic Tangent Space Aggregation PCA (GTSA-PCA), a geometric extension of PCA that integrates curvature awareness and

ArXiv AI

Research

Semantic Needles in Document Haystacks: Sensitivity Testing of LLM-as-a-Judge Similarity Scoring

arXiv:2604.18835v1 Announce Type: cross Abstract: We propose a scalable, multifactorial experimental framework that systematically probes LLM sensitivity to subtle semantic changes in pairwise document comparison. We analogize this as a needle-in-a-haystack problem: a single semantically altered sentence (the needle) is embedded within surrounding context (the hay), and we vary the perturbation type (negation, conjunction swap, named entity replacement), context type (original vs. topically unre

ArXiv AI

Research

Human-Machine Co-Boosted Bug Report Identification with Mutualistic Neural Active Learning

arXiv:2604.18862v1 Announce Type: cross Abstract: Bug reports, encompassing a wide range of bug types, are crucial for maintaining software quality. However, the increasing complexity and volume of bug reports pose a significant challenge for manual identification and assignment to the appropriate teams for resolution, as dealing with all the reports is time-consuming and resource-intensive. In this paper, we introduce a cross-project framework, dubbed Mutualistic Neural Active Learning (MNA

ArXiv AI

Research

A Proxy Consistency Loss for Grounded Fusion of Earth Observation and Location Encoders

arXiv:2604.18881v1 Announce Type: cross Abstract: Supervised learning with Earth observation inputs is often limited by the sparsity of high-quality labeled or in-situ measured data to use as training labels. With the abundance of geographic data products, in many cases there are variables correlated with - but different from - the variable of interest that can be leveraged. We integrate such proxy variables within a geographic prior via a trainable location encoder and introduce a proxy consist

ArXiv AI

Research

MORPHOGEN: A Multilingual Benchmark for Evaluating Gender-Aware Morphological Generation

arXiv:2604.18914v1 Announce Type: cross Abstract: While multilingual large language models (LLMs) perform well on high-level tasks like translation and question answering, their ability to handle grammatical gender and morphological agreement remains underexplored. In morphologically rich languages, gender influences verb conjugation, pronouns, and even first-person constructions with explicit and implicit mentions of gender. We introduce MORPHOGEN, a morphologically grounded large-scale benchma

ArXiv AI

Research

Fine-Tuning Small Reasoning Models for Quantum Field Theory

arXiv:2604.18936v1 Announce Type: cross Abstract: Despite the growing application of Large Language Models (LLMs) to theoretical physics, there is little academic exploration into how domain-specific physics reasoning ability develops while training these models. To investigate this, we perform the first academic fine-tuning study of small (7B-parameter) reasoning models dedicated specifically to theoretical physics. Because open-source verifiable training data required to train such capabilitie

ArXiv AI

Research

Decompose, Structure, and Repair: A Neuro-Symbolic Framework for Autoformalization via Operator Trees

arXiv:2604.19000v1 Announce Type: cross Abstract: Statement autoformalization acts as a critical bridge between human mathematics and formal mathematics by translating natural language problems into formal language. While prior works have focused on data synthesis and diverse training paradigms to optimize end-to-end Large Language Models (LLMs), they typically treat formal code as flat sequences, neglecting the hierarchical logic inherent in mathematical statements. In this work, we introduce D

ArXiv AI

Research

Intentional Updates for Streaming Reinforcement Learning

arXiv:2604.19033v1 Announce Type: cross Abstract: In gradient-based learning, a step size chosen in parameter units does not produce a predictable per-step change in function output. This often leads to instability in the streaming setting (i.e., batch size=1), where stochasticity is not averaged out and update magnitudes can momentarily become arbitrarily large or small. Instead, we propose intentional updates: first specify the intended outcome of an update and then solve for the step size that

ArXiv AI
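The abstract's core idea, choosing a target change in loss first and then solving for the step size, can be sketched with a first-order approximation. This is an illustrative reading of the abstract, not the paper's exact method; the function name and the quadratic toy loss are assumptions.

```python
import numpy as np

def intentional_step_size(grad, target_decrease):
    """For an SGD update w' = w - eta * grad, the first-order change
    in loss is -eta * ||grad||^2. Setting that equal to
    -target_decrease and solving gives eta = target_decrease / ||grad||^2,
    so the *outcome* is fixed rather than the step size in parameter units.
    (Hypothetical sketch based on the abstract.)"""
    g2 = float(np.dot(grad, grad))
    if g2 == 0.0:
        return 0.0  # zero gradient: no direction to move in
    return target_decrease / g2

# Toy quadratic loss f(w) = 0.5 * ||w||^2, whose gradient is w itself
w = np.array([3.0, 4.0])             # ||grad||^2 = 25
eta = intentional_step_size(w, 0.5)  # request a loss decrease of 0.5
w_new = w - eta * w
```

Because the solve is first-order, the realized decrease matches the requested one only approximately; on strongly curved losses the step would overshoot or undershoot, which is presumably where the paper's full treatment comes in.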

Research

Product-of-Experts Training Reduces Dataset Artifacts in Natural Language Inference

arXiv:2604.19069v1 Announce Type: cross Abstract: Neural NLI models overfit dataset artifacts instead of truly reasoning. A hypothesis-only model achieves 57.7% on SNLI, showing strong spurious correlations, and 38.6% of the baseline's errors result from these artifacts. We propose Product-of-Experts (PoE) training, which downweights examples where biased models are overconfident. PoE nearly preserves accuracy (89.10% vs. 89.30%) while cutting bias reliance by 4.71% (bias agreement 49.85% to 45%

ArXiv AI
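The PoE combination the abstract describes, training on the normalized product of the main model's and a frozen biased model's distributions, can be sketched in a few lines. This is a generic standalone illustration of the technique, not the paper's code; the logits and shapes are assumptions.

```python
import numpy as np

def log_softmax(z):
    # Numerically stable log-softmax over the last axis
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def poe_log_probs(main_logits, bias_logits):
    """Product-of-Experts: combine the two experts by adding their
    log-probabilities and renormalizing, i.e.
    p_poe = softmax(log p_main + log p_bias). During training the
    cross-entropy loss is taken on p_poe, so examples the biased
    model already answers confidently yield little gradient for the
    main model; at inference the main model is used alone."""
    combined = log_softmax(main_logits) + log_softmax(bias_logits)
    return log_softmax(combined)

# One 3-class NLI example where the bias model is confident in class 0
main = np.array([1.0, 0.5, -0.5])
bias = np.array([4.0, 0.0, 0.0])
log_p = poe_log_probs(main, bias)
```

The design choice worth noting is that the biased expert is frozen: gradients flow only through the main model, which is why the bias is absorbed by the product rather than learned again.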

Research

S2MAM: Semi-supervised Meta Additive Model for Robust Estimation and Variable Selection

arXiv:2604.19072v1 Announce Type: cross Abstract: Semi-supervised learning with manifold regularization is a classical framework for jointly learning from both labeled and unlabeled data, where the key requirement is that the support of the unknown marginal distribution has the geometric structure of a Riemannian manifold. Typically, the Laplace-Beltrami operator-based manifold regularization can be approximated empirically by the Laplacian regularization associated with the entire training data

ArXiv AI

Research

SAHM: A Benchmark for Arabic Financial and Shari'ah-Compliant Reasoning

arXiv:2604.19098v1 Announce Type: cross Abstract: English financial NLP has progressed rapidly through benchmarks for sentiment, document understanding, and financial question answering, while Arabic financial NLP remains comparatively under-explored despite strong practical demand for trustworthy finance and Islamic-finance assistants. We introduce SAHM, a document-grounded benchmark and instruction-tuning dataset for Arabic financial NLP and Shari'ah-compliant reasoning. SAHM contains 14,380 e

ArXiv AI

Trend

AI galaxy hunters are adding to the global GPU crunch

Astronomers are turning to GPUs to find needles in the galactic haystack.

TechCrunch AI

Trend

India’s app market is booming — but global platforms are capturing most of the gains

Non-gaming apps, led by streaming and AI, are driving growth, even as India's spending per user lags global peers.

TechCrunch AI

Research

Anthropic’s Mythos breach was humiliating

Anthropic's tightly controlled rollout of Claude Mythos has taken an awkward turn. After spending weeks insisting the AI model is so capable at cybersecurity that it is too dangerous to release publicly, it appears the model fell into the wrong hands anyway. According to Bloomberg, a "small group of unauthorized users" has had access to […]

The Verge AI

AI Tools

What is Codex?

Learn how Codex helps you go beyond chat by automating tasks, connecting tools, and producing real outputs like docs and dashboards.

OpenAI Blog

AI Tools

Automations

Learn how to automate tasks in Codex using schedules and triggers to create reports, summaries, and recurring workflows without manual effort.

OpenAI Blog

AI Tools

Plugins and skills

Learn how to use Codex plugins and skills to connect tools, access data, and follow repeatable workflows to automate tasks and improve results.

OpenAI Blog

AI Tools

Working with Codex

Learn how to set up your Codex workspace, create threads and projects, manage files, and start completing tasks with step-by-step guidance.

OpenAI Blog

AI Tools

How to get started with Codex

Learn how to get started with Codex by setting up projects, creating threads, and completing your first tasks with step-by-step guidance.

OpenAI Blog

AI Tools

Codex settings

Learn how to configure Codex settings, including personalization, detail level, and permissions, to run tasks smoothly and customize your workflow.

OpenAI Blog

AI Tools

Top 10 uses for Codex at work

Explore 10 practical Codex use cases to automate tasks, create deliverables, and turn real inputs into outputs across tools, files, and workflows.

OpenAI Blog

Product Launch

Gemma 4 VLA Demo on Jetson Orin Nano Super

Hugging Face Blog

Open Source

MeshCore development team splits over trademark dispute and AI-generated code

Article URL: https://blog.meshcore.io/2026/04/23/the-split Comments URL: https://news.ycombinator.com/item?id=47878117 Points: 139 # Comments: 82

Hacker News

Trend

Top MAGA influencer revealed to be AI

Article URL: https://nypost.com/2026/04/21/us-news/top-maga-influencer-emily-hart-revealed-to-be-ai-created-by-a-guy-in-india/ Comments URL: https://news.ycombinator.com/item?id=47864808 Points: 96 # Comments: 54

Hacker News

Research

On Solving the Multiple Variable Gapped Longest Common Subsequence Problem

arXiv:2604.18645v1 Announce Type: new Abstract: This paper addresses the Variable Gapped Longest Common Subsequence (VGLCS) problem, a generalization of the classical LCS problem involving flexible gap constraints between consecutive characters of the solution. The problem arises in molecular sequence comparison, where structural distance constraints between residues must be respected, and in time-series analysis where events are required to occur within specified temporal delays. We propose a search

ArXiv AI

Research

Characterizing AlphaEarth Embedding Geometry for Agentic Environmental Reasoning

arXiv:2604.18715v1 Announce Type: cross Abstract: Earth observation foundation models encode land surface information into dense embedding vectors, yet the geometric structure of these representations and its implications for downstream reasoning remain underexplored. We characterize the manifold geometry of Google AlphaEarth's 64-dimensional embeddings across 12.1 million Continental United States samples (2017--2023) and develop an agentic system that leverages this geometric understanding for

ArXiv AI

Research

Skillful Global Ocean Emulation and the Role of Correlation-Aware Loss

arXiv:2604.18727v1 Announce Type: cross Abstract: Machine learning emulators have shown extraordinary skill in forecasting atmospheric states, and their application to global ocean dynamics offers similar promise. Here, we adapt the GraphCast architecture into a dedicated ocean-only emulator, driven by prescribed atmospheric conditions, for medium-range predictions. The emulator is trained on NOAA's UFS-Replay dataset. Using a 24 hour time step, single initial condition, and without using autore

ArXiv AI

Education

At 'AI Coachella,' Stanford Students Line Up to Learn From Silicon Valley Royalty

CS 153 has gone viral on the Palo Alto campus—and on X. Not everyone is happy about it.

Wired AI

Research

Error-free Training for MedMNIST Datasets

arXiv:2604.18916v1 Announce Type: new Abstract: In this paper, we introduce a new concept called Artificial Special Intelligence, by which machine learning models for classification can be trained error-free, thus acquiring the capability of not making repeated mistakes. The method is applied to 18 MedMNIST biomedical datasets. Except for three datasets, which suffer from the double-labeling problem, all are trained to perfection.

ArXiv AI

Research

Plausible Reasoning and First-Order Plausible Logic

arXiv:2604.19036v1 Announce Type: new Abstract: Defeasible statements are statements that are likely, or probable, or usually true, but may occasionally be false. Plausible reasoning draws conclusions from statements that are either facts or defeasible statements without using numbers. So there are no probabilities or suchlike involved. Seventeen principles of logics that do plausible reasoning are suggested and several important plausible reasoning examples are considered. There are 14 necessar

ArXiv AI

Research

Towards Energy Impact on AI-Powered 6G IoT Networks: Centralized vs. Decentralized

arXiv:2604.19377v1 Announce Type: new Abstract: The emergence of sixth-generation (6G) technologies has introduced new challenges and opportunities for machine learning (ML) applications in Internet of Things (IoT) networks, particularly concerning energy efficiency. As model training and data transmission contribute significantly to energy consumption, optimizing these processes has become critical for sustainable system design. This study first conducts an analysis of the energy consumption model

ArXiv AI

Research

The Triadic Loop: A Framework for Negotiating Alignment in AI Co-hosted Livestreaming

arXiv:2604.18850v1 Announce Type: cross Abstract: AI systems are increasingly embedded in multi-user social environments, yet most alignment frameworks conceptualize interaction as a dyadic relationship between a single user and an AI system. Livestreaming platforms challenge this assumption: interaction unfolds among streamers and audiences in real time, producing dynamic affective and social feedback loops. In this paper, we introduce the Triadic Loop, a conceptual framework that reconceptuali

ArXiv AI

Research

Tadabur: A Large-Scale Quran Audio Dataset

arXiv:2604.18932v1 Announce Type: cross Abstract: Despite growing interest in Quranic data research, existing Quran datasets remain limited in both scale and diversity. To address this gap, we present Tadabur, a large-scale Quran audio dataset. Tadabur comprises more than 1400+ hours of recitation audio from over 600 distinct reciters, providing substantial variation in recitation styles, vocal characteristics, and recording conditions. This diversity makes Tadabur a comprehensive and representa

ArXiv AI

Research

Relational AI in Education: Reciprocity, Participatory Design, and Indigenous Worldviews

arXiv:2604.19099v1 Announce Type: cross Abstract: Education is not merely the transmission of information or the optimisation of individual performance; it is a fundamentally social, constructive, and relational practice. However, recent advances in generative artificial intelligence (GenAI) increasingly emphasise efficiency, automation, and individualised assistance, risking the weakening of relational learning processes. Despite growing adoption, AI in education (AIED) research has yet to full

ArXiv AI

Industry

Another customer of troubled startup Delve suffered a big security incident

TechCrunch has confirmed that Delve was the compliance company that performed the security certifications for Context AI, the AI agent training startup that last week disclosed a security incident.

TechCrunch AI

AI Tools

Scoring Show HN submissions for AI design patterns

Article URL: https://www.adriankrebs.ch/blog/design-slop/ Comments URL: https://news.ycombinator.com/item?id=47864393 Points: 326 # Comments: 231

Hacker News

Research

Watch Sony’s elite ping-pong robot beat top-ranked players

Humans have been building ping-pong playing robots for decades, such as Omron's FORPHEUS, which challenged amateur competitors at CES 2017. What sets Ace apart from the rest is that the robot, which was developed by Sony's AI division, is the first that can hold its own against top-ranked human players and occasionally even beat them […]

The Verge AI

AI Tools

Meet Noscroll, an AI bot that does your doomscrolling for you

Noscroll wants to cure doomscrolling with an AI bot that reads the internet for you.

TechCrunch AI

Get AI Pulse every morning

5 stories. 5 minutes. Personalized for your role. Free forever.

Open AI Pulse