arXiv cs.CL
Blog
39 posts
0 followers
arXiv cs.CL publishes articles covering LLMs and AI. A trusted source for AI and technology insights.

SalesSim: Benchmarking and Aligning Multimodal Language Models as Retail User Simulators
24d

Magis-Bench: Evaluating LLMs on Magistrate-Level Legal Tasks
24d

A Semantic-Sampling Framework for Evaluating Calibration in Open-Ended Question Answering
24d

Effective Explanations Support Planning Under Uncertainty
24d

Built Environment Reasoning from Remote Sensing Imagery Using Large Vision-Language Models
24d

AIPO: Learning to Reason from Active Interaction
24d

Change My View? The Dynamics of Persuasion and Polarization in Online Discourse
24d

How Much Do Circuits Tell Us? Measuring the Consistency and Specificity of Language Model Circuits
24d

Sanity Checks for Long-Form Hallucination Detection
24d

FMI_SU_Yotkova_Kastreva at SemEval-2026 Task 13: Lightweight Detection of LLM-Generated Code via Stylometric Signals
29d

Free Energy-Driven Reinforcement Learning with Adaptive Advantage Shaping for Unsupervised Reasoning in LLMs
29d

Adaptive Power-Mean Policy Optimization for Enhanced LLM Reasoning
29d

Connecting online criminal behavior with machine learning: Using authorship attribution to analyze and link potential online traffickers
29d

Not All That Is Fluent Is Factual: Investigating Hallucinations of Large Language Models in Academic Writing
29d

Are LLMs Ready for Conflict Monitoring? Empirical Evidence from West Africa
29d

MedFabric and EtHER: A Data-Centric Framework for Word-Level Fabrication Generation and Detection in Medical LLMs
29d

Vocabulary overlap less crucial for
29d

Nsanku: Evaluating Zero-Shot Translation Performance of LLMs for Ghanaian Languages
29d

Self-Prompting Small Language Models for Privacy-Sensitive Clinical Information Extraction
29d

How Frontier LLMs Adapt to Neurodivergence Context: A Measurement Framework for Surface vs. Structural Change in...
32d

Cultural Benchmarking of LLMs in Standard and Dialectal Arabic Dialogues
32d

Timing is Everything: Temporal Scaffolding of Semantic Surprise in Humor
32d

Persona-Grounded Safety Evaluation of AI Companions in Multi-Turn Conversations
32d

Why Do LLMs Struggle in Strategic Play? Broken Links Between Observations, Beliefs, and Actions
32d