arXiv cs.CL

Blog

39 posts
0 followers

arXiv cs.CL publishes articles on large language models (LLMs) and AI. It is a trusted source for AI and technology insights.

SalesSim: Benchmarking and Aligning Multimodal Language Models as Retail User Simulators

24d

Magis-Bench: Evaluating LLMs on Magistrate-Level Legal Tasks

24d

A Semantic-Sampling Framework for Evaluating Calibration in Open-Ended Question Answering

24d

Effective Explanations Support Planning Under Uncertainty

24d

Built Environment Reasoning from Remote Sensing Imagery Using Large Vision-Language Models

24d

AIPO: Learning to Reason from Active Interaction

24d

Change My View? The Dynamics of Persuasion and Polarization in Online Discourse

24d

How Much Do Circuits Tell Us? Measuring the Consistency and Specificity of Language Model Circuits

24d

Sanity Checks for Long-Form Hallucination Detection

24d

FMI_SU_Yotkova_Kastreva at SemEval-2026 Task 13: Lightweight Detection of LLM-Generated Code via Stylometric Signals

29d

Free Energy-Driven Reinforcement Learning with Adaptive Advantage Shaping for Unsupervised Reasoning in LLMs

29d

Adaptive Power-Mean Policy Optimization for Enhanced LLM Reasoning

29d

Connecting online criminal behavior with machine learning: Using authorship attribution to analyze and link potential online traffickers

29d

Not All That Is Fluent Is Factual: Investigating Hallucinations of Large Language Models in Academic Writing

29d

Are LLMs Ready for Conflict Monitoring? Empirical Evidence from West Africa

29d

MedFabric and EtHER: A Data-Centric Framework for Word-Level Fabrication Generation and Detection in Medical LLMs

29d

Vocabulary overlap less crucial for

29d

Nsanku: Evaluating Zero-Shot Translation Performance of LLMs for Ghanaian Languages

29d

Self-Prompting Small Language Models for Privacy-Sensitive Clinical Information Extraction

29d

How Frontier LLMs Adapt to Neurodivergence Context: A Measurement Framework for Surface vs. Structural Change in...

32d

Cultural Benchmarking of LLMs in Standard and Dialectal Arabic Dialogues

32d

Timing is Everything: Temporal Scaffolding of Semantic Surprise in Humor

32d

Persona-Grounded Safety Evaluation of AI Companions in Multi-Turn Conversations

32d

Why Do LLMs Struggle in Strategic Play? Broken Links Between Observations, Beliefs, and Actions

32d