Latest Agent Research Review (2026-05-09)

This report was generated automatically from papers.cool/arxiv/cs.AI

Selection criterion: papers related to AI agent systems

Generated at: 2026/5/9 17:30:06

📊 Today's Overview

Total papers: 25
Agent-related: 18

Distribution by Direction

Direction	Papers
other	9
evaluation	3
planning	2
multi_agent	1
safety	3
memory	1
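
The per-direction counts above add up to 19, not 18, because one paper (arXiv 2605.06444) is tagged with both planning and evaluation. A minimal sketch of that arithmetic follows; the variable names are illustrative and not part of the report generator.

```python
# Per-direction counts as listed in the distribution table.
direction_counts = {
    "other": 9,
    "evaluation": 3,
    "planning": 2,
    "multi_agent": 1,
    "safety": 3,
    "memory": 1,
}

# Papers carrying more than one direction tag (arXiv ID -> tags).
multi_tagged = {"2605.06444": ["planning", "evaluation"]}

total_tags = sum(direction_counts.values())
# Every extra tag on a paper inflates the tag total by one.
extra_tags = sum(len(tags) - 1 for tags in multi_tagged.values())
unique_papers = total_tags - extra_tags

print(total_tags, unique_papers)  # 19 18
```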

1️⃣ Today's Agent-Related Papers

OTHER (9 papers)

1. AI Co-Mathematician: Accelerating Mathematicians with Agentic AI

arXiv ID: 2605.06651
Research direction: other
Key terms:
- mathematician, mathematicians, agentic, mathematical, workflows, workbench, ideation, frontiermath, accelerating, stateful

2. SkillOS: Learning Skill Curation for Self-Evolving Agents

arXiv ID: 2605.06614
Research direction: other
Key terms:
- skill, curation, skillos, curator, skillrepo, skills, executor, agents, tasks, evolving

3. NeuroAgent: LLM Agents for Multimodal Neuroimaging Analysis and Research

arXiv ID: 2605.06584
Research direction: other
Key terms:
- neuroagent, neuroimaging, preprocessing, analysis, smri, llm, multimodal, 470, fmri, pet

4. Process Matters more than Output for Distinguishing Humans from Machines

arXiv ID: 2605.06524
Research direction: other
Key terms:
- process, human, cognitive, task, humans, machines, agents, fine, tuning, mimicry

5. Instrumental Choices: Measuring the Propensity of LLM Agents to Pursue Instrumental Behaviors

arXiv ID: 2605.06490
Research direction: other
Key terms:
- behaviour, instrumental, agents, propensity, 680, dangerous, stakes, choices, roleplay, measuring

6. ReasonSTL: Bridging Natural Language and Signal Temporal Logic via Tool-Augmented Process-Rewarded Learning

arXiv ID: 2605.06483
Research direction: other
Key terms:
- reasonstl, stl, textsc, language, rewarded, natural, temporal, tool, logic, requirements

7. Beyond Task Success: Measuring Workflow Fidelity in LLM-Based Agentic Payment Systems

arXiv ID: 2605.06457
Research direction: other
Key terms:
- tsr, hf1, payment, agentic, asr, agent, checkpoint, success, workflow, llm

8. PrefixGuard: From LLM-Agent Traces to Online Failure-Warning Monitors

arXiv ID: 2605.06455
Research direction: other
Key terms:
- prefixguard, prefix, webarena, auprc, monitor, terminalbench, warning, monitors, llm, bench

9. From Agent Loops to Deterministic Graphs: Execution Lineage for Reproducible AI-Native Work

arXiv ID: 2605.06365
Research direction: other
Key terms:
- memo, lineage, execution, artifact, replay, dag, update, unrelated, final, producing

EVALUATION (3 papers)

1. GlazyBench: A Benchmark for Ceramic Glaze Property Prediction and Image Generation

arXiv ID: 2605.06641
Research direction: evaluation
Key terms:
- glaze, glazybench, ceramic, glazes, multimodal, assisted, benchmark, property, prediction, generation

2. SpatialEpiBench: Benchmarking Spatial Information and Epidemic Priors in Forecasting

arXiv ID: 2605.06530
Research direction: evaluation
Key terms:
- epidemic, spatialepibench, forecasting, outbreak, priors, spatial, spatiotemporal, standardized, adjacency, rachel

3. SCRuB: Social Concept Reasoning under Rubric-Based Evaluation

arXiv ID: 2605.06444
Research direction: planning, evaluation
Key terms:
- rubric, scrub, social, expert, reasoning, evaluation, judges, judgments, concept, experts

PLANNING (2 papers)

1. Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key

arXiv ID: 2605.06638
Research direction: planning
Key terms:
- reasoning, expressiveness, expressive, horizon, teach, training, difficulty, logical, scalelogic, depth

2. SCRuB: Social Concept Reasoning under Rubric-Based Evaluation

arXiv ID: 2605.06444
Research direction: planning, evaluation
Key terms:
- rubric, scrub, social, expert, reasoning, evaluation, judges, judgments, concept, experts

MULTI_AGENT (1 paper)

1. MASPO: Joint Prompt Optimization for LLM-based Multi-Agent Systems

arXiv ID: 2605.06623
Research direction: multi_agent
Key terms:
- maspo, prompts, prompt, agent, llm, agents, wangzx1219, joint, across, optimization

SAFETY (3 papers)

1. Improved techniques for fine-tuning flow models via adjoint matching: a deterministic control pipeline

arXiv ID: 2605.06583
Research direction: safety
Key terms:
- adjoint, alignment, control, preservation, matching, deterministic, improved, flow, regress, sit

2. Market-Alignment Risk in Pricing Agents: Trace Diagnostics and Trace-Prior RL under Hidden Competitor State

arXiv ID: 2605.06529
Research direction: safety
Key terms:
- hotel, revpar, pricing, competitor, trace, market, revenue, adr, prior, failure

3. Automated alignment is harder than you think

arXiv ID: 2605.06390
Research direction: safety
Key terms:
- alignment, research, agents, human, outputs, automated, supervise, mistakes, likely, assessments

MEMORY (1 paper)

1. Knowledge Graphs, the Missing Link in Agentic AI-based Formal Verification

arXiv ID: 2605.06434
Research direction: memory
Key terms:
- rtl, svas, syntax, formal, specification, verification, agentic, coverage, specifications, grounding

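2️⃣ Research Trend Analysis

Today's Hot Directions

Based on today's 18 agent-related papers:

other: 9 papers 🔥 hot
evaluation: 3 papers 📈 growing
safety: 3 papers 📈 growing

Shifts in Technical Paradigm

Tool Calling → Tool Learning: from simple tool invocation to autonomous tool learning

Emerging Architecture Patterns

Agent Workflow: workflow orchestration architecture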

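3️⃣ Key Insights

Memory is becoming infrastructure: more and more systems treat memory as a standard component rather than an optional feature
Planning is moving from rules to learning: traditional symbolic planning is being replaced by learned neural approaches
Multi-agent collaboration is standardizing: consensus is forming around multi-agent communication protocols and coordination mechanisms
Safety is moving from afterthought to upfront design: safety is being built into the system architecture rather than patched in afterwards
Evaluation benchmarks are evolving quickly: agent evaluation is expanding from single tasks to complex scenarios
Open-source solutions are iterating quickly: open-source implementations are rapidly catching up with commercial agent capabilities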

4️⃣ Technology Evolution Path

Prompt Engineering
       ↓
   LLM Agent
       ↓
  Tool-Augmented Agent
       ↓
   Memory System
       ↓
  Multi-Agent System
       ↓
  Autonomous Agent

Current Hot Paths

RAG → Memory System → World Model: memory architectures keep deepening
ReAct → Planning System → Goal Reasoning: reasoning capabilities keep strengthening

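5️⃣ Relation to Open-Source Agent Projects

Comparison with Mainstream Projects

Open-source project	Related directions	Validated by today's papers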
LangChain	tool, planning	✅
LlamaIndex	memory, rag	✅
AutoGPT	planning, autonomous	✅
CrewAI	multi-agent	✅
Mem0	memory	✅
OpenDevin	tool, planning	➖

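Design Validation and Evolution

Validated designs:

The need for a Memory System continues to be validated
Tool Use is now widely accepted as a core agent capability
Multi-Agent architectures perform better on complex tasks

Designs that need to evolve:

Simple RAG is being replaced by Memory Systems
Monolithic agent architectures are limited in complex scenarios
Static Tool Definitions need to evolve toward dynamic learning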

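6️⃣ Architecture-Level Conclusions

Memory First: new agent projects should design the Memory System up front rather than adding it later
Tool Abstraction: the tool abstraction layer should support dynamic discovery and learning rather than hard-coding
Multi-Agent Ready: even if the system is currently single-agent, the architecture should leave room for multi-agent extension
Safety by Design: safety mechanisms should be considered at the architecture design stage rather than patched in afterwards
Evaluation Driven: establish a continuous evaluation mechanism rather than relying on manual testing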
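
As one possible reading of the Tool Abstraction point, the sketch below registers tools at runtime and lets an agent discover them by description instead of hard-coding calls. `ToolRegistry`, `register`, `discover`, and `call` are hypothetical names chosen for this illustration, and the "learning" part of the recommendation is not covered.

```python
from typing import Callable


class ToolRegistry:
    """Hypothetical registry: tools are added at runtime, not hard-coded."""

    def __init__(self) -> None:
        self._tools: dict[str, tuple[str, Callable[..., object]]] = {}

    def register(self, name: str, description: str):
        # Decorator that records the function under a name and description.
        def decorator(fn: Callable[..., object]) -> Callable[..., object]:
            self._tools[name] = (description, fn)
            return fn
        return decorator

    def discover(self, keyword: str) -> list[str]:
        # Naive discovery: case-insensitive substring match on descriptions.
        kw = keyword.lower()
        return sorted(n for n, (desc, _) in self._tools.items() if kw in desc.lower())

    def call(self, name: str, **kwargs: object) -> object:
        return self._tools[name][1](**kwargs)


registry = ToolRegistry()


@registry.register("add", "Add two numbers")
def add(a: float, b: float) -> float:
    return a + b


@registry.register("word_count", "Count words in a text")
def word_count(text: str) -> int:
    return len(text.split())


print(registry.discover("count"))      # ['word_count']
print(registry.call("add", a=2, b=3))  # 5
```

A real system would likely replace the substring match with schema-aware or embedding-based lookup; the point here is only that callers depend on the registry, not on specific tool functions.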

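7️⃣ Recommended Next Steps

Memory Schema Design

Adopt a layered memory architecture: Working Memory → Episodic → Long-term
Design a unified Memory Interface that supports multiple backends (vector, graph, relational)
Implement a Memory Compression mechanism to avoid unbounded growth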
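
The three memory recommendations can be sketched together as follows. This is a toy illustration under stated assumptions: `MemoryBackend`, `ListBackend`, `LayeredMemory`, and `truncate_summary` are hypothetical names, the backend is an in-process list rather than a vector, graph, or relational store, and "compression" is a simple truncation that stands in for a real summarizer.

```python
from collections import deque
from typing import Callable, Protocol


class MemoryBackend(Protocol):
    """Hypothetical unified interface that any long-term store could implement."""

    def add(self, text: str) -> None: ...
    def search(self, query: str, limit: int = 3) -> list[str]: ...


class ListBackend:
    """Toy backend using substring search, standing in for a real store."""

    def __init__(self) -> None:
        self.items: list[str] = []

    def add(self, text: str) -> None:
        self.items.append(text)

    def search(self, query: str, limit: int = 3) -> list[str]:
        q = query.lower()
        return [t for t in self.items if q in t.lower()][:limit]


def truncate_summary(events: list[str]) -> str:
    # Placeholder compression: keep the first three words of each event.
    return "; ".join(" ".join(e.split()[:3]) for e in events)


class LayeredMemory:
    def __init__(self, long_term: MemoryBackend, working_size: int = 2,
                 episodic_limit: int = 3,
                 summarize: Callable[[list[str]], str] = truncate_summary) -> None:
        self.working: deque[str] = deque(maxlen=working_size)  # most recent events
        self.episodic: list[str] = []                          # current episode
        self.long_term = long_term
        self.episodic_limit = episodic_limit
        self.summarize = summarize

    def observe(self, event: str) -> None:
        self.working.append(event)
        self.episodic.append(event)
        # Compress the episode into long-term memory once it reaches the limit.
        if len(self.episodic) >= self.episodic_limit:
            self.long_term.add(self.summarize(self.episodic))
            self.episodic.clear()


memory = LayeredMemory(long_term=ListBackend())
for step in ["user asked about hotel pricing", "agent called price tool now",
             "tool returned a quote", "user confirmed the booking"]:
    memory.observe(step)

print(list(memory.working))
print(memory.episodic)
print(memory.long_term.search("price"))
```

Because `LayeredMemory` only depends on the `MemoryBackend` protocol, swapping the list for another store would not change the layering or compression logic.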

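Retrieval Policy Upgrade

Upgrade from simple similarity retrieval to hybrid retrieval (keyword + vector + knowledge graph)
Implement a context-aware, dynamic retrieval strategy
Consider introducing a Reranking mechanism to improve relevance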
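
A minimal sketch of hybrid retrieval followed by reranking, under loud assumptions: the corpus is made up, the "vector" signal is a bag-of-words cosine standing in for embedding similarity, the knowledge-graph signal is omitted, the fusion is a plain weighted sum with a hypothetical weight `alpha`, and the reranker scores shared bigrams rather than using a learned model.

```python
import math
from collections import Counter

# Toy corpus; document IDs and texts are invented for illustration.
DOCS = {
    "d1": "agent memory stores episodic events",
    "d2": "hybrid retrieval combines keyword and vector search",
    "d3": "knowledge graph links entities for verification",
}


def tokenize(text: str) -> list[str]:
    return text.lower().split()


def keyword_score(query: str, doc: str) -> float:
    # Fraction of distinct query terms that appear in the document.
    q, d = set(tokenize(query)), set(tokenize(doc))
    return len(q & d) / len(q) if q else 0.0


def cosine_score(query: str, doc: str) -> float:
    # Bag-of-words cosine similarity (stand-in for an embedding model).
    a, b = Counter(tokenize(query)), Counter(tokenize(doc))
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def hybrid_search(query: str, docs: dict[str, str], alpha: float = 0.5, top_k: int = 2) -> list[str]:
    # First stage: weighted sum of the keyword and vector signals.
    scores = {
        doc_id: alpha * keyword_score(query, text) + (1 - alpha) * cosine_score(query, text)
        for doc_id, text in docs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


def bigrams(tokens: list[str]) -> set[tuple[str, str]]:
    return set(zip(tokens, tokens[1:]))


def rerank(query: str, candidates: list[str], docs: dict[str, str]) -> list[str]:
    # Second stage: reorder candidates by the number of shared bigrams.
    q = bigrams(tokenize(query))
    return sorted(candidates, key=lambda i: len(q & bigrams(tokenize(docs[i]))), reverse=True)


query = "keyword and vector retrieval"
candidates = hybrid_search(query, DOCS)
results = rerank(query, candidates, DOCS)
print(results)
```

The two-stage shape (cheap recall over the whole corpus, more expensive scoring over a short candidate list) is the design choice being illustrated; the specific scoring functions are placeholders.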

Agent Orchestration Adjustments

Design a standardized agent communication protocol
Implement a dynamic task allocation mechanism
Consider introducing an Orchestrator role
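
The orchestration points above could look roughly like the sketch below. All names (`Message`, `Worker`, `Orchestrator`, `assign`) are hypothetical, the message format is a simple dataclass rather than any established protocol, and "dynamic allocation" is reduced to picking the least-loaded worker that advertises the required skill.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    """Hypothetical message envelope shared by all agents."""
    sender: str
    recipient: str
    task: str
    payload: str


@dataclass
class Worker:
    name: str
    skills: set[str]
    inbox: list[Message] = field(default_factory=list)

    def handle(self, msg: Message) -> Message:
        # Record the request and reply using the same envelope format.
        self.inbox.append(msg)
        return Message(self.name, msg.sender, msg.task, f"{self.name} finished {msg.task}")


class Orchestrator:
    def __init__(self, workers: list[Worker]) -> None:
        self.workers = workers

    def assign(self, task: str, payload: str) -> Message:
        capable = [w for w in self.workers if task in w.skills]
        if not capable:
            raise LookupError(f"no worker can handle task {task!r}")
        # Dynamic allocation: choose the capable worker with the fewest handled messages.
        worker = min(capable, key=lambda w: len(w.inbox))
        return worker.handle(Message("orchestrator", worker.name, task, payload))


workers = [Worker("a", {"search", "summarize"}), Worker("b", {"search"})]
orchestrator = Orchestrator(workers)
replies = [
    orchestrator.assign("search", "query 1"),
    orchestrator.assign("search", "query 2"),
    orchestrator.assign("summarize", "long text"),
]
print([r.sender for r in replies])  # ['a', 'b', 'a']
```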

📚 Appendix

Full paper list

This report was generated automatically by OpenClaw
Intended for agent architects as a decision reference
