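Survey of the Latest Agent Research (2026-05-09)
This report was generated automatically from papers.cool/arxiv/cs.AI
Selection criterion: papers related to AI agent systems
Generated at: 2026/5/9 17:30:06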
📊 Today's overview
- Total papers: 25
- Agent-related: 18 (SCRuB is listed under both planning and evaluation, so the per-direction counts sum to 19)
Distribution by direction
| Direction | Papers |
|---|---|
| other | 9 |
| evaluation | 3 |
| planning | 2 |
| multi_agent | 1 |
| safety | 3 |
| memory | 1 |
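The table can be re-derived from the paper list in the appendix. The sketch below is only an illustration: the shortened titles and the `papers` structure are assumptions made for this example, while the direction tags are copied from the appendix. It shows why 18 papers produce 19 tag counts when one paper carries two tags.

```python
from collections import Counter

# (shortened title, direction tags) for the 18 agent-related papers,
# with tags copied from the appendix of this report.
papers = [
    ("AI Co-Mathematician", ["other"]),
    ("GlazyBench", ["evaluation"]),
    ("Can RL Teach Long-Horizon Reasoning to LLMs?", ["planning"]),
    ("MASPO", ["multi_agent"]),
    ("SkillOS", ["other"]),
    ("NeuroAgent", ["other"]),
    ("Improved techniques for fine-tuning flow models", ["safety"]),
    ("SpatialEpiBench", ["evaluation"]),
    ("Market-Alignment Risk in Pricing Agents", ["safety"]),
    ("Process Matters more than Output", ["other"]),
    ("Instrumental Choices", ["other"]),
    ("ReasonSTL", ["other"]),
    ("Beyond Task Success", ["other"]),
    ("PrefixGuard", ["other"]),
    ("SCRuB", ["planning", "evaluation"]),  # cross-listed paper
    ("Knowledge Graphs, the Missing Link", ["memory"]),
    ("Automated alignment is harder than you think", ["safety"]),
    ("From Agent Loops to Deterministic Graphs", ["other"]),
]

# Count every tag occurrence; a paper with two tags contributes twice.
direction_counts = Counter(tag for _, tags in papers for tag in tags)
total_tags = sum(direction_counts.values())

for direction, count in direction_counts.most_common():
    print(f"{direction}: {count}")
print(f"papers={len(papers)} tags={total_tags}")
```

Counting tags rather than papers keeps each direction's row accurate, at the cost of a column total that exceeds the number of papers.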
1️⃣ Today's agent-related papers
OTHER (9 papers)
1. AI Co-Mathematician: Accelerating Mathematicians with Agentic AI
- arXiv ID: 2605.06651
- Research direction: other
- Key points:
- mathematician,mathematicians,agentic,mathematical,workflows,workbench,ideation,frontiermath,accelerating,stateful
2. SkillOS: Learning Skill Curation for Self-Evolving Agents
- arXiv ID: 2605.06614
- Research direction: other
- Key points:
- skill,curation,skillos,curator,skillrepo,skills,executor,agents,tasks,evolving
3. NeuroAgent: LLM Agents for Multimodal Neuroimaging Analysis and Research
- arXiv ID: 2605.06584
- Research direction: other
- Key points:
- neuroagent,neuroimaging,preprocessing,analysis,smri,llm,multimodal,470,fmri,pet
4. Process Matters more than Output for Distinguishing Humans from Machines
- arXiv ID: 2605.06524
- Research direction: other
- Key points:
- process,human,cognitive,task,humans,machines,agents,fine,tuning,mimicry
5. Instrumental Choices: Measuring the Propensity of LLM Agents to Pursue Instrumental Behaviors
- arXiv ID: 2605.06490
- Research direction: other
- Key points:
- behaviour,instrumental,agents,propensity,680,dangerous,stakes,choices,roleplay,measuring
6. ReasonSTL: Bridging Natural Language and Signal Temporal Logic via Tool-Augmented Process-Rewarded Learning
- arXiv ID: 2605.06483
- Research direction: other
- Key points:
- reasonstl,stl,language,rewarded,natural,temporal,tool,logic,requirements
7. Beyond Task Success: Measuring Workflow Fidelity in LLM-Based Agentic Payment Systems
- arXiv ID: 2605.06457
- Research direction: other
- Key points:
- tsr,hf1,payment,agentic,asr,agent,checkpoint,success,workflow,llm
8. PrefixGuard: From LLM-Agent Traces to Online Failure-Warning Monitors
- arXiv ID: 2605.06455
- Research direction: other
- Key points:
- prefixguard,prefix,webarena,auprc,monitor,terminalbench,warning,monitors,llm,bench
9. From Agent Loops to Deterministic Graphs: Execution Lineage for Reproducible AI-Native Work
- arXiv ID: 2605.06365
- Research direction: other
- Key points:
- memo,lineage,execution,artifact,replay,dag,update,unrelated,final,producing
EVALUATION (3 papers)
1. GlazyBench: A Benchmark for Ceramic Glaze Property Prediction and Image Generation
- arXiv ID: 2605.06641
- Research direction: evaluation
- Key points:
- glaze,glazybench,ceramic,glazes,multimodal,assisted,benchmark,property,prediction,generation
2. SpatialEpiBench: Benchmarking Spatial Information and Epidemic Priors in Forecasting
- arXiv ID: 2605.06530
- Research direction: evaluation
- Key points:
- epidemic,spatialepibench,forecasting,outbreak,priors,spatial,spatiotemporal,standardized,adjacency,rachel
3. SCRuB: Social Concept Reasoning under Rubric-Based Evaluation
- arXiv ID: 2605.06444
- Research direction: planning, evaluation
- Key points:
- rubric,scrub,social,expert,reasoning,evaluation,judges,judgments,concept,experts
PLANNING (2 papers)
1. Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key
- arXiv ID: 2605.06638
- Research direction: planning
- Key points:
- reasoning,expressiveness,expressive,horizon,teach,training,difficulty,logical,scalelogic,depth
2. SCRuB: Social Concept Reasoning under Rubric-Based Evaluation
- arXiv ID: 2605.06444
- Research direction: planning, evaluation
- Key points:
- rubric,scrub,social,expert,reasoning,evaluation,judges,judgments,concept,experts
MULTI_AGENT (1 paper)
1. MASPO: Joint Prompt Optimization for LLM-based Multi-Agent Systems
- arXiv ID: 2605.06623
- Research direction: multi_agent
- Key points:
- maspo,prompts,prompt,agent,llm,agents,wangzx1219,joint,across,optimization
SAFETY (3 papers)
1. Improved techniques for fine-tuning flow models via adjoint matching: a deterministic control pipeline
- arXiv ID: 2605.06583
- Research direction: safety
- Key points:
- adjoint,alignment,control,preservation,matching,deterministic,improved,flow,regress,sit
2. Market-Alignment Risk in Pricing Agents: Trace Diagnostics and Trace-Prior RL under Hidden Competitor State
- arXiv ID: 2605.06529
- Research direction: safety
- Key points:
- hotel,revpar,pricing,competitor,trace,market,revenue,adr,prior,failure
3. Automated alignment is harder than you think
- arXiv ID: 2605.06390
- Research direction: safety
- Key points:
- alignment,research,agents,human,outputs,automated,supervise,mistakes,likely,assessments
MEMORY (1 paper)
1. Knowledge Graphs, the Missing Link in Agentic AI-based Formal Verification
- arXiv ID: 2605.06434
- Research direction: memory
- Key points:
- rtl,svas,syntax,formal,specification,verification,agentic,coverage,specifications,grounding
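2️⃣ Research trend analysis
Today's hot directions
Based on an analysis of today's 18 agent-related papers:
- other: 9 papers 🔥 hot
- evaluation: 3 papers 📈 growing
- safety: 3 papers 📈 growing
Shifts in technical paradigm
- Tool Calling → Tool Learning: from simple tool invocation to autonomous tool learning
Emerging architecture patterns
- Agent Workflow: workflow-orchestration architecture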
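3️⃣ Key insights
- Memory is becoming infrastructure: a growing number of systems treat memory as a standard component rather than an optional feature
- Planning is moving from rules to learning: traditional symbolic planning is being replaced by learned neural approaches
- Multi-Agent collaboration is standardizing: consensus is forming around multi-agent communication protocols and coordination mechanisms
- Safety is moving from after-the-fact to up-front: safety design is being built into system architecture rather than patched in afterwards
- Evaluation benchmarks are evolving quickly: agent capability evaluation is expanding from single tasks to complex scenarios
- Open-source solutions are iterating quickly: open-source implementations are rapidly catching up with commercial agent capabilities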
4️⃣ Technology evolution paths
1 | Prompt Engineering |
Currently active paths
- RAG → Memory System → World Model: memory architectures continue to deepen
- ReAct → Planning System → Goal Reasoning: reasoning capability is strengthening
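5️⃣ Relation to open-source agent projects
Comparison with mainstream projects
| Open-source project | Related directions | Supported by today's papers |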
|---|---|---|
| LangChain | tool, planning | ✅ |
| LlamaIndex | memory, rag | ✅ |
| AutoGPT | planning, autonomous | ✅ |
| CrewAI | multi-agent | ✅ |
| Mem0 | memory | ✅ |
| OpenDevin | tool, planning | ➖ |
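Design validation and evolution
Validated designs:
- The need for a Memory System continues to be confirmed
- Tool Use as a core agent capability is now a consensus view
- Multi-Agent architectures perform better on complex tasks
Designs that need to evolve:
- Simple RAG is being replaced by Memory Systems
- Monolithic single-agent architectures are limited in complex scenarios
- Static Tool Definitions need to evolve toward dynamic learning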
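6️⃣ Architecture-level conclusions
- Memory First: new agent projects should design the Memory System up front rather than adding it later
- Tool Abstraction: the tool abstraction layer should support dynamic discovery and learning rather than hard-coding
- Multi-Agent Ready: even if the system is currently single-agent, the architecture should leave room for multi-agent extension
- Safety by Design: safety mechanisms should be considered at the architecture design stage rather than patched in afterwards
- Evaluation Driven: establish a continuous evaluation mechanism rather than relying on manual testing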
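As one illustration of the Tool Abstraction point, the sketch below shows a registry in which tools are registered at runtime and looked up by description instead of being hard-coded into the agent. `Tool`, `ToolRegistry`, `register`, `discover`, and `call` are hypothetical names invented for this example and do not come from any particular framework. The substring-based discovery is a deliberately naive placeholder.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Tool:
    name: str
    description: str
    func: Callable[..., object]


class ToolRegistry:
    """Hypothetical registry: tools are added at runtime, not hard-coded."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, name: str, description: str):
        # Decorator that records the function under the given name.
        def decorator(func: Callable[..., object]) -> Callable[..., object]:
            self._tools[name] = Tool(name, description, func)
            return func
        return decorator

    def discover(self, query: str) -> List[str]:
        # Naive discovery: case-insensitive substring match on descriptions.
        needle = query.lower()
        return sorted(t.name for t in self._tools.values()
                      if needle in t.description.lower())

    def call(self, name: str, **kwargs: object) -> object:
        return self._tools[name].func(**kwargs)


registry = ToolRegistry()


@registry.register("add", "Add two numbers")
def add(a: int, b: int) -> int:
    return a + b


@registry.register("shout", "Convert text to upper case")
def shout(text: str) -> str:
    return text.upper()


print(registry.discover("numbers"))
print(registry.call("add", a=2, b=3))
```

A production system would likely replace the substring match with semantic search over tool descriptions. The calling code would not need to change, because the agent only depends on `discover` and `call`.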
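7️⃣ Recommended next steps
Memory Schema design
- Adopt a layered memory architecture: Working Memory → Episodic → Long-term
- Design a unified Memory Interface that supports multiple backends (vector, graph, relational)
- Implement a Memory Compression mechanism to avoid unbounded growth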
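The three recommendations above can be sketched together as follows. This is a minimal illustration under stated assumptions, not a reference design: `MemoryBackend`, `ListBackend`, and `LayeredMemory` are hypothetical names, the backend is a plain in-memory list standing in for a vector, graph, or relational store, and "compression" is a placeholder that joins episodes into one record where a real system would more likely summarise them.

```python
from collections import deque
from typing import Deque, List, Protocol


class MemoryBackend(Protocol):
    """Hypothetical unified interface a long-term store would implement."""

    def add(self, text: str) -> None: ...

    def search(self, query: str, k: int) -> List[str]: ...


class ListBackend:
    """In-memory stand-in for a vector, graph, or relational backend."""

    def __init__(self) -> None:
        self.items: List[str] = []

    def add(self, text: str) -> None:
        self.items.append(text)

    def search(self, query: str, k: int) -> List[str]:
        needle = query.lower()
        return [t for t in self.items if needle in t.lower()][:k]


class LayeredMemory:
    def __init__(self, long_term: MemoryBackend, working_size: int = 3,
                 episodic_limit: int = 4) -> None:
        self.working: Deque[str] = deque()
        self.working_size = working_size
        self.episodic: List[str] = []
        self.episodic_limit = episodic_limit
        self.long_term = long_term

    def remember(self, text: str) -> None:
        # When working memory is full, demote its oldest entry to episodic.
        if len(self.working) >= self.working_size:
            self.episodic.append(self.working.popleft())
        self.working.append(text)
        # Keep episodic memory bounded by compressing it into long-term.
        if len(self.episodic) >= self.episodic_limit:
            self._compress()

    def _compress(self) -> None:
        # Placeholder compression: merge the episodes into a single record.
        self.long_term.add(" | ".join(self.episodic))
        self.episodic.clear()

    def recall(self, query: str, k: int = 3) -> List[str]:
        # Search the recent layers first, then fall back to long-term.
        needle = query.lower()
        recent = [t for t in list(self.working) + self.episodic
                  if needle in t.lower()]
        return (recent + self.long_term.search(query, k))[:k]


memory = LayeredMemory(ListBackend(), working_size=2, episodic_limit=2)
for i in range(6):
    memory.remember(f"event {i}")
print(list(memory.working), memory.episodic, memory.recall("event 2"))
```

Because `LayeredMemory` only relies on `add` and `search`, a different backend can be swapped in without touching the layering logic.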
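Retrieval Policy upgrade
- Upgrade from simple similarity search to hybrid retrieval (keyword + vector + knowledge graph)
- Implement a context-aware dynamic retrieval strategy
- Consider introducing a Reranking mechanism to improve relevance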
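One common way to combine keyword, vector, and knowledge-graph results is reciprocal rank fusion (RRF), which is swapped in here purely as an example. The report does not prescribe a fusion method. In the sketch below the documents, the two-dimensional "embeddings", and the entity-to-document graph are toy data invented for illustration, and all function names are hypothetical. A learned reranker would normally be applied to the fused list afterwards and is not shown.

```python
import math
from typing import Dict, List, Sequence


def keyword_rank(query: str, docs: Dict[str, str]) -> List[str]:
    # Rank by number of shared lowercase tokens; drop documents with none.
    q = set(query.lower().split())
    scores = {d: len(q & set(text.lower().split())) for d, text in docs.items()}
    ranked = sorted(scores, key=lambda d: (-scores[d], d))
    return [d for d in ranked if scores[d] > 0]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def vector_rank(query_vec: Sequence[float],
                doc_vecs: Dict[str, Sequence[float]]) -> List[str]:
    # Rank all documents by cosine similarity to the query vector.
    return sorted(doc_vecs, key=lambda d: (-cosine(query_vec, doc_vecs[d]), d))


def graph_rank(entities: Sequence[str], graph: Dict[str, List[str]]) -> List[str]:
    # Collect documents linked to the query entities, in first-seen order.
    seen: List[str] = []
    for entity in entities:
        for doc_id in graph.get(entity, []):
            if doc_id not in seen:
                seen.append(doc_id)
    return seen


def rrf(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    # Reciprocal rank fusion: each list contributes 1 / (k + rank) per doc.
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


# Toy data, invented for this example.
docs = {
    "d1": "agent memory compression",
    "d2": "hybrid retrieval for agents",
    "d3": "ceramic glaze benchmark",
}
doc_vecs = {"d1": [1.0, 0.0], "d2": [0.8, 0.6], "d3": [0.0, 1.0]}
graph = {"memory": ["d1"], "retrieval": ["d2"]}

fused = rrf([
    keyword_rank("memory retrieval", docs),
    vector_rank([1.0, 0.2], doc_vecs),
    graph_rank(["retrieval", "memory"], graph),
])
print(fused)
```

RRF works on ranks rather than raw scores, so the three retrievers do not need comparable score scales.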
Agent Orchestration adjustments
- Design a standardized agent communication protocol
- Implement a dynamic task allocation mechanism
- Consider introducing an Orchestrator role
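A minimal sketch of these three ideas, assuming a synchronous, in-process setting: a fixed `Message` shape plays the role of the communication protocol, and an `Orchestrator` assigns each task to the least-loaded worker that advertises the required skill. All class names, fields, and the allocation rule are hypothetical choices for this example and are not taken from any of the projects listed above.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass(frozen=True)
class Message:
    """Hypothetical message envelope shared by all agents."""
    sender: str
    recipient: str
    kind: str      # "task" or "result" in this sketch
    payload: str


@dataclass
class Worker:
    name: str
    skills: Set[str]
    handler: Callable[[str], str]
    assigned: int = 0  # number of tasks given to this worker so far


class Orchestrator:
    def __init__(self, workers: List[Worker]) -> None:
        self.workers = list(workers)
        self.log: List[Message] = []

    def dispatch(self, skill: str, payload: str) -> Message:
        # Dynamic allocation: among capable workers, pick the one with the
        # fewest assigned tasks (ties broken by name).
        candidates = [w for w in self.workers if skill in w.skills]
        if not candidates:
            raise LookupError(f"no worker offers skill {skill!r}")
        worker = min(candidates, key=lambda w: (w.assigned, w.name))
        worker.assigned += 1
        task = Message("orchestrator", worker.name, "task", payload)
        result = Message(worker.name, "orchestrator", "result",
                         worker.handler(payload))
        self.log.extend([task, result])
        return result


workers = [
    Worker("a", {"search"}, lambda p: f"a searched {p}"),
    Worker("b", {"search", "summarize"}, lambda p: f"b handled {p}"),
]
orchestrator = Orchestrator(workers)
results = [
    orchestrator.dispatch("search", "x"),
    orchestrator.dispatch("search", "y"),
    orchestrator.dispatch("summarize", "z"),
    orchestrator.dispatch("search", "w"),
]
print([r.sender for r in results])
```

Keeping every exchange in `log` as typed messages gives a trace that can be inspected or replayed later, which is one reason to fix the message shape early.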
📚 Appendix
Full paper list
- AI Co-Mathematician: Accelerating Mathematicians with Agentic AI - other
- GlazyBench: A Benchmark for Ceramic Glaze Property Prediction and Image Generation - evaluation
- Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key - planning
- MASPO: Joint Prompt Optimization for LLM-based Multi-Agent Systems - multi_agent
- SkillOS: Learning Skill Curation for Self-Evolving Agents - other
- NeuroAgent: LLM Agents for Multimodal Neuroimaging Analysis and Research - other
- Improved techniques for fine-tuning flow models via adjoint matching: a deterministic control pipeline - safety
- SpatialEpiBench: Benchmarking Spatial Information and Epidemic Priors in Forecasting - evaluation
- Market-Alignment Risk in Pricing Agents: Trace Diagnostics and Trace-Prior RL under Hidden Competitor State - safety
- Process Matters more than Output for Distinguishing Humans from Machines - other
- Instrumental Choices: Measuring the Propensity of LLM Agents to Pursue Instrumental Behaviors - other
- ReasonSTL: Bridging Natural Language and Signal Temporal Logic via Tool-Augmented Process-Rewarded Learning - other
- Beyond Task Success: Measuring Workflow Fidelity in LLM-Based Agentic Payment Systems - other
- PrefixGuard: From LLM-Agent Traces to Online Failure-Warning Monitors - other
- SCRuB: Social Concept Reasoning under Rubric-Based Evaluation - planning, evaluation
- Knowledge Graphs, the Missing Link in Agentic AI-based Formal Verification - memory
- Automated alignment is harder than you think - safety
- From Agent Loops to Deterministic Graphs: Execution Lineage for Reproducible AI-Native Work - other
This report was generated automatically by OpenClaw
Intended for agent architects as a decision reference