Agent Research Digest (2026-05-18)
This report was generated automatically from papers.cool/arxiv/cs.AI
Selection criterion: papers related to AI Agent systems
Generated at: 2026/5/18 17:30:06
📊 Today's Overview
- Total papers: 25
- Agent-related: 14
Distribution by area (a paper with two labels is counted under each)
| Area | Papers |
|---|---|
| memory | 2 |
| other | 5 |
| planning | 6 |
| evaluation | 3 |
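The per-area counts add up to 16 although only 14 papers are Agent-related, because two papers carry both the planning and the evaluation label. A minimal sketch of that tally, with the labels copied from the appendix list (the variable names are illustrative, not part of the report generator):

```python
from collections import Counter

# One label list per paper, in the order of the appendix (14 papers).
paper_labels = [
    ["memory"], ["other"], ["planning"], ["other"],
    ["planning", "evaluation"], ["evaluation"], ["planning", "evaluation"],
    ["planning"], ["planning"], ["planning"], ["other"], ["memory"],
    ["other"], ["other"],
]

# Count every label assignment; multi-label papers contribute more than once.
counts = Counter(label for labels in paper_labels for label in labels)
total_assignments = sum(counts.values())
```

Here `len(paper_labels)` is the number of papers and `total_assignments` is the sum of the table column, so the two differ by the number of extra labels.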
1️⃣ Today's Agent-Related Papers
MEMORY (2 papers)
1. FORGE: Self-Evolving Agent Memory With No Weight Updates via Population Broadcast
- arXiv ID: 2605.16233
- Research area: memory
- Key terms:
- forge,graduation,reflexion,broadcast,shot,memory,population,llm,cage,updates
2. SaaS-Bench: Can Computer-Use Agents Leverage Real-World SaaS to Solve Professional Workflows?
- arXiv ID: 2605.15777
- Research area: memory
- Key terms:
- saas,bench,professional,agents,workflows,tasks,horizon,cuas,cua,computer
OTHER (5 papers)
1. Confirming Correct, Missing the Rest: LLM Tutoring Agents Struggle Where Feedback Matters Most
- arXiv ID: 2605.16207
- Research area: other
- Key terms:
- tutoring,feedback,llm,matters,llms,suboptimal,incorrect,agents,diagnostic,diagnosis
2. Look Before You Leap: Autonomous Exploration for LLM Agents
- arXiv ID: 2605.16143
- Research area: other
- Key terms:
- agents,exploration,rollouts,verifiable,task,building,autonomous,leap,act,execution
3. Agentic Discovery of Neural Architectures: AIRA-Compose and AIRA-Design
- arXiv ID: 2605.15871
- Research area: other
- Key terms:
- aira,compose,composer,agents,llama,architectures,airaformer,airahybrid,agentic,transformer
4. ALSO: Adversarial Online Strategy Optimization for Social Agents
- arXiv ID: 2605.15768
- Research area: other
- Key terms:
- social,agents,online,strategy,optimization,personas,adversarial,dialogues,turn
5. ColPackAgent: Agent-Skill-Guided Hard-Particle Monte Carlo Workflows for Colloidal Packing
- arXiv ID: 2605.15625
- Research area: other
- Key terms:
- workflow,colpackagent,agent,skill,mcp,colloidal,packing,autoresearch,server,hard
PLANNING (6 papers)
1. Context, Reasoning, and Hierarchy: A Cost-Performance Study of Compound LLM Agent Design in an Adversarial POMDP
- arXiv ID: 2605.16205
- Research area: planning
- Key terms:
- deliberation,agent,pomdp,hierarchy,compound,llm,programmatic,adversarial,design,decomposition
2. Property-Guided LLM Program Synthesis for Planning
- arXiv ID: 2605.16142
- Research area: planning, evaluation
- Key terms:
- program,llm,property,programs,guided,synthesis,planning,evaluate,evaluation,failed
3. Reasoners or Translators? Contamination-aware Evaluation and Neuro-Symbolic Robustness in Tax Law
- arXiv ID: 2605.16052
- Research area: planning, evaluation
- Key terms:
- contamination,legal,tax,symbolic,neuro,reasoning,reasoners,translators,statutory,evaluation
4. Learning Bilevel Policies over Symbolic World Models for Long-Horizon Planning
- arXiv ID: 2605.15975
- Research area: planning
- Key terms:
- bison,symbolic,horizon,abstractions,planning,policies,imitation,demonstrations,bilevel
5. Deterministic Event-Graph Substrates as World Models for Counterfactual Reasoning
- arXiv ID: 2605.15967
- Research area: planning
- Key terms:
- counterfactual,clevrer,substrates,substrate,exceeds,explanatory,smallville,inspectable,aloe,event
6. Imperfect World Models are Exploitable
- arXiv ID: 2605.15960
- Research area: planning
- Key terms:
- exploitation,hacking,exploitable,reward,policy,unhackability,world,analogize,safe,inevitability
EVALUATION (3 papers)
1. Property-Guided LLM Program Synthesis for Planning
- arXiv ID: 2605.16142
- Research area: planning, evaluation
- Key terms:
- program,llm,property,programs,guided,synthesis,planning,evaluate,evaluation,failed
2. ShopGym: An Integrated Framework for Realistic Simulation and Scalable Benchmarking of E-Commerce Web Agents
- arXiv ID: 2605.16116
- Research area: evaluation
- Key terms:
- storefronts,shops,shopgym,commerce,sandbox,live,inspectable,scalable,shoparena,shopguru
3. Reasoners or Translators? Contamination-aware Evaluation and Neuro-Symbolic Robustness in Tax Law
- arXiv ID: 2605.16052
- Research area: planning, evaluation
- Key terms:
- contamination,legal,tax,symbolic,neuro,reasoning,reasoners,translators,statutory,evaluation
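2️⃣ Research Trend Analysis
Today's hot areas
Based on today's 14 Agent-related papers:
- planning: 6 papers 🔥 hot
- other: 5 papers 🔥 hot
- evaluation: 3 papers 📈 growing
Paradigm shifts
- No clear paradigm shift today
Emerging architecture patterns
- Agent Workflow: workflow-orchestration architecture
3️⃣ Key Insights
- Memory is becoming infrastructure: a growing number of systems treat memory as a standard capability rather than an optional feature
- Planning is moving from rules to learning: traditional symbolic planning is being replaced by neural-network-based learning
- Evaluation benchmarks are evolving quickly: Agent evaluation is expanding from single tasks to complex scenarios
- Open-source solutions are iterating quickly: open-source implementations are rapidly catching up with commercial Agent capabilities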
4️⃣ Technology Evolution Path
- Stage 1: Prompt Engineering
Current hot paths
- RAG → Memory System → World Model: memory architectures keep deepening
- ReAct → Planning System → Goal Reasoning: reasoning capability keeps strengthening
5️⃣ Relation to Open-Source Agent Projects
Comparison with mainstream projects
| Open-source project | Related areas | Supported by today's papers |
|---|---|---|
| LangChain | tool, planning | ✅ |
| LlamaIndex | memory, rag | ✅ |
| AutoGPT | planning, autonomous | ✅ |
| CrewAI | multi-agent | ➖ |
| Mem0 | memory | ✅ |
| OpenDevin | tool, planning | ➖ |
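Design validation and evolution
Validated designs:
- The need for a Memory System continues to be validated
- Tool Use as a core Agent capability is now the consensus
- Multi-Agent architectures perform better on complex tasks
Designs that need to evolve:
- Simple RAG is being replaced by Memory Systems
- Monolithic Agent architectures are limited in complex scenarios
- Static Tool Definitions need to evolve toward dynamic learning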
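6️⃣ Architecture-Level Conclusions
- Memory First: new Agent projects should design the Memory System up front rather than bolt it on later
- Tool Abstraction: the tool abstraction layer should support dynamic discovery and learning rather than hard-coding
- Evaluation Driven: set up a continuous evaluation mechanism rather than relying on manual testing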
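The Tool Abstraction point can be illustrated with a small registry in which tools are registered at runtime and looked up by description instead of being hard-coded into the agent. This is a sketch only: `Tool`, `ToolRegistry`, `register`, `discover` and `call` are hypothetical names, and the substring match stands in for whatever discovery or learning mechanism a real system would use.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Tool:
    name: str
    description: str
    func: Callable[..., object]


class ToolRegistry:
    """Hypothetical registry: tools are added at runtime, not hard-coded."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, name: str, description: str):
        # Decorator that stores the function under the given name.
        def decorator(func: Callable[..., object]) -> Callable[..., object]:
            self._tools[name] = Tool(name, description, func)
            return func
        return decorator

    def discover(self, query: str) -> List[Tool]:
        # Assumption: a plain substring match on name or description.
        q = query.lower()
        return [t for t in self._tools.values()
                if q in t.name.lower() or q in t.description.lower()]

    def call(self, name: str, *args, **kwargs) -> object:
        return self._tools[name].func(*args, **kwargs)


registry = ToolRegistry()


@registry.register("add", "Add two numbers")
def add(a: float, b: float) -> float:
    return a + b
```

An agent would call `registry.discover("numbers")` to find candidate tools and then `registry.call("add", 2, 3)` to invoke one, so new tools can appear without changes to the agent loop.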
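7️⃣ Recommended Next Steps
Memory Schema design
- Adopt a layered memory architecture: Working Memory → Episodic → Long-term
- Design a unified Memory Interface that supports multiple backends (vector, graph, relational)
- Implement a Memory Compression mechanism to avoid unbounded growth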
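The three Memory Schema points can be sketched together as follows. Everything here is an assumption made for illustration: `MemoryBackend`, `ListBackend` and `LayeredMemory` are hypothetical names, the backend is a plain Python list rather than a vector, graph or relational store, and "compression" simply joins the oldest episodic entries into one long-term record where a real system would more likely summarize them.

```python
from collections import deque
from typing import Deque, List, Protocol


class MemoryBackend(Protocol):
    """Unified interface that any long-term store would implement."""
    def add(self, text: str) -> None: ...
    def search(self, query: str, k: int = 3) -> List[str]: ...


class ListBackend:
    """Toy backend: substring match over a Python list."""

    def __init__(self) -> None:
        self.items: List[str] = []

    def add(self, text: str) -> None:
        self.items.append(text)

    def search(self, query: str, k: int = 3) -> List[str]:
        q = query.lower()
        return [t for t in self.items if q in t.lower()][:k]


class LayeredMemory:
    def __init__(self, long_term: MemoryBackend,
                 working_size: int = 3, episodic_limit: int = 4) -> None:
        self.working: Deque[str] = deque(maxlen=working_size)  # most recent items
        self.episodic: List[str] = []                          # current episode log
        self.long_term = long_term
        self.episodic_limit = episodic_limit

    def observe(self, text: str) -> None:
        self.working.append(text)
        self.episodic.append(text)
        if len(self.episodic) > self.episodic_limit:
            self._compress()

    def _compress(self) -> None:
        # Placeholder compression: fold the oldest half into one record.
        half = len(self.episodic) // 2
        old, self.episodic = self.episodic[:half], self.episodic[half:]
        self.long_term.add(" | ".join(old))

    def recall(self, query: str, k: int = 3) -> List[str]:
        # Check the episodic layer first, then fall back to long-term.
        q = query.lower()
        recent = [t for t in self.episodic if q in t.lower()]
        return (recent + self.long_term.search(query, k))[:k]
```

Because `LayeredMemory` only depends on the `add`/`search` interface, a different backend can be swapped in without touching the layering or compression logic.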
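Retrieval Policy upgrade
- Move from plain similarity search to hybrid retrieval (keyword + vector + knowledge graph)
- Implement context-aware, dynamic retrieval strategies
- Consider adding a Reranking step to improve relevance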
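A minimal sketch of hybrid retrieval, under stated assumptions: it fuses only two signals (keyword overlap and a vector similarity), the "vector" is a character-trigram count used as a stand-in for a real embedding, and the two rankings are combined with reciprocal rank fusion. The knowledge-graph signal and a learned reranker are left out, and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def keyword_score(query: str, doc: str) -> float:
    # Fraction of query tokens that also appear in the document.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0


def trigram_vector(text: str) -> Counter:
    # Stand-in for an embedding: counts of character trigrams.
    s = text.lower()
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[g] * b[g] for g in a if g in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def rank(scores: Dict[int, float]) -> List[int]:
    # Document ids ordered by descending score, ties broken by id.
    return sorted(scores, key=lambda i: (-scores[i], i))


def hybrid_search(query: str, docs: List[str],
                  top_k: int = 3, rrf_k: int = 60) -> List[str]:
    keyword_rank = rank({i: keyword_score(query, d) for i, d in enumerate(docs)})
    qv = trigram_vector(query)
    vector_rank = rank({i: cosine(qv, trigram_vector(d)) for i, d in enumerate(docs)})

    # Reciprocal rank fusion: each ranking contributes 1 / (rrf_k + position).
    fused: Dict[int, float] = {}
    for ranking in (keyword_rank, vector_rank):
        for position, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (rrf_k + position)

    best = sorted(fused, key=lambda i: (-fused[i], i))[:top_k]
    return [docs[i] for i in best]
```

Rank fusion is used here because it needs no score normalization across retrievers; the fused top results are also the natural input for a separate reranking step.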
📚 Appendix
Full paper list
- FORGE: Self-Evolving Agent Memory With No Weight Updates via Population Broadcast - memory
- Confirming Correct, Missing the Rest: LLM Tutoring Agents Struggle Where Feedback Matters Most - other
- Context, Reasoning, and Hierarchy: A Cost-Performance Study of Compound LLM Agent Design in an Adversarial POMDP - planning
- Look Before You Leap: Autonomous Exploration for LLM Agents - other
- Property-Guided LLM Program Synthesis for Planning - planning, evaluation
- ShopGym: An Integrated Framework for Realistic Simulation and Scalable Benchmarking of E-Commerce Web Agents - evaluation
- Reasoners or Translators? Contamination-aware Evaluation and Neuro-Symbolic Robustness in Tax Law - planning, evaluation
- Learning Bilevel Policies over Symbolic World Models for Long-Horizon Planning - planning
- Deterministic Event-Graph Substrates as World Models for Counterfactual Reasoning - planning
- Imperfect World Models are Exploitable - planning
- Agentic Discovery of Neural Architectures: AIRA-Compose and AIRA-Design - other
- SaaS-Bench: Can Computer-Use Agents Leverage Real-World SaaS to Solve Professional Workflows? - memory
- ALSO: Adversarial Online Strategy Optimization for Social Agents - other
- ColPackAgent: Agent-Skill-Guided Hard-Particle Monte Carlo Workflows for Colloidal Packing - other
This report was generated automatically by OpenClaw
Intended for Agent architects as a decision-making reference