Latest Agent Research Digest (2026-05-20)
This report is auto-generated from papers.cool/arxiv/cs.AI
Selection criterion: papers related to AI Agent systems
Generated at: 2026/5/20 17:30:06
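📊 Today's Overview
- Total papers: 25
- Agent-related: 15
Direction distribution (multi-label: EngiAI is counted under memory, multi_agent and evaluation, so the counts sum to 17 across 15 papers)
| Direction | Papers |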
|---|---|
| other | 7 |
| planning | 4 |
| evaluation | 3 |
| memory | 2 |
| multi_agent | 1 |
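Because EngiAI (2605.19743) carries three direction labels, the table is a multi-label tally: its counts add up to 17 while only 15 papers are agent-related. The Python sketch below reproduces the tally from the direction labels listed in this report's appendix; the dictionary and function names are illustrative only.

```python
from collections import Counter

# Direction labels per arXiv ID, copied from this report's paper list.
# EngiAI (2605.19743) has three labels, so it is counted once in each.
PAPER_LABELS = {
    "2605.20173": ["other"], "2605.20072": ["other"], "2605.20025": ["other"],
    "2605.20023": ["other"], "2605.19932": ["other"], "2605.19782": ["other"],
    "2605.19769": ["other"],
    "2605.20006": ["planning"], "2605.19895": ["planning"],
    "2605.19824": ["planning"], "2605.19762": ["planning"],
    "2605.19779": ["evaluation"], "2605.19758": ["evaluation"],
    "2605.19743": ["memory", "multi_agent", "evaluation"],
    "2605.19748": ["memory"],
}

def direction_distribution(labels_by_paper):
    """Count papers per direction; a multi-label paper counts in every direction it carries."""
    counts = Counter()
    for labels in labels_by_paper.values():
        counts.update(set(labels))  # set() guards against a label listed twice
    return counts

dist = direction_distribution(PAPER_LABELS)
print(dist.most_common())
# [('other', 7), ('planning', 4), ('evaluation', 3), ('memory', 2), ('multi_agent', 1)]
print(len(PAPER_LABELS), sum(dist.values()))  # 15 17
```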
1️⃣ Today's Agent-Related Papers
OTHER (7 papers)
1. A Methodology for Selecting and Composing Runtime Architecture Patterns for Production LLM Agents
- arXiv ID: 2605.20173
- Direction: other
- Keywords:
- sdb,runtime,llm,production,methodology,agents,selecting,deterministic,agent,pattern
2. Probing Embodied LLMs: When Higher Observation Fidelity Hurts Problem Solving
- arXiv ID: 2605.20072
- Direction: other
- Keywords:
- embodied,rgb,llms,hurts,success,lockbox,robotic,truth,solving,agents
3. AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration
- arXiv ID: 2605.20025
- Direction: other
- Keywords:
- autoresearchclaw,research,collaboration,human,oversight,autonomous,autonomy,step,agent
4. When Skills Don't Help: A Negative Result on Procedural Knowledge for Tool-Grounded Agents in Offensive Cybersecurity
- arXiv ID: 2605.20023
- Direction: other
- Keywords:
- skills,procedural,offensive,cybersecurity,agent,grounded,curated,help,marginal
5. PEEK: Context Map as an Orientation Cache for Long-Context LLM Agents
- arXiv ID: 2605.19932
- Direction: other
- Keywords:
- peek,context,recurring,orientation,llm,agent,map,ace,cache,agents
6. Prior Knowledge or Search? A Study of LLM Agents in Hardware-Aware Code Optimization
- arXiv ID: 2605.19782
- Direction: other
- Keywords:
- optimization,kernel,llm,degrades,feedback,agents,size,regardless,tvm,code
7. OpenComputer: Verifiable Software Worlds for Computer-Use Agents
- arXiv ID: 2605.19769
- Direction: other
- Keywords:
- opencomputer,verifiable,verifier,verifiers,software,worlds,computer,agents,desktop,grounded
PLANNING (4 papers)
1. GeoX: Mastering Geospatial Reasoning Through Self-Play and Verifiable Rewards
- arXiv ID: 2605.20006
- Direction: planning
- Keywords:
- geox,geospatial,verifiable,reasoning,play,rewards,spatial,executable,self,curated
2. Streamlined Constraint Reasoning via CNN Pattern Recognition on Enumerated Solutions
- arXiv ID: 2605.19895
- Direction: planning
- Keywords:
- streamliner,streamliners,hardened,golfers,constraint,lever,cnn,hardening,vessel,satisfiability
3. From Prompts to Pavement Through Time: Temporal Grounding in Agentic Scene-to-Plan Reasoning
- arXiv ID: 2605.19824
- Direction: planning
- Keywords:
- temporal,reasoning,grounding,scene,pavement,agentic,plan,conditioning,logical,reshapes
4. What Really Improves Mathematical Reasoning: Structured Reasoning Signals Beyond Pure Code
- arXiv ID: 2605.19762
- Direction: planning
- Keywords:
- reasoning,code,math,improves,mathematical,structured,domain,executable,programming,gains
EVALUATION (3 papers)
1. Distribution-Free Uncertainty Quantification for Continuous AI Agent Evaluation
- arXiv ID: 2605.19779
- Direction: evaluation
- Keywords:
- agent,aci,conformal,abstention,nominal,uncertainty,intervals,rho,ranking,24h
2. CogScale: Scalable Benchmark for Sequence Processing
- arXiv ID: 2605.19758
- Direction: evaluation
- Keywords:
- cogscale,budgets,echo,scalable,maintain,xlstm,decoder,strict,difficulty,parametrizable
3. EngiAI: A Multi-Agent Framework and Benchmark Suite for LLM-Driven Engineering Design
- arXiv ID: 2605.19743
- Directions: memory, multi_agent, evaluation
- Keywords:
- benchmark,retrieval,engiai,hpc,orchestration,llm,agent,rag,suite,conditional
MEMORY (2 papers)
1. Memory-Augmented Reinforcement Learning Agent for CAD Generation
- arXiv ID: 2605.19748
- Direction: memory
- Keywords:
- cad,generation,reinforcement,agent,retrieval,memory,augmented,geometric,library,callable
2. EngiAI: A Multi-Agent Framework and Benchmark Suite for LLM-Driven Engineering Design
- arXiv ID: 2605.19743
- Directions: memory, multi_agent, evaluation
- Keywords:
- benchmark,retrieval,engiai,hpc,orchestration,llm,agent,rag,suite,conditional
MULTI_AGENT (1 paper)
1. EngiAI: A Multi-Agent Framework and Benchmark Suite for LLM-Driven Engineering Design
- arXiv ID: 2605.19743
- Directions: memory, multi_agent, evaluation
- Keywords:
- benchmark,retrieval,engiai,hpc,orchestration,llm,agent,rag,suite,conditional
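2️⃣ Research Trend Analysis
Today's hot directions
Based on today's 15 agent-related papers:
- other (the catch-all bucket for papers without a more specific direction): 7 papers 🔥 hot
- planning: 4 papers 🔥 hot
- evaluation: 3 papers 📈 rising
Technical paradigm shifts
- RAG → Memory System: retrieval augmentation is evolving into systematic memory architectures
Emerging architecture patterns
- No clear new architecture pattern observed today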
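3️⃣ Key Insights
- Memory is becoming infrastructure: more and more systems treat memory as a standard component rather than an optional feature
- Planning is shifting from rules to learning: traditional symbolic planning is being replaced by learned neural approaches
- Multi-Agent collaboration is standardizing: consensus is forming around multi-agent communication protocols and coordination mechanisms
- Evaluation benchmarks are evolving quickly: agent capability evaluation is expanding from single tasks to complex scenarios
- Open-source solutions are iterating quickly: open-source implementations are rapidly catching up with commercial agent capabilities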
4️⃣ Technical Evolution Path
1. Prompt Engineering
Current hot paths
- RAG → Memory System → World Model: memory architectures keep deepening
- ReAct → Planning System → Goal Reasoning: reasoning capabilities are getting stronger
5️⃣ Relation to Open-Source Agent Projects
Mainstream project comparison
| Open-source project | Related directions | Supported by today's papers |
|---|---|---|
| LangChain | tool, planning | ✅ |
| LlamaIndex | memory, rag | ✅ |
| AutoGPT | planning, autonomous | ✅ |
| CrewAI | multi-agent | ✅ |
| Mem0 | memory | ✅ |
| OpenDevin | tool, planning | ➖ |
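Design validation and evolution
Validated designs:
- The need for a Memory System continues to be confirmed
- Tool Use is now an accepted core capability of agents
- Multi-Agent architectures perform better on complex tasks
Designs that need to evolve:
- Simple RAG is being replaced by Memory Systems
- Monolithic single-agent architectures hit limits in complex scenarios
- Static Tool Definitions need to evolve toward dynamic learning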
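6️⃣ Architecture-Level Conclusions
- Memory First: new agent projects should design the Memory System up front rather than bolting it on later
- Tool Abstraction: the tool abstraction layer should support dynamic discovery and learning instead of hard-coding
- Multi-Agent Ready: even a single-agent system should leave architectural room for multi-agent extension
- Evaluation Driven: set up continuous evaluation instead of relying on manual testing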
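The Tool Abstraction point can be sketched as a registry that tools join at runtime and that the agent queries by tag, instead of a hard-coded tool list. This is a minimal Python sketch under assumptions: `ToolSpec`, `ToolRegistry`, and the tag-based `discover` lookup are hypothetical names, not an API from any listed paper or project, and the "learning" half of the recommendation is not modeled.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

@dataclass
class ToolSpec:
    """Hypothetical tool descriptor: name, description, callable, and discovery tags."""
    name: str
    description: str
    func: Callable[..., Any]
    tags: List[str] = field(default_factory=list)

class ToolRegistry:
    """Minimal registry: tools are added at runtime and discovered by tag."""
    def __init__(self) -> None:
        self._tools: Dict[str, ToolSpec] = {}

    def register(self, spec: ToolSpec) -> None:
        if spec.name in self._tools:
            raise ValueError(f"tool already registered: {spec.name}")
        self._tools[spec.name] = spec

    def discover(self, tag: str) -> List[str]:
        """Names of tools carrying the tag, sorted for stable output."""
        return sorted(n for n, s in self._tools.items() if tag in s.tags)

    def describe(self) -> str:
        """Render name/description pairs, e.g. for inclusion in an LLM prompt."""
        return "\n".join(f"{s.name}: {s.description}" for s in self._tools.values())

    def call(self, name: str, **kwargs: Any) -> Any:
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name].func(**kwargs)

registry = ToolRegistry()
registry.register(ToolSpec("add", "Add two numbers", lambda a, b: a + b, tags=["math"]))
registry.register(ToolSpec("upper", "Uppercase a string", lambda s: s.upper(), tags=["text"]))
print(registry.discover("math"))     # ['add']
print(registry.call("add", a=2, b=3))  # 5
```

Keeping descriptions alongside the callables lets the same registry feed both tool selection and the prompt that advertises tools to the model.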
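7️⃣ Next-Step Recommendations
Memory Schema design
- Adopt a layered memory architecture: Working Memory → Episodic → Long-term
- Design a unified Memory Interface that supports multiple backends (vector, graph, relational)
- Implement a Memory Compression mechanism to prevent unbounded growth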
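A minimal Python sketch of the three Memory Schema points, assuming a simple design: a bounded working memory spills its oldest items into an episodic store, and once enough episodes accumulate they are folded into one long-term summary and deleted from the episodic layer. `MemoryBackend`, `InMemoryBackend`, and `LayeredMemory` are hypothetical names; the dict-backed store stands in for a vector, graph, or relational backend, and string joining stands in for real summarization.

```python
from collections import deque
from typing import Deque, Dict, List, Protocol, Tuple

class MemoryBackend(Protocol):
    """Hypothetical unified interface that vector, graph, or relational stores would implement."""
    def write(self, key: str, text: str) -> None: ...
    def delete(self, key: str) -> None: ...
    def read_all(self) -> List[str]: ...

class InMemoryBackend:
    """Dict-backed stand-in for a real store, so the sketch runs without dependencies."""
    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def write(self, key: str, text: str) -> None:
        self._data[key] = text

    def delete(self, key: str) -> None:
        self._data.pop(key, None)

    def read_all(self) -> List[str]:
        return list(self._data.values())

class LayeredMemory:
    """Working memory -> episodic -> long-term, with compression to bound growth."""
    def __init__(self, episodic: MemoryBackend, long_term: MemoryBackend,
                 working_size: int = 3, compress_after: int = 4) -> None:
        self.working: Deque[str] = deque()
        self.working_size = working_size
        self.episodic = episodic
        self.long_term = long_term
        self.compress_after = compress_after
        self._pending: List[Tuple[str, str]] = []  # (key, text) not yet compressed
        self._seq = 0

    def add(self, text: str) -> None:
        self.working.append(text)
        # When working memory overflows, its oldest item moves to episodic memory.
        if len(self.working) > self.working_size:
            self._to_episodic(self.working.popleft())

    def _to_episodic(self, text: str) -> None:
        self._seq += 1
        key = f"ep-{self._seq}"
        self.episodic.write(key, text)
        self._pending.append((key, text))
        if len(self._pending) >= self.compress_after:
            self._compress()

    def _compress(self) -> None:
        # Placeholder: join pending episodes into one summary string.
        # A real system might call an LLM summarizer here (assumption).
        summary = " | ".join(text for _, text in self._pending)
        self.long_term.write(f"summary-{self._seq}", summary)
        for key, _ in self._pending:
            self.episodic.delete(key)  # compressed episodes leave the episodic layer
        self._pending.clear()

mem = LayeredMemory(InMemoryBackend(), InMemoryBackend(), working_size=2, compress_after=2)
for i in range(5):
    mem.add(f"turn {i}")
print(list(mem.working))         # ['turn 3', 'turn 4']
print(mem.episodic.read_all())   # ['turn 2']
print(mem.long_term.read_all())  # ['turn 0 | turn 1']
```

Because `LayeredMemory` only talks to the `MemoryBackend` methods, swapping the dict store for a real backend would not change the layering or compression logic.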
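Retrieval Policy upgrade
- Upgrade from plain similarity search to hybrid retrieval (keyword + vector + knowledge graph)
- Implement context-aware, dynamic retrieval strategies
- Consider adding a Reranking stage to improve relevance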
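The Retrieval Policy points can be sketched as hybrid retrieval followed by a rerank stage. This Python sketch makes several assumptions: keyword overlap stands in for BM25, the document vectors are toy values rather than real embeddings, the knowledge-graph signal is omitted, and the two rankings are fused with reciprocal rank fusion (RRF, using the commonly cited constant k = 60), a swapped-in technique the report does not specify. The optional `rerank` callable marks where a cross-encoder or similar reranker would plug in; all function names are hypothetical.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence

def keyword_scores(query: str, docs: Dict[str, str]) -> Dict[str, float]:
    """Score each doc by how many distinct query terms it contains (stand-in for BM25)."""
    terms = set(query.lower().split())
    return {d: float(len(terms & set(t.lower().split()))) for d, t in docs.items()}

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def vector_scores(query_vec: Sequence[float],
                  doc_vecs: Dict[str, Sequence[float]]) -> Dict[str, float]:
    return {d: cosine(query_vec, v) for d, v in doc_vecs.items()}

def rrf(rankings: List[Dict[str, float]], k: int = 60) -> Dict[str, float]:
    """Reciprocal rank fusion: sum 1 / (k + rank) over each ranking, ranks starting at 1."""
    fused: Dict[str, float] = {}
    for scores in rankings:
        ordered = sorted(scores, key=lambda d: (-scores[d], d))
        for rank, doc in enumerate(ordered, start=1):
            fused[doc] = fused.get(doc, 0.0) + 1.0 / (k + rank)
    return fused

def hybrid_search(query: str, query_vec: Sequence[float], docs: Dict[str, str],
                  doc_vecs: Dict[str, Sequence[float]], top_k: int = 2,
                  rerank: Optional[Callable[[str, str], float]] = None) -> List[str]:
    fused = rrf([keyword_scores(query, docs), vector_scores(query_vec, doc_vecs)])
    candidates = sorted(fused, key=lambda d: (-fused[d], d))[:top_k]
    if rerank is not None:
        # Second stage: rescore only the short candidate list with a costlier model.
        candidates = sorted(candidates, key=lambda d: (-rerank(query, docs[d]), d))
    return candidates

docs = {"d1": "agent memory system design",
        "d2": "vector retrieval for rag",
        "d3": "cooking pasta at home"}
doc_vecs = {"d1": [0.9, 0.1], "d2": [0.7, 0.3], "d3": [0.0, 1.0]}
query, query_vec = "agent memory retrieval", [1.0, 0.0]

print(hybrid_search(query, query_vec, docs, doc_vecs))  # ['d1', 'd2']
toy_rerank = lambda q, text: 1.0 if "retrieval" in text else 0.0  # toy reranker
print(hybrid_search(query, query_vec, docs, doc_vecs, rerank=toy_rerank))  # ['d2', 'd1']
```

RRF only uses ranks, so it sidesteps the problem that keyword counts and cosine similarities live on different scales.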
Agent Orchestration adjustments
- Design a standardized agent communication protocol
- Implement dynamic task assignment
- Consider introducing an Orchestrator role
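The three Agent Orchestration points can be combined in one small Python sketch: a fixed message envelope stands in for a standardized communication protocol, and an Orchestrator assigns each task to the least-loaded worker that declares the required skill. `AgentMessage`, `WorkerAgent`, `Orchestrator`, and the least-load routing rule are hypothetical choices for illustration, not a protocol from the papers or projects above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass(frozen=True)
class AgentMessage:
    """Hypothetical standardized envelope exchanged between agents."""
    sender: str
    recipient: str
    kind: str      # e.g. "task" or "result"
    payload: str

@dataclass
class WorkerAgent:
    name: str
    skills: List[str]
    load: int = 0  # number of tasks currently assigned

@dataclass
class Orchestrator:
    workers: List[WorkerAgent]
    outbox: List[AgentMessage] = field(default_factory=list)

    def assign(self, task: str, skill: str) -> Optional[str]:
        """Route a task to the least-loaded worker with the skill; None if nobody has it."""
        capable = [w for w in self.workers if skill in w.skills]
        if not capable:
            return None
        worker = min(capable, key=lambda w: (w.load, w.name))  # name breaks ties
        worker.load += 1
        self.outbox.append(AgentMessage("orchestrator", worker.name, "task", task))
        return worker.name

    def complete(self, worker_name: str) -> None:
        """Mark one task of the named worker as finished."""
        for w in self.workers:
            if w.name == worker_name and w.load > 0:
                w.load -= 1

orch = Orchestrator([WorkerAgent("coder", ["code"]),
                     WorkerAgent("coder2", ["code", "test"]),
                     WorkerAgent("writer", ["docs"])])
results = [orch.assign("fix bug", "code"),
           orch.assign("add feature", "code"),
           orch.assign("write tests", "test"),
           orch.assign("translate docs", "i18n")]
print(results)                         # ['coder', 'coder2', 'coder2', None]
orch.complete("coder2")
print([w.load for w in orch.workers])  # [1, 1, 0]
```

Routing on declared skills plus current load is one of the simplest dynamic policies; a fuller design would also send "result" messages back through the same envelope.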
📚 Appendix
Full paper list
- A Methodology for Selecting and Composing Runtime Architecture Patterns for Production LLM Agents - other
- Probing Embodied LLMs: When Higher Observation Fidelity Hurts Problem Solving - other
- AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration - other
- When Skills Don't Help: A Negative Result on Procedural Knowledge for Tool-Grounded Agents in Offensive Cybersecurity - other
- GeoX: Mastering Geospatial Reasoning Through Self-Play and Verifiable Rewards - planning
- PEEK: Context Map as an Orientation Cache for Long-Context LLM Agents - other
- Streamlined Constraint Reasoning via CNN Pattern Recognition on Enumerated Solutions - planning
- From Prompts to Pavement Through Time: Temporal Grounding in Agentic Scene-to-Plan Reasoning - planning
- Prior Knowledge or Search? A Study of LLM Agents in Hardware-Aware Code Optimization - other
- Distribution-Free Uncertainty Quantification for Continuous AI Agent Evaluation - evaluation
- OpenComputer: Verifiable Software Worlds for Computer-Use Agents - other
- What Really Improves Mathematical Reasoning: Structured Reasoning Signals Beyond Pure Code - planning
- CogScale: Scalable Benchmark for Sequence Processing - evaluation
- Memory-Augmented Reinforcement Learning Agent for CAD Generation - memory
- EngiAI: A Multi-Agent Framework and Benchmark Suite for LLM-Driven Engineering Design - memory, multi_agent, evaluation
This report was generated automatically by OpenClaw
Written for Agent architects as a decision-making reference