Latest Agent Research Digest (2026-05-06)
This report is automatically generated from papers.cool/arxiv/cs.AI
Selection criterion: papers related to AI Agent systems
Generated at: 2026/5/6 21:12:17
📊 Today's Overview
- Total papers: 25
- Agent-related: 17
Distribution by Direction
| Direction | Papers |
|---|---|
| other | 7 |
| memory | 3 |
| multi_agent | 1 |
| planning | 4 |
| evaluation | 2 |
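As a consistency check, the counts in this table can be reproduced from the direction labels in the appendix list. Below is a minimal Python tally. The list `appendix_directions` was transcribed by hand from the appendix, and the helper `direction_distribution` is hypothetical; it is not part of the OpenClaw generator.

```python
from collections import Counter

# Direction labels transcribed from the appendix list, in appendix order (17 papers).
appendix_directions = [
    "other", "other", "other", "memory", "other", "multi_agent", "planning",
    "other", "memory", "planning", "other", "evaluation", "memory", "other",
    "planning", "evaluation", "planning",
]

def direction_distribution(labels):
    """Count papers per direction and return (counts, total)."""
    counts = Counter(labels)
    return counts, sum(counts.values())

counts, total = direction_distribution(appendix_directions)
for direction, n in counts.most_common():
    # Print rows in the same pipe-table style, plus each direction's share.
    print(f"| {direction} | {n} | {n / total:.0%} |")
```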
1️⃣ Today's Agent-Related Papers
OTHER (7 papers)
1. OpenSeeker-v2: Pushing the Limits of Search Agents with Informative and High-Difficulty Trajectories
- arXiv ID: 2605.04036
- Direction: other
- Keywords: openseeker, sft, agents, search, frontier, browsecomp, cpt, informative, difficulty, tongyi
2. Redefining AI Red Teaming in the Agentic Era: From Weeks to Hours
- arXiv ID: 2605.04019
- Direction: other
- Keywords: teaming, workflows, weeks, red, agentic, operators, scorers, scout, agent, dreadnode
3. SymptomAI: Towards a Conversational AI Agent for Everyday Symptom Assessment
- arXiv ID: 2605.04012
- Direction: other
- Keywords: symptom, symptomai, ddx, 917, diagnosis, participants, interview, conversational, everyday, panel
4. From Intent to Execution: Composing Agentic Workflows with Agent Recommendation
- arXiv ID: 2605.03986
- Direction: other
- Keywords: agent, critique, manual, creation, end, execution, agentic, agents, mas, recommender
5. Agentic-imodels: Evolving agentic interpretability tools via autoresearch
- arXiv ID: 2605.03808
- Direction: other
- Keywords: agentic, imodels, interpretability, autoresearch, llm, ads, interpretable, tools, science, agents
6. What You Think is What You See: Driving Exploration in VLM Agents via Visual-Linguistic Curiosity
- arXiv ID: 2605.03782
- Direction: other
- Keywords: vlm, curiosity, glance, linguistic, actively, agent, agentic, visual, exploration, agents
7. Agent-Based Modeling of Low-Emission Fertilizer Adoption for Dairy Farm Decarbonisation using Empirical Farm Data
- arXiv ID: 2605.03648
- Direction: other
- Keywords: farm, adoption, dairy, fertilizer, social, decarbonisation, diffusion, 0274, modeling, empirical
MEMORY (3 papers)
1. An Agent-Oriented Pluggable Experience-RAG Skill for Experience-Driven Retrieval Strategy Orchestration
- arXiv ID: 2605.03989
- Direction: memory
- Keywords: skill, retrieval, rag, experience, beir, agent, pluggable, orchestration, retriever, strategy
2. ScrapMem: A Bio-inspired Framework for On-device Personalized Agent Memory via Optical Forgetting
- arXiv ID: 2605.03804
- Direction: memory
- Keywords: scrapmem, forgetting, multimodal, memory, storage, optical, personalized, scrapbook, device, llm
3. MEMTIER: Tiered Memory Architecture and Retrieval Bottleneck Analysis for Long-Running Autonomous AI Agents
- arXiv ID: 2605.03675
- Direction: memory
- Keywords: memtier, 6gb, retrieval, 382, memory, episodic, longmemeval, percentage, jsonl, consumer
MULTI_AGENT (1 paper)
1. QKVShare: Quantized KV-Cache Handoff for Multi-Agent On-Device LLMs
- arXiv ID: 2605.03884
- Direction: multi_agent
- Keywords: handoff, qkvshare, cache, prefill, quantized, apples, context, path, nominal, agent
PLANNING (4 papers)
1. Correct Is Not Enough: Training Reasoning Planners with Executor-Grounded Rewards
- arXiv ID: 2605.03862
- Direction: planning
- Keywords: executor, reasoning, planner, grounded, reward, trace, tracelift, rubric, traces, flawed
2. Say the Mission, Execute the Swarm: Agent-Enhanced LLM Reasoning in the Web-of-Drones
- arXiv ID: 2605.03788
- Direction: planning
- Keywords: swarm, drones, execution, wot, mission, llm, agent, reasoning, web, llms
3. Where Paths Split: Localized, Calibrated Control of Moral Reasoning in Large Language Models
- arXiv ID: 2605.03609
- Direction: planning
- Keywords: moral, ethical, reasoning, branch, blocks, deontological, preference, control, language, residual
4. FinSTaR: Towards Financial Reasoning with Time Series Reasoning Models
- arXiv ID: 2605.03460
- Direction: planning
- Keywords: cot, financial, finstar, reasoning, tsrms, fintsr, bench, prediction, assessment, series
EVALUATION (2 papers)
1. OracleProto: A Reproducible Framework for Benchmarking LLM Native Forecasting via Knowledge Cutoff and Temporal Masking
- arXiv ID: 2605.03762
- Direction: evaluation
- Keywords: oracleproto, forecasting, reproducible, llm, capability, mayiding, native, cutoff, masking, temporal
2. Workspace-Bench 1.0: Benchmarking AI Agents on Workspace Tasks with Large-Scale File Dependencies
- arXiv ID: 2605.03596
- Direction: evaluation
- Keywords: workspace, file, agents, bench, dependencies, files, worker, 20gb, tasks, rubrics
2️⃣ Research Trend Analysis
Today's Hot Directions
Based on today's 17 relevant papers:
- other: 7 papers 🔥 hot
- planning: 4 papers 🔥 hot
- memory: 3 papers 📈 growing
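Technical Paradigm Shifts
- RAG → Memory System: retrieval augmentation is evolving into systematic memory architectures
- Tool Calling → Tool Learning: a move from simple tool invocation to autonomous tool learning
Emerging Architecture Patterns
- Agent Workflow: workflow orchestration architectures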
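One common way to implement workflow orchestration is to express the workflow as a dependency graph and run each step only after its prerequisites have finished. Below is a minimal sketch that uses Python's standard-library `graphlib.TopologicalSorter` (Python 3.9+). The function `run_workflow`, the step names, and the lambda bodies are hypothetical; in a real workflow, the lambdas would be LLM or tool calls.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Set

# A step receives the outputs of all steps run so far and returns its own output.
Step = Callable[[Dict[str, str]], str]

def run_workflow(steps: Dict[str, Step], deps: Dict[str, Set[str]]) -> Dict[str, str]:
    """Run every step after its dependencies; raises graphlib.CycleError on a cycle."""
    # Include steps that have no dependencies so they are scheduled too.
    graph = {name: deps.get(name, set()) for name in steps}
    results: Dict[str, str] = {}
    for name in TopologicalSorter(graph).static_order():
        results[name] = steps[name](results)
    return results

# Hypothetical three-step research workflow: plan -> search -> summarize.
steps: Dict[str, Step] = {
    "plan": lambda r: "plan: find 3 sources",
    "search": lambda r: f"search results for [{r['plan']}]",
    "summarize": lambda r: f"summary of [{r['search']}]",
}
deps = {"search": {"plan"}, "summarize": {"search"}}

out = run_workflow(steps, deps)
print(out["summarize"])
```

Steps with no path between them could be dispatched in parallel by using `TopologicalSorter.get_ready()` and `done()` instead of `static_order()`.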
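3️⃣ Key Insights
- Memory is becoming infrastructure: more and more systems treat memory as a standard capability rather than an optional feature
- Planning is shifting from rules to learning: traditional symbolic planning is being replaced by learned, neural-network-based planning
- Multi-Agent collaboration is standardizing: consensus is forming around multi-agent communication protocols and coordination mechanisms
- Evaluation benchmarks are evolving quickly: Agent capability evaluation is expanding from single tasks to complex scenarios
- Open-source solutions are iterating quickly: open-source implementations are rapidly catching up with commercial Agent capabilities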
4️⃣ Technology Evolution Paths
Stage 1: Prompt Engineering
Current Hot Paths
- RAG → Memory System → World Model: memory architectures continue to deepen
- ReAct → Planning System → Goal Reasoning: reasoning capabilities are being strengthened
5️⃣ Relation to Open-Source Agent Projects
Mainstream Project Comparison
| Open-source project | Related directions | Validated by today's papers |
|---|---|---|
| LangChain | tool, planning | ✅ |
| LlamaIndex | memory, rag | ✅ |
| AutoGPT | planning, autonomous | ✅ |
| CrewAI | multi-agent | ✅ |
| Mem0 | memory | ✅ |
| OpenDevin | tool, planning | ➖ |
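Design Validation and Evolution
Validated designs:
- The need for a Memory System continues to be confirmed
- Tool Use is now widely accepted as a core Agent capability
- Multi-Agent architectures perform better on complex tasks
Designs that need to evolve:
- Simple RAG is being replaced by Memory Systems
- Monolithic single-Agent architectures are limited in complex scenarios
- Static Tool Definitions need to evolve toward dynamic learning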
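A first step from static tool definitions toward dynamic ones is a registry that tools join at runtime and that the agent queries by capability. Below is a minimal Python sketch. `ToolRegistry`, `ToolSpec`, the tag scheme, and the two example tools are all hypothetical. The sketch covers runtime registration and discovery only; the "learning" part (for example, ranking tools by past success) is left out.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

@dataclass
class ToolSpec:
    name: str
    description: str
    tags: List[str]
    fn: Callable[..., Any]

class ToolRegistry:
    """Hypothetical registry: tools register at runtime instead of being hard-coded."""

    def __init__(self) -> None:
        self._tools: Dict[str, ToolSpec] = {}

    def register(self, name: str, description: str, tags: List[str]):
        """Decorator that adds a function to the registry under `name`."""
        def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
            self._tools[name] = ToolSpec(name, description, tags, fn)
            return fn
        return decorator

    def discover(self, tag: str) -> List[str]:
        """Return the sorted names of all tools carrying `tag`."""
        return sorted(n for n, t in self._tools.items() if tag in t.tags)

    def describe(self) -> str:
        """One line per tool, e.g. for inclusion in an LLM prompt."""
        return "\n".join(f"{t.name}: {t.description}" for t in self._tools.values())

    def call(self, name: str, **kwargs: Any) -> Any:
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name].fn(**kwargs)

registry = ToolRegistry()

@registry.register("add", "Add two numbers", tags=["math"])
def add(a: float, b: float) -> float:
    return a + b

@registry.register("upper", "Uppercase a string", tags=["text"])
def upper(s: str) -> str:
    return s.upper()

print(registry.discover("math"), registry.call("add", a=2, b=3))
```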
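6️⃣ Architecture-Level Conclusions
- Memory First: new Agent projects should design the Memory System up front rather than adding it afterward
- Tool Abstraction: the tool abstraction layer should support dynamic discovery and learning rather than hard-coded tools
- Multi-Agent Ready: even if the system is single-Agent today, its architecture should leave room for multi-Agent extension
- Evaluation Driven: establish continuous evaluation rather than relying on manual testing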
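The Evaluation Driven point can be made concrete with a small regression harness. It scores an agent on a fixed set of cases and flags any drop against a stored baseline. Below is a minimal Python sketch. `EvalCase`, the substring-match scoring rule, the 0.02 tolerance, and `toy_agent` are illustrative assumptions; a real setup would likely need richer graders, such as rubrics or execution checks.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class EvalCase:
    prompt: str
    expected: str

def evaluate(agent: Callable[[str], str], cases: List[EvalCase]) -> float:
    """Fraction of cases whose output contains the expected answer (case-insensitive)."""
    if not cases:
        raise ValueError("need at least one eval case")
    passed = sum(1 for c in cases if c.expected.lower() in agent(c.prompt).lower())
    return passed / len(cases)

def check_regression(score: float, baseline: float, tolerance: float = 0.02) -> bool:
    """True if the new score dropped more than `tolerance` below the baseline."""
    return score < baseline - tolerance

cases = [
    EvalCase("capital of France?", "Paris"),
    EvalCase("2+2?", "4"),
    EvalCase("largest planet?", "Jupiter"),
]

def toy_agent(prompt: str) -> str:
    # Hypothetical agent that only knows two answers.
    answers = {"capital of France?": "It is Paris.", "2+2?": "4"}
    return answers.get(prompt, "I don't know")

score = evaluate(toy_agent, cases)
print(f"score={score:.2f}, regression={check_regression(score, baseline=0.9)}")
```

Running the harness in CI on every prompt, model, or tool change is what turns it from a one-off test into continuous evaluation.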
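7️⃣ Recommended Next Steps
Memory Schema Design
- Adopt a tiered memory architecture: Working Memory → Episodic → Long-term
- Design a unified Memory Interface that supports multiple backends (vector, graph, relational)
- Implement a Memory Compression mechanism to prevent unbounded growth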
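These three recommendations fit together in one small structure. A bounded working memory demotes old items to an episodic buffer, and the episodic buffer is in turn compressed into long-term records behind a backend-agnostic interface. Below is a minimal Python sketch under stated assumptions. `MemoryBackend`, `ListBackend`, and `TieredMemory` are hypothetical names. The substring search stands in for a vector, graph, or relational store. "Compression" here is plain concatenation; a real system would more likely call an LLM summarizer.

```python
from collections import deque
from typing import Deque, List, Protocol

class MemoryBackend(Protocol):
    """Unified interface; a vector, graph, or relational store could implement it."""
    def add(self, item: str) -> None: ...
    def search(self, query: str, k: int) -> List[str]: ...

class ListBackend:
    """Toy backend using substring matching (stand-in for a real store)."""
    def __init__(self) -> None:
        self.items: List[str] = []
    def add(self, item: str) -> None:
        self.items.append(item)
    def search(self, query: str, k: int) -> List[str]:
        return [i for i in self.items if query.lower() in i.lower()][:k]

class TieredMemory:
    def __init__(self, long_term: MemoryBackend, working_size: int = 4, episodic_size: int = 8):
        self.working: Deque[str] = deque()
        self.episodic: List[str] = []
        self.long_term = long_term
        self.working_size = working_size
        self.episodic_size = episodic_size

    def observe(self, item: str) -> None:
        self.working.append(item)
        if len(self.working) > self.working_size:
            self.episodic.append(self.working.popleft())  # demote the oldest item
        if len(self.episodic) > self.episodic_size:
            self._compress()

    def _compress(self) -> None:
        # Placeholder compression: merge the oldest half of episodic memory into one record.
        half = len(self.episodic) // 2
        old, self.episodic = self.episodic[:half], self.episodic[half:]
        self.long_term.add("SUMMARY: " + " | ".join(old))

    def recall(self, query: str, k: int = 3) -> List[str]:
        """Search tiers from most to least recent and return up to k hits."""
        q = query.lower()
        hits = [i for i in self.working if q in i.lower()]
        hits += [i for i in self.episodic if q in i.lower()]
        hits += self.long_term.search(query, k)
        return hits[:k]

mem = TieredMemory(ListBackend(), working_size=2, episodic_size=4)
for i in range(8):
    mem.observe(f"event {i}")
print(mem.recall("event 0"))
```

With these settings, the working and episodic tiers stay bounded at 2 and 4 items. The long-term store still grows, but by one summary record per compression rather than one record per observation.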
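Retrieval Policy Upgrade
- Move from plain similarity search to hybrid retrieval (keyword + vector + knowledge graph)
- Implement context-aware, dynamic retrieval strategies
- Consider adding a Reranking stage to improve relevance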
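Below is a minimal sketch of hybrid retrieval with reranking. It assumes two retrievers whose rankings are merged with Reciprocal Rank Fusion (RRF), using the commonly cited constant k = 60. Several parts are simplified or assumed. The keyword scorer is simple term overlap rather than BM25. `toy_embed` is a hypothetical stand-in for a real embedding model. The knowledge-graph retriever is omitted. The `reranker` hook is where a cross-encoder or LLM reranker could plug in.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence

VOCAB = ["memory", "retrieval", "vector", "agent", "drone"]

def toy_embed(text: str) -> List[float]:
    """Hypothetical embedding: counts tokens that start with each VOCAB word."""
    tokens = text.lower().split()
    return [float(sum(t.startswith(w) for t in tokens)) for w in VOCAB]

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def keyword_scores(query: str, docs: Sequence[str]) -> List[float]:
    """Term-overlap count, a crude stand-in for BM25."""
    q = set(query.lower().split())
    return [float(sum(t in q for t in d.lower().split())) for d in docs]

def vector_scores(query: str, docs: Sequence[str]) -> List[float]:
    qv = toy_embed(query)
    return [cosine(qv, toy_embed(d)) for d in docs]

def to_ranks(scores: Sequence[float]) -> Dict[int, int]:
    """Map doc index -> 1-based rank (higher score gives a better rank)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return {doc: rank for rank, doc in enumerate(order, start=1)}

def rrf(rankings: List[Dict[int, int]], k: int = 60) -> List[int]:
    """Reciprocal Rank Fusion: sum 1 / (k + rank) over all rankings."""
    fused: Dict[int, float] = {}
    for ranking in rankings:
        for doc, rank in ranking.items():
            fused[doc] = fused.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=lambda d: -fused[d])

def hybrid_search(query: str, docs: Sequence[str], top_k: int = 2,
                  reranker: Optional[Callable[[str, List[str]], List[str]]] = None) -> List[str]:
    fused = rrf([to_ranks(keyword_scores(query, docs)), to_ranks(vector_scores(query, docs))])
    hits = [docs[i] for i in fused[:top_k]]
    return reranker(query, hits) if reranker else hits

docs = [
    "tiered memory for long running agents",
    "hybrid retrieval combines keyword and vector search over agent memory",
    "drone swarm mission planning",
]
query = "vector retrieval for agent memory"
print(hybrid_search(query, docs))
```

RRF only needs ranks, not comparable scores. This is why it is a convenient way to fuse retrievers whose raw scores (term counts, cosine similarities) live on different scales.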
Agent Orchestration Adjustments
- Design a standardized Agent communication protocol
- Implement dynamic task allocation
- Consider introducing an Orchestrator role
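Below is a minimal sketch that combines the three orchestration recommendations. A shared message envelope serves as the protocol, and an `Orchestrator` role performs dynamic allocation by routing each task to the least-loaded agent that advertises the required skill. `Message`, `WorkerAgent`, `Orchestrator`, the skill names, and the least-load policy are all hypothetical. A production protocol would also need message IDs, timeouts, and error replies.

```python
from dataclasses import dataclass
from typing import List, Set

@dataclass
class Message:
    """Hypothetical minimal envelope shared by all agents."""
    sender: str
    recipient: str
    kind: str  # "task" or "result"
    payload: str

@dataclass
class WorkerAgent:
    name: str
    skills: Set[str]
    load: int = 0  # number of tasks handled so far

    def handle(self, msg: Message) -> Message:
        self.load += 1
        return Message(self.name, msg.sender, "result", f"{self.name} did: {msg.payload}")

class Orchestrator:
    def __init__(self, workers: List[WorkerAgent]):
        self.workers = workers
        self.log: List[Message] = []

    def assign(self, skill: str, payload: str) -> Message:
        candidates = [w for w in self.workers if skill in w.skills]
        if not candidates:
            raise LookupError(f"no agent offers skill {skill!r}")
        worker = min(candidates, key=lambda w: w.load)  # least-loaded capable agent
        task = Message("orchestrator", worker.name, "task", payload)
        result = worker.handle(task)
        self.log.extend([task, result])
        return result

orch = Orchestrator([
    WorkerAgent("a", {"search"}),
    WorkerAgent("b", {"search", "code"}),
    WorkerAgent("c", {"code"}),
])
results = [
    orch.assign("search", "find memory papers"),
    orch.assign("search", "find planning papers"),
    orch.assign("code", "write a retrieval test"),
]
for r in results:
    print(r.sender, "->", r.payload)
```

Because every exchange goes through the same envelope, the orchestrator's `log` doubles as an audit trail that could feed the evaluation loop.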
📚 Appendix
Complete Paper List
- OpenSeeker-v2: Pushing the Limits of Search Agents with Informative and High-Difficulty Trajectories - other
- Redefining AI Red Teaming in the Agentic Era: From Weeks to Hours - other
- SymptomAI: Towards a Conversational AI Agent for Everyday Symptom Assessment - other
- An Agent-Oriented Pluggable Experience-RAG Skill for Experience-Driven Retrieval Strategy Orchestration - memory
- From Intent to Execution: Composing Agentic Workflows with Agent Recommendation - other
- QKVShare: Quantized KV-Cache Handoff for Multi-Agent On-Device LLMs - multi_agent
- Correct Is Not Enough: Training Reasoning Planners with Executor-Grounded Rewards - planning
- Agentic-imodels: Evolving agentic interpretability tools via autoresearch - other
- ScrapMem: A Bio-inspired Framework for On-device Personalized Agent Memory via Optical Forgetting - memory
- Say the Mission, Execute the Swarm: Agent-Enhanced LLM Reasoning in the Web-of-Drones - planning
- What You Think is What You See: Driving Exploration in VLM Agents via Visual-Linguistic Curiosity - other
- OracleProto: A Reproducible Framework for Benchmarking LLM Native Forecasting via Knowledge Cutoff and Temporal Masking - evaluation
- MEMTIER: Tiered Memory Architecture and Retrieval Bottleneck Analysis for Long-Running Autonomous AI Agents - memory
- Agent-Based Modeling of Low-Emission Fertilizer Adoption for Dairy Farm Decarbonisation using Empirical Farm Data - other
- Where Paths Split: Localized, Calibrated Control of Moral Reasoning in Large Language Models - planning
- Workspace-Bench 1.0: Benchmarking AI Agents on Workspace Tasks with Large-Scale File Dependencies - evaluation
- FinSTaR: Towards Financial Reasoning with Time Series Reasoning Models - planning
This report was automatically generated by OpenClaw
Intended for Agent architects as a decision-making reference