Open Questions
Unresolved questions and research leads. Format:
- [ ] {question} *(raised by: source/article, date)*
AI Tool Selection
- How rapidly are frontier model rankings changing — is the GPT-5.2 / Claude Opus 4.6 / Gemini 3 Pro comparison still accurate? (raised by: guides/ai-tool-selection, 2026-04-07)
- What is "Claude Cowork" precisely — a separate product, a Claude mode, or a desktop app? When was it launched? (raised by: guides/ai-tool-selection, 2026-04-07)
- Does Mollick's "Harnesses" concept map to the same construct as harness engineering in the OpenAI Codex team's usage? (answered: yes — the agent harness concept was formalized across Anthropic, OpenAI, and LangChain in early 2026; see concepts/agent-harness, 2026-04-16)
- How does NotebookLM's knowledge base quality compare to the LLM knowledge base pattern for research synthesis? (raised by: guides/ai-tool-selection, 2026-04-07)
AI Tools & Adoption
- What database technology and connectors were used to aggregate the multi-app data pipeline (Monday.com, GoHighLevel, Zoom, Gmail, Slack, Google Calendar)? (raised by: concepts/ai-for-small-business, 2026-04-03)
- What specific compliance or privacy frameworks apply to deciding what business data is safe to share with cloud AI? (raised by: concepts/ai-for-small-business, 2026-04-03)
- When will on-premise/local LLM deployments become viable for small businesses with data privacy requirements? (raised by: concepts/ai-for-small-business, 2026-04-03)
- How does the "dashboard" quality-control model evolve as model accuracy approaches 100%, and what does human work look like at that point? (raised by: concepts/ai-for-small-business, 2026-04-03)
Claude Code
- What is the precise interface/workflow for Claude Code vs. standard Claude chat — separate product, mode, or tool? (answered by: concepts/claude-code-skills — Claude Code is a separate CLI/app with its own skills/commands system, distinct from the chat interface, 2026-04-07)
- How does Claude Code compare to other AI coding tools (Cursor, GitHub Copilot, Devin) for non-programmer use cases? (raised by: concepts/claude-code, 2026-04-03)
Claude Code Skills
- What is the actual content of the "Claude Agent Skills Explained" YouTube video, and does it cover material beyond the official documentation? (answered: yes; it additionally covers the 30–50 token per-skill startup cost, explicit portability across Code/API/claude.ai, a four-layer mental model, and skill sharing across multiple subagents, 2026-04-07)
- How widely have other AI tools (ChatGPT, Cursor, Gemini CLI) adopted the agentskills.io open standard? (raised by: concepts/claude-code-skills, 2026-04-07)
- What is the practical limit on the number of skills before description budget truncation becomes a problem in typical sessions? (raised by: concepts/claude-code-skills, 2026-04-07)
- How do skills interact with CLAUDE.md — do skills override, extend, or co-exist with CLAUDE.md instructions? (answered: they co-exist; CLAUDE.md is the project foundation, skills are portable expertise layered on top — source: Claude_Agent_Skills_Explained transcript, 2026-04-07)
AI Inflection Point & Dark Factory
- What does the quality gap look like between "most of the time works" and "all of the time works" — what's still missing? (raised by: concepts/ai-inflection-point, 2026-04-03)
- How is the dark factory pattern adopted outside security-adjacent companies where testing is easier to simulate? (answered by: concepts/harness-engineering — OpenAI Codex team ran a general-purpose product with the same approach, 2026-04-03)
- Will the prediction that 50% of engineers will be writing 95% AI-generated code by the end of 2026 materialize, and how would we measure it? (raised by: concepts/ai-inflection-point, 2026-04-03)
- How does the inflection point extend to non-code knowledge work (law, medicine, journalism)? (raised by: concepts/ai-inflection-point, 2026-04-03)
Local Agent Stack
- How does local model capability compare to frontier models for agentic tasks requiring complex multi-step reasoning? (raised by: guides/local-agent-stack, 2026-04-03)
- What is the practical throughput ceiling for local agents — how many agent round-trips per hour on typical developer hardware? (raised by: guides/local-agent-stack, 2026-04-03)
- How does LangGraph handle state persistence across sessions for long-running local agents? (raised by: guides/local-agent-stack, 2026-04-03)
LLM Knowledge Bases & PKM
- At what scale (articles, words) does the index-based approach break down and require RAG or vector search? (raised by: concepts/llm-knowledge-base, 2026-04-03)
- What does synthetic data generation + finetuning look like in practice for personal knowledge bases — what tools, what workflow? (raised by: concepts/llm-knowledge-base, 2026-04-03)
- What product could formalize the "hacky collection of scripts" LLM knowledge base into a polished tool? (raised by: concepts/llm-knowledge-base, 2026-04-03)
- How does the ephemeral wiki pattern (spawn, lint, report, discard) differ from persistent wiki maintenance — when is each appropriate? (raised by: concepts/llm-knowledge-base, 2026-04-03)
- qmd: is it actively maintained, what are its exact system requirements (Node version, GGUF model size), and at what article count does it become worth the setup cost over a custom script? (raised by: concepts/llm-knowledge-base, 2026-04-08)
- How do you define "actionable" when deciding which notes to turn into AI skills vs. leave as passive reference? (raised by: concepts/personal-knowledge-management, 2026-04-03)
- What happens when AI skills built from different sources conflict — how do you reconcile competing frameworks? (raised by: concepts/personal-knowledge-management, 2026-04-03)
- At what knowledge graph size does the system become unwieldy — is there an upper limit to useful PKM scale? (raised by: concepts/personal-knowledge-management, 2026-04-03)
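One way to frame the scale question at the top of this list is as a context budget: the index-based approach starts to strain roughly when the index no longer fits in the slice of the context window you can spare for it. A minimal back-of-envelope sketch, assuming the index is a single file the agent reads in full, with one short summary line per article; the function name, the 200k-token window, the 40 tokens per entry, and the 25% share are all hypothetical placeholders, not figures from concepts/llm-knowledge-base.

```python
def max_indexed_articles(context_tokens: int, tokens_per_entry: int,
                         index_share: float = 0.25) -> int:
    """Rough ceiling on article count before a fully-read index overflows
    its share of the context window. All inputs are assumptions."""
    if tokens_per_entry <= 0:
        raise ValueError("tokens_per_entry must be positive")
    if not 0 < index_share <= 1:
        raise ValueError("index_share must be in (0, 1]")
    budget = int(context_tokens * index_share)  # tokens reserved for the index
    return budget // tokens_per_entry           # whole entries that fit

# Hypothetical inputs: 200k-token window, ~40 tokens per index line,
# a quarter of the window reserved for the index.
print(max_indexed_articles(200_000, 40))  # 1250
```

Under this framing, doubling the length of each index entry halves the ceiling, so the answer may depend as much on how terse the index is as on the raw article or word count.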
Obsidian + Claude Code OS
- What is the exact Obsidian CLI interface — is it an MCP server, a CLI tool, or an Obsidian plugin that exposes an API? (raised by: concepts/obsidian-claude-code-os, 2026-04-03) — Note: now that MCP is in the wiki (concepts/mcp-authentication), the MCP server hypothesis is worth investigating first
- How do reference files stay up-to-date — is maintenance a manual discipline or can agents update them? (raised by: concepts/obsidian-claude-code-os, 2026-04-03)
- At what vault size does the relationship graph become too large for Claude Code to traverse effectively? (raised by: concepts/obsidian-claude-code-os, 2026-04-03)
- How do you handle multi-person vaults or shared team vaults — does the pattern transfer to teams? (raised by: concepts/obsidian-claude-code-os, 2026-04-03)
Agent Harness (General Concept)
- How does the co-evolution principle affect portability — can a harness designed for Claude work well with GPT or Gemini, or is tight model-harness coupling the norm? (raised by: concepts/agent-harness, 2026-04-16)
- What is the empirical cost/benefit of multi-agent vs. single-agent architectures in real production systems, and where does the ~10-tool threshold come from? (raised by: concepts/agent-harness, 2026-04-16)
- How does ACON's 26–54% token reduction technique work in detail, and is it implementable in existing frameworks? (raised by: concepts/agent-harness, 2026-04-16)
- What does the "Ralph Loop" look like in non-coding domains — can the Initializer/Coding Agent pattern generalize to research, writing, or data analysis? (raised by: concepts/agent-harness, 2026-04-16)
Harness Engineering (OpenAI Codex)
- What is the full set of "golden principles" the OpenAI Codex team uses — can they be generalized to other codebases? (raised by: concepts/harness-engineering, 2026-04-03)
- How does architectural coherence evolve over years in a fully agent-generated system — does drift compound despite garbage collection? (raised by: concepts/harness-engineering, 2026-04-03)
- What is the Aardvark agent (mentioned in the OpenAI article) and what role does it play in agent-first development? (raised by: concepts/harness-engineering, 2026-04-03)
- At what team or codebase size does the harness engineering model break down — does it scale to 50 or 500 engineers? (raised by: concepts/harness-engineering, 2026-04-03)
Agentic Engineering
- What is the best way for mid-career engineers to develop agent-direction skill quickly? (raised by: concepts/agentic-engineering, 2026-04-03)
- How do agentic engineering patterns transfer to non-code knowledge work (legal, medical, editorial)? (raised by: concepts/agentic-engineering, 2026-04-03)
- What does "code review" look like at scale when the reviewer cannot read every line? What signals substitute for line-by-line review? (raised by: concepts/agentic-engineering, 2026-04-03)
MCP Authentication
- How does MCP authentication compare to tool-calling auth in other protocols (e.g., OpenAPI/tool-use with API keys)? (raised by: concepts/mcp-authentication, 2026-04-04)
- Is Entra's pre-registration-only limitation permanent, or is DCR support planned? (raised by: concepts/mcp-authentication, 2026-04-04)
- How are token refresh cycles handled in long-running MCP sessions — does FastMCP manage this automatically? (raised by: concepts/mcp-authentication, 2026-04-04)
- What happens when the OBO exchange fails mid-session (token expiry, revoked consent) — how does the server signal errors to the MCP client? (raised by: concepts/mcp-authentication, 2026-04-04)
- Are there performance implications of per-request JWT verification at scale, and what caching strategies help? (raised by: concepts/mcp-authentication, 2026-04-04)
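For the per-request verification question, one commonly discussed mitigation is to cache the outcome of a successful verification for a short time, keyed by a hash of the token. A minimal sketch, assuming a hypothetical `verify` callable that performs the real JWT checks (signature, issuer, audience, expiry) and returns the claims or raises; `VerifiedTokenCache` and `max_ttl` are illustrative names, and nothing here describes how FastMCP or any particular MCP server behaves.

```python
import hashlib
import time
from typing import Any, Callable, Dict, Tuple

Claims = Dict[str, Any]

class VerifiedTokenCache:
    """Caches claims of tokens that already passed full verification.

    `verify` is a hypothetical callable doing the real JWT checks; it is
    only called on a cache miss or after a cached entry has expired.
    """

    def __init__(self, verify: Callable[[str], Claims], max_ttl: float = 60.0,
                 clock: Callable[[], float] = time.time) -> None:
        self._verify = verify
        self._max_ttl = max_ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Claims]] = {}

    def claims_for(self, token: str) -> Claims:
        # Key on a digest so raw bearer tokens are not kept as dict keys.
        key = hashlib.sha256(token.encode()).hexdigest()
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]                      # still fresh: skip verification
        claims = self._verify(token)           # full verification (may raise)
        # Never cache past the token's own `exp`, and cap by max_ttl so each
        # token is re-verified at least that often.
        expires_at = min(float(claims.get("exp", now)), now + self._max_ttl)
        self._entries[key] = (expires_at, claims)
        return claims
```

The main design trade-off is `max_ttl`: a longer cap saves more verification work but widens the window in which a token that would now fail verification is still accepted. A production cache would also need a size bound and eviction of expired entries.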
Agentic Workflows
- When does an evaluator-optimizer loop converge vs. oscillate? What stopping criteria work in practice? (raised by: concepts/agentic-workflows, 2026-04-09)
- Is there a principled way to choose among the five workflow patterns for a new problem, or is it always empirical? (raised by: concepts/agentic-workflows, 2026-04-09)
- How do agentic workflow patterns map to formal CS constructs (DAGs, state machines, recursive descent)? (raised by: concepts/agentic-workflows, 2026-04-09)
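The convergence-versus-oscillation question becomes more concrete once candidate stopping rules are written down. A minimal sketch, assuming hypothetical `generate(prev, feedback)` and `evaluate(output) -> (score, feedback)` callables (for example, two separate LLM calls) and illustrative thresholds; it combines four common stopping criteria (a score target, a round cap, a no-improvement patience counter, and a repeated-draft check as a crude oscillation detector) and is not taken from concepts/agentic-workflows.

```python
from typing import Callable, List, Optional, Tuple

def evaluator_optimizer(
    generate: Callable[[Optional[str], Optional[str]], str],
    evaluate: Callable[[str], Tuple[float, str]],
    target: float = 0.9,
    max_rounds: int = 5,
    patience: int = 2,
) -> Tuple[str, float, str]:
    """Run generate -> evaluate rounds until a stopping rule fires.

    Returns (best_output, best_score, reason_for_stopping).
    """
    seen: List[str] = []
    best, best_score = "", float("-inf")
    stale = 0                                  # rounds without a new best score
    prev: Optional[str] = None
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        out = generate(prev, feedback)
        if out in seen:                        # exact repeat of an earlier draft
            return best, best_score, "oscillation"
        seen.append(out)
        score, feedback = evaluate(out)
        if score > best_score:
            best, best_score, stale = out, score, 0
        else:
            stale += 1
        if best_score >= target:
            return best, best_score, "target reached"
        if stale >= patience:
            return best, best_score, "no improvement"
        prev = out
    return best, best_score, "max rounds"
```

Exact-match repeat detection only catches literal cycles; catching an optimizer that drifts between near-identical drafts would need a similarity measure, which is part of what the stopping-criteria question leaves open.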
AI Red Teaming
- How do you probe for dangerous LLM capabilities like persuasion, deception, and autonomous replication? (raised by: concepts/ai-red-teaming, 2026-04-09)
- Can AI red teaming practices be standardized so organizations can clearly communicate methods and findings? (raised by: concepts/ai-red-teaming, 2026-04-09)
- How do red teaming practices adapt to non-Western linguistic and cultural contexts at scale? (raised by: concepts/ai-red-teaming, 2026-04-09)
- At what point does the cost of jailbreaking an LLM rise to that of exploiting buffer overflows, and what specific mitigations drive that transition? (raised by: concepts/ai-red-teaming, 2026-04-09)
Web Scraping
- How does the four-tier system handle sites requiring login — is there a fifth tier for authenticated scraping? (raised by: guides/progressive-web-scraping, 2026-04-09)
- How does Bright Data's pricing compare to alternatives (Apify, ScrapingBee, Oxylabs) at scale? (raised by: guides/progressive-web-scraping, 2026-04-09)
- Is the PAI skill library actively maintained, and how does skill quality compare to official sources? (raised by: guides/progressive-web-scraping, 2026-04-09)
Frontier AI & Cyber Security
- When will a frontier AI model complete the full 32-step enterprise attack scenario end-to-end, and what capability milestone drives it? (raised by: concepts/frontier-ai-cyber-capabilities, 2026-04-15)
- How does AI-enabled attack cost (~£65/attempt) compare to traditional skilled-attacker costs — at what cost threshold does AI fundamentally change the economics of targeted attacks? (raised by: concepts/frontier-ai-cyber-capabilities, 2026-04-15)
- What monitoring and detection tooling is most effective against AI-generated attack activity, given that current models generate noticeable alerts? (raised by: concepts/frontier-ai-cyber-capabilities, 2026-04-15)
- As AI attack behavior becomes stealthier, how does the detectability advantage erode — and how quickly? (raised by: concepts/frontier-ai-cyber-capabilities, 2026-04-15)
- How do the AISI evaluation results translate to real-world attacker operations — what is the gap between simulated scenario and live network exploitation? (raised by: concepts/frontier-ai-cyber-capabilities, 2026-04-15)
- Does ICS attack performance track the enterprise network performance with a lag, and what does AI parity look like for critical infrastructure? (raised by: concepts/frontier-ai-cyber-capabilities, 2026-04-15)
LLM-Tier Security & Personal Computer Security
- When will Mythos-tier exploit capability be available to malicious actors at low cost (~$1K), and what is the current price/capability trajectory? (raised by: concepts/llm-tier-security, 2026-04-21)
- How will phone companies defend against increasingly AI-automated SIM-swap attacks? (raised by: concepts/llm-tier-security, 2026-04-21)
- What are best-practice sandboxing setups for developer environments using npm/pip/cargo dependencies (container, VM, Flatpak, bubblewrap)? (raised by: concepts/llm-tier-security, 2026-04-21)
- Will ACATS fraud protections standardize across major brokerages? Which currently offer account lock features? (raised by: concepts/llm-tier-security, 2026-04-21)
- How does an outbound application firewall (OpenSnitch) interact with legitimate AI agents running locally on the same machine? (raised by: concepts/llm-tier-security, 2026-04-21)
Prompt Injection & Security
- Is prompt injection fundamentally unsolvable, or is there a theoretical architecture that makes it provably safe? (raised by: concepts/prompt-injection, 2026-04-03)
- When will the "Challenger disaster" of AI (a major high-profile prompt injection incident) occur, and what will it look like? (raised by: concepts/prompt-injection, 2026-04-03)
- Has anyone deployed a production system using the CaMeL privileged/quarantined agent architecture? (raised by: concepts/prompt-injection, 2026-04-03)
- How do prompt injection risks change as agents gain access to physical systems (robots, vehicles, IoT)? (raised by: concepts/prompt-injection, 2026-04-03)
- Were there confirmed exploits of CVE-2026-33579 in the wild before the patch, given the two-day window between patch release and CVE publication? (raised by: concepts/openclaw-security, 2026-04-03)
- Is the CaMeL privileged/quarantined agent architecture a viable path to making tools like OpenClaw safe by design? (raised by: concepts/openclaw-security, 2026-04-03)
AI Regulation & Policy
- What specific provisions of SB24-205 triggered the DOJ's constitutional challenge — was it the Commerce Clause, Supremacy Clause preemption, or another doctrine? (raised by: concepts/ai-regulation, 2026-04-25)
- Will the DOJ's action against Colorado deter other states from passing similar AI regulation bills, or will states continue to experiment? (raised by: concepts/ai-regulation, 2026-04-25)
- Is there a federal AI law in progress that would provide clear preemption, or is the DOJ relying solely on existing constitutional doctrine? (raised by: concepts/ai-regulation, 2026-04-25)
- How does the DOJ's state-preemption posture interact with the EU AI Act — does it increase pressure for federal AI legislation? (raised by: concepts/ai-regulation, 2026-04-25)
- What happens to Colorado's existing algorithmic discrimination law — is it also vulnerable to DOJ challenge? (raised by: concepts/ai-regulation, 2026-04-25)
- Does the DOJ's posture reflect a broader deregulatory stance toward AI, and is it consistent across administrations? (raised by: entities/harmeet-dhillon, 2026-04-25)
Multi-Agent Misalignment
- Is there a principled way to choose between sub-agents vs. agent teams for a given problem, beyond the independence/dependence heuristic? (raised by: raw/articles/suryanshti-sub-agents-vs-agent-teams-2026.md, 2026-04-25)
- How does the Claude Agent SDK's description-based routing compare to explicit orchestrator logic, and which is more reliable in practice? (raised by: raw/articles/suryanshti-sub-agents-vs-agent-teams-2026.md, 2026-04-25)
- Can context-based decomposition be formalized into a decision framework (e.g., shared state graph analysis), or is it always a judgment call? (raised by: raw/articles/suryanshti-sub-agents-vs-agent-teams-2026.md, 2026-04-25)
- Is the state keeper agent pattern sufficient for larger organizations (10+, 50+ agents) or does it become a bottleneck? (raised by: concepts/multi-agent-misalignment, 2026-04-25)
- Does narrative drift worsen with more agents, more information-passing rounds, or more tightly-scoped roles — and is there an empirical curve? (raised by: concepts/multi-agent-misalignment, 2026-04-25)
- Are there multi-agent architectures that are structurally immune to narrative drift (e.g., shared memory, consensus protocols)? (raised by: concepts/multi-agent-misalignment, 2026-04-25)
- How does narrative drift interact with adversarial prompting — could a bad actor exploit role fidelity to inject false institutional records? (raised by: concepts/multi-agent-misalignment, 2026-04-25)
- Can the MAST vocabulary (inter-agent misalignment, reasoning-action mismatch, incomplete verification) be operationalized into automated detection? (raised by: concepts/multi-agent-misalignment, 2026-04-25)
- Does the narrative drift problem scale differently with different organizational topologies (hierarchical vs. flat, synchronous vs. async)? (raised by: concepts/multi-agent-misalignment, 2026-04-25)
- Is Vei open-source or publicly accessible, and what stack is it built on? (raised by: entities/vei, 2026-04-25)
Context Files (CLAUDE.md / AGENTS.md / DESIGN.md / SKILL.md)
- Is there coordination between Anthropic, OpenAI, and Google Stitch to formalize a common context-file standard, or is convergence happening organically? (raised by: concepts/context-files, 2026-04-25)
- How should projects handle conflicts between CLAUDE.md project context and DESIGN.md design rules — which takes precedence? (raised by: concepts/context-files, 2026-04-25)
- Will DESIGN.md adoption spread beyond Google Stitch to other AI-assisted development tools (Cursor, Copilot, etc.)? (raised by: concepts/context-files, 2026-04-25)
- Is Google Stitch a standalone Google product or part of a larger platform (Firebase, Vertex, Gemini)? (raised by: entities/google-stitch, 2026-04-25)
- How do context files scale in very large repositories — is there a risk of context file bloat? (raised by: concepts/context-files, 2026-04-25)
Claude Code Best Practices
- At what task complexity does the "break it down" principle need to be applied — is there a heuristic for when a task is too large for a single session? (raised by: concepts/claude-code, 2026-04-25)
- How do custom slash commands interact with Claude Code skills — can they overlap, and which takes precedence? (raised by: concepts/claude-code, 2026-04-25)
- What is the optimal CLAUDE.md length — at what point does it become too long to read on every session start? (raised by: concepts/claude-code, 2026-04-25)