Open Questions

Unresolved questions and research leads. Format: - [ ] {question} *(raised by: source/article, date)*

AI Tool Selection

  • How rapidly are frontier model rankings changing — is the GPT-5.2 / Claude Opus 4.6 / Gemini 3 Pro comparison still accurate? (raised by: guides/ai-tool-selection, 2026-04-07)
  • What is "Claude Cowork" precisely — a separate product, a Claude mode, or a desktop app? When was it launched? (raised by: guides/ai-tool-selection, 2026-04-07)
  • Does Mollick's "Harnesses" concept map to the same construct as harness engineering in the OpenAI Codex team's usage? (answered: yes — the agent harness concept was formalized across Anthropic, OpenAI, and LangChain in early 2026; see concepts/agent-harness, 2026-04-16)
  • How does NotebookLM's knowledge base quality compare to the LLM knowledge base pattern for research synthesis? (raised by: guides/ai-tool-selection, 2026-04-07)

AI Tools & Adoption

  • What database technology and connectors were used to aggregate the multi-app data pipeline (Monday.com, GoHighLevel, Zoom, Gmail, Slack, Google Calendar)? (raised by: concepts/ai-for-small-business, 2026-04-03)
  • What specific compliance or privacy frameworks apply to deciding what business data is safe to share with cloud AI? (raised by: concepts/ai-for-small-business, 2026-04-03)
  • When will on-premise/local LLM deployments become viable for small businesses with data privacy requirements? (raised by: concepts/ai-for-small-business, 2026-04-03)
  • How does the "dashboard" quality-control model evolve as model accuracy approaches 100%, and what does human work look like at that point? (raised by: concepts/ai-for-small-business, 2026-04-03)

Claude Code

  • What is the precise interface/workflow for Claude Code vs. standard Claude chat — separate product, mode, or tool? (answered by: concepts/claude-code-skills — Claude Code is a separate CLI/app with its own skills/commands system, distinct from the chat interface, 2026-04-07)
  • How does Claude Code compare to other AI coding tools (Cursor, GitHub Copilot, Devin) for non-programmer use cases? (raised by: concepts/claude-code, 2026-04-03)

Claude Code Skills

  • What is the actual content of the "Claude Agent Skills Explained" YouTube video — does it cover material beyond the official documentation? (answered: yes — adds 30–50 tokens/skill startup cost, explicit portability across Code/API/claude.ai, four-layer mental model, and multi-subagent skill sharing, 2026-04-07)
  • How widely have other AI tools (ChatGPT, Cursor, Gemini CLI) adopted the agentskills.io open standard? (raised by: concepts/claude-code-skills, 2026-04-07)
  • What is the practical limit on number of skills before description budget truncation becomes a problem in typical sessions? (raised by: concepts/claude-code-skills, 2026-04-07)
  • How do skills interact with CLAUDE.md — do skills override, extend, or co-exist with CLAUDE.md instructions? (answered: they co-exist; CLAUDE.md is the project foundation, skills are portable expertise layered on top — source: Claude_Agent_Skills_Explained transcript, 2026-04-07)

AI Inflection Point & Dark Factory

  • What does the quality gap look like between "most of the time works" and "all of the time works" — what's still missing? (raised by: concepts/ai-inflection-point, 2026-04-03)
  • How is the dark factory pattern adopted outside security-adjacent companies where testing is easier to simulate? (answered by: concepts/harness-engineering — OpenAI Codex team ran a general-purpose product with the same approach, 2026-04-03)
  • Will the prediction that 50% of engineers will write 95% of their code with AI by the end of 2026 materialize, and how would we measure it? (raised by: concepts/ai-inflection-point, 2026-04-03)
  • How does the inflection point expand to non-code knowledge work (law, medicine, journalism)? (raised by: concepts/ai-inflection-point, 2026-04-03)

Local Agent Stack

  • How does local model capability compare to frontier models for agentic tasks requiring complex multi-step reasoning? (raised by: guides/local-agent-stack, 2026-04-03)
  • What is the practical throughput ceiling for local agents — how many agent round-trips per hour on typical developer hardware? (raised by: guides/local-agent-stack, 2026-04-03)
  • How does LangGraph handle state persistence across sessions for long-running local agents? (raised by: guides/local-agent-stack, 2026-04-03)

LLM Knowledge Bases & PKM

  • At what scale (articles, words) does the index-based approach break down and require RAG or vector search? (raised by: concepts/llm-knowledge-base, 2026-04-03)
  • What does synthetic data generation + finetuning look like in practice for personal knowledge bases — what tools, what workflow? (raised by: concepts/llm-knowledge-base, 2026-04-03)
  • What product could formalize the "hacky collection of scripts" LLM knowledge base into a polished tool? (raised by: concepts/llm-knowledge-base, 2026-04-03)
  • How does the ephemeral wiki pattern (spawn, lint, report, discard) differ from persistent wiki maintenance — when is each appropriate? (raised by: concepts/llm-knowledge-base, 2026-04-03)
  • qmd: is it actively maintained, what are its exact system requirements (Node version, GGUF model size), and at what article count does it become worth the setup cost over a custom script? (raised by: concepts/llm-knowledge-base, 2026-04-08)
  • How do you define "actionable" when deciding which notes to turn into AI skills vs. leave as passive reference? (raised by: concepts/personal-knowledge-management, 2026-04-03)
  • What happens when AI skills built from different sources conflict — how do you reconcile competing frameworks? (raised by: concepts/personal-knowledge-management, 2026-04-03)
  • At what knowledge graph size does the system become unwieldy — is there an upper limit to useful PKM scale? (raised by: concepts/personal-knowledge-management, 2026-04-03)

Obsidian + Claude Code OS

  • What is the exact Obsidian CLI interface — is it an MCP server, a CLI tool, or an Obsidian plugin that exposes an API? (raised by: concepts/obsidian-claude-code-os, 2026-04-03) — Note: now that MCP is in the wiki (concepts/mcp-authentication), the MCP server hypothesis is worth investigating first
  • How do reference files stay up-to-date — is maintenance a manual discipline or can agents update them? (raised by: concepts/obsidian-claude-code-os, 2026-04-03)
  • At what vault size does the relationship graph become too large for Claude Code to traverse effectively? (raised by: concepts/obsidian-claude-code-os, 2026-04-03)
  • How do you handle multi-person vaults or shared team vaults — does the pattern transfer to teams? (raised by: concepts/obsidian-claude-code-os, 2026-04-03)

Agent Harness (General Concept)

  • How does the co-evolution principle affect portability — can a harness designed for Claude work well with GPT or Gemini, or is tight model-harness coupling the norm? (raised by: concepts/agent-harness, 2026-04-16)
  • What is the empirical cost/benefit of multi-agent vs. single-agent for real production systems — where does the ~10 tool threshold come from? (raised by: concepts/agent-harness, 2026-04-16)
  • How does ACON's 26–54% token reduction technique work in detail, and is it implementable in existing frameworks? (raised by: concepts/agent-harness, 2026-04-16)
  • What does the "Ralph Loop" look like in non-coding domains — can the Initializer/Coding Agent pattern generalize to research, writing, or data analysis? (raised by: concepts/agent-harness, 2026-04-16)

Harness Engineering (OpenAI Codex)

  • What is the full set of "golden principles" the OpenAI Codex team uses — can they be generalized to other codebases? (raised by: concepts/harness-engineering, 2026-04-03)
  • How does architectural coherence evolve over years in a fully agent-generated system — does drift compound despite garbage collection? (raised by: concepts/harness-engineering, 2026-04-03)
  • What is the Aardvark agent (mentioned in the OpenAI article) and what role does it play in agent-first development? (raised by: concepts/harness-engineering, 2026-04-03)
  • At what team or codebase size does the harness engineering model break down — does it scale to 50 or 500 engineers? (raised by: concepts/harness-engineering, 2026-04-03)

Agentic Engineering

  • What is the best way for mid-career engineers to develop agent-direction skill quickly? (raised by: concepts/agentic-engineering, 2026-04-03)
  • How do agentic engineering patterns transfer to non-code knowledge work (legal, medical, editorial)? (raised by: concepts/agentic-engineering, 2026-04-03)
  • What does "code review" look like at scale when the reviewer cannot read every line? What signals substitute for line-by-line review? (raised by: concepts/agentic-engineering, 2026-04-03)

MCP Authentication

  • How does MCP authentication compare to tool-calling auth in other protocols (e.g., OpenAPI/tool-use with API keys)? (raised by: concepts/mcp-authentication, 2026-04-04)
  • Is Entra's pre-registration-only limitation permanent, or is DCR support planned? (raised by: concepts/mcp-authentication, 2026-04-04)
  • How are token refresh cycles handled in long-running MCP sessions — does FastMCP manage this automatically? (raised by: concepts/mcp-authentication, 2026-04-04)
  • What happens when the OBO exchange fails mid-session (token expiry, revoked consent) — how does the server signal errors to the MCP client? (raised by: concepts/mcp-authentication, 2026-04-04)
  • Are there performance implications of per-request JWT verification at scale, and what caching strategies help? (raised by: concepts/mcp-authentication, 2026-04-04)

Agentic Workflows

  • When does an evaluator-optimizer loop converge vs. oscillate? What stopping criteria work in practice? (raised by: concepts/agentic-workflows, 2026-04-09)
  • Is there a principled way to choose among the five workflow patterns for a new problem, or is it always empirical? (raised by: concepts/agentic-workflows, 2026-04-09)
  • How do agentic workflow patterns map to formal CS constructs (DAGs, state machines, recursive descent)? (raised by: concepts/agentic-workflows, 2026-04-09)

AI Red Teaming

  • How do you probe for dangerous LLM capabilities like persuasion, deception, and autonomous replication? (raised by: concepts/ai-red-teaming, 2026-04-09)
  • Can AI red teaming practices be standardized so organizations can clearly communicate methods and findings? (raised by: concepts/ai-red-teaming, 2026-04-09)
  • How do red teaming practices adapt to non-Western linguistic and cultural contexts at scale? (raised by: concepts/ai-red-teaming, 2026-04-09)
  • At what point does the cost of a jailbreak rise to that of exploiting a buffer overflow, and what specific mitigations drive that transition? (raised by: concepts/ai-red-teaming, 2026-04-09)

Web Scraping

  • How does the four-tier system handle sites requiring login — is there a fifth tier for authenticated scraping? (raised by: guides/progressive-web-scraping, 2026-04-09)
  • How does Bright Data's pricing compare to alternatives (Apify, ScrapingBee, Oxylabs) at scale? (raised by: guides/progressive-web-scraping, 2026-04-09)
  • Is the PAI skill library actively maintained, and how does skill quality compare to official sources? (raised by: guides/progressive-web-scraping, 2026-04-09)

Frontier AI & Cyber Security

  • When will a frontier AI model complete the full 32-step enterprise attack scenario end-to-end, and what capability milestone drives it? (raised by: concepts/frontier-ai-cyber-capabilities, 2026-04-15)
  • How does AI-enabled attack cost (~£65/attempt) compare to traditional skilled-attacker costs — at what cost threshold does AI fundamentally change the economics of targeted attacks? (raised by: concepts/frontier-ai-cyber-capabilities, 2026-04-15)
  • What monitoring and detection tooling is most effective against AI-generated attack activity, given that current models generate noticeable alerts? (raised by: concepts/frontier-ai-cyber-capabilities, 2026-04-15)
  • As AI attack behavior becomes stealthier, how does the detectability advantage erode — and how quickly? (raised by: concepts/frontier-ai-cyber-capabilities, 2026-04-15)
  • How do the AISI evaluation results translate to real-world attacker operations — what is the gap between simulated scenario and live network exploitation? (raised by: concepts/frontier-ai-cyber-capabilities, 2026-04-15)
  • Does ICS attack performance track enterprise-network attack performance with a lag, and what does AI parity look like for critical infrastructure? (raised by: concepts/frontier-ai-cyber-capabilities, 2026-04-15)

LLM-Tier Security & Personal Computer Security

  • When will Mythos-tier exploit capability be available to malicious actors at low cost (~$1K), and what is the current price/capability trajectory? (raised by: concepts/llm-tier-security, 2026-04-21)
  • How will phone companies defend against increasingly AI-automated SIM-swap attacks? (raised by: concepts/llm-tier-security, 2026-04-21)
  • What are best-practice sandboxing setups for developer environments using npm/pip/cargo dependencies (container, VM, Flatpak, bubblewrap)? (raised by: concepts/llm-tier-security, 2026-04-21)
  • Will ACATS fraud protections standardize across major brokerages? Which currently offer account lock features? (raised by: concepts/llm-tier-security, 2026-04-21)
  • How does an outbound application firewall (OpenSnitch) interact with legitimate AI agents running locally on the same machine? (raised by: concepts/llm-tier-security, 2026-04-21)

Prompt Injection & Security

  • Is prompt injection fundamentally unsolvable, or is there a theoretical architecture that makes it provably safe? (raised by: concepts/prompt-injection, 2026-04-03)
  • When will the "Challenger disaster" of AI (a major high-profile prompt injection incident) occur, and what will it look like? (raised by: concepts/prompt-injection, 2026-04-03)
  • Has anyone deployed a production system using the CaMeL privileged/quarantined agent architecture? (raised by: concepts/prompt-injection, 2026-04-03)
  • How do prompt injection risks change as agents gain access to physical systems (robots, vehicles, IoT)? (raised by: concepts/prompt-injection, 2026-04-03)
  • Were there confirmed exploits of CVE-2026-33579 in the wild before the patch, given the two-day window between patch release and CVE publication? (raised by: concepts/openclaw-security, 2026-04-03)
  • Is the CaMeL privileged/quarantined agent architecture a viable path to making tools like OpenClaw safe by design? (raised by: concepts/openclaw-security, 2026-04-03)

AI Regulation & Policy

  • What specific provisions of SB24-205 triggered the DOJ's constitutional challenge — was it the Commerce Clause, Supremacy Clause preemption, or another doctrine? (raised by: concepts/ai-regulation, 2026-04-25)
  • Will the DOJ's action against Colorado deter other states from passing similar AI regulation bills, or will states continue to experiment? (raised by: concepts/ai-regulation, 2026-04-25)
  • Is there a federal AI law in progress that would provide clear preemption, or is the DOJ relying solely on existing constitutional doctrine? (raised by: concepts/ai-regulation, 2026-04-25)
  • How does the DOJ's state-preemption posture interact with the EU AI Act — does it increase pressure for federal AI legislation? (raised by: concepts/ai-regulation, 2026-04-25)
  • What happens to Colorado's existing algorithmic discrimination law — is it also vulnerable to DOJ challenge? (raised by: concepts/ai-regulation, 2026-04-25)
  • Does the DOJ's posture reflect a broader deregulatory stance toward AI, and is it consistent across administrations? (raised by: entities/harmeet-dhillon, 2026-04-25)

Multi-Agent Misalignment

  • Is there a principled way to choose between sub-agents vs. agent teams for a given problem, beyond the independence/dependence heuristic? (raised by: raw/articles/suryanshti-sub-agents-vs-agent-teams-2026.md, 2026-04-25)
  • How does the Claude Agent SDK's description-based routing compare to explicit orchestrator logic — which is more reliable in practice? (raised by: raw/articles/suryanshti-sub-agents-vs-agent-teams-2026.md, 2026-04-25)
  • Can context-based decomposition be formalized into a decision framework (e.g., shared state graph analysis), or is it always a judgment call? (raised by: raw/articles/suryanshti-sub-agents-vs-agent-teams-2026.md, 2026-04-25)
  • Is the state keeper agent pattern sufficient for larger organizations (10+, 50+ agents) or does it become a bottleneck? (raised by: concepts/multi-agent-misalignment, 2026-04-25)
  • Does narrative drift worsen with more agents, more information-passing rounds, or more tightly-scoped roles — and is there an empirical curve? (raised by: concepts/multi-agent-misalignment, 2026-04-25)
  • Are there multi-agent architectures that are structurally immune to narrative drift (e.g., shared memory, consensus protocols)? (raised by: concepts/multi-agent-misalignment, 2026-04-25)
  • How does narrative drift interact with adversarial prompting — could a bad actor exploit role fidelity to inject false institutional records? (raised by: concepts/multi-agent-misalignment, 2026-04-25)
  • Can the MAST vocabulary (inter-agent misalignment, reasoning-action mismatch, incomplete verification) be operationalized into automated detection? (raised by: concepts/multi-agent-misalignment, 2026-04-25)
  • Does the narrative drift problem scale differently with different organizational topologies (hierarchical vs. flat, synchronous vs. async)? (raised by: concepts/multi-agent-misalignment, 2026-04-25)
  • Is Vei open-source or publicly accessible, and what stack is it built on? (raised by: entities/vei, 2026-04-25)

Context Files (CLAUDE.md / AGENTS.md / DESIGN.md / SKILL.md)

  • Is there coordination between Anthropic, OpenAI, and Google Stitch to formalize a common context-file standard, or is convergence happening organically? (raised by: concepts/context-files, 2026-04-25)
  • How should projects handle conflicts between CLAUDE.md project context and DESIGN.md design rules — which takes precedence? (raised by: concepts/context-files, 2026-04-25)
  • Will DESIGN.md adoption spread beyond Google Stitch to other AI-assisted development tools (Cursor, Copilot, etc.)? (raised by: concepts/context-files, 2026-04-25)
  • Is Google Stitch a standalone Google product or part of a larger platform (Firebase, Vertex, Gemini)? (raised by: entities/google-stitch, 2026-04-25)
  • How do context files scale in very large repositories — is there a risk of context file bloat? (raised by: concepts/context-files, 2026-04-25)

Claude Code Best Practices

  • At what task complexity does the "break it down" principle need to be applied — is there a heuristic for when a task is too large for a single session? (raised by: concepts/claude-code, 2026-04-25)
  • How do custom slash commands interact with Claude Code skills — can they overlap, and which takes precedence? (raised by: concepts/claude-code, 2026-04-25)
  • What is the optimal CLAUDE.md length — at what point does it become too long to read on every session start? (raised by: concepts/claude-code, 2026-04-25)