Open Questions

Unresolved questions and research leads. Format: - [ ] {question} *(raised by: source/article, date)*
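The entry format above is regular enough to parse mechanically, which helps when listing questions by source or by age. The following is a minimal sketch, not part of the wiki's tooling: the names `OpenQuestion`, `ENTRY_RE`, and `parse_entry` are hypothetical, and treating `[x]` as "resolved" is an assumption layered on the stated `- [ ]` convention. It only recognizes the `raised by:` form, with dates written as YYYY-MM-DD as they appear on this page.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical pattern for: - [ ] {question} *(raised by: source/article, date)*
ENTRY_RE = re.compile(
    r"^- \[(?P<mark>[ xX])\] "
    r"(?P<question>.+?) "
    r"\*\(raised by: (?P<source>[^,]+), (?P<raised>\d{4}-\d{2}-\d{2})\)\*$"
)


@dataclass
class OpenQuestion:
    question: str
    source: str
    raised: date
    resolved: bool  # assumption: a ticked checkbox means the question is closed


def parse_entry(line: str) -> Optional[OpenQuestion]:
    """Return the parsed entry, or None if the line does not follow the format."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        return None
    return OpenQuestion(
        question=match["question"],
        source=match["source"].strip(),
        raised=date.fromisoformat(match["raised"]),
        resolved=match["mark"] != " ",
    )


entry = parse_entry(
    "- [ ] How does LangGraph handle state persistence across sessions "
    "for long-running local agents? "
    "*(raised by: guides/local-agent-stack, 2026-04-03)*"
)
```

Lines that carry an `answered:` or `answered by:` note instead of `raised by:` return `None` under this sketch and would need their own pattern.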

AI Tool Selection

How rapidly are frontier model rankings changing — is the GPT-5.2 / Claude Opus 4.6 / Gemini 3 Pro comparison still accurate? (raised by: guides/ai-tool-selection, 2026-04-07)
What is "Claude Cowork" precisely — a separate product, a Claude mode, or a desktop app? When was it launched? (raised by: guides/ai-tool-selection, 2026-04-07)
Does Mollick's "Harnesses" concept map to the same construct as harness engineering in the OpenAI Codex team's usage? (answered: yes — the agent harness concept was formalized across Anthropic, OpenAI, and LangChain in early 2026; see concepts/agent-harness, 2026-04-16)
How does NotebookLM's knowledge base quality compare to the LLM knowledge base pattern for research synthesis? (raised by: guides/ai-tool-selection, 2026-04-07)

AI Tools & Adoption

What database technology and connectors were used to aggregate the multi-app data pipeline (Monday.com, GoHighLevel, Zoom, Gmail, Slack, Google Calendar)? (raised by: concepts/ai-for-small-business, 2026-04-03)
What specific compliance or privacy frameworks apply to deciding what business data is safe to share with cloud AI? (raised by: concepts/ai-for-small-business, 2026-04-03)
When will on-premise/local LLM deployments become viable for small businesses with data privacy requirements? (raised by: concepts/ai-for-small-business, 2026-04-03)
How does the "dashboard" quality-control model evolve as model accuracy approaches 100%, and what does human work look like at that point? (raised by: concepts/ai-for-small-business, 2026-04-03)

Claude Code

What is the precise interface/workflow for Claude Code vs. standard Claude chat — separate product, mode, or tool? (answered by: concepts/claude-code-skills — Claude Code is a separate CLI/app with its own skills/commands system, distinct from the chat interface, 2026-04-07)
How does Claude Code compare to other AI coding tools (Cursor, GitHub Copilot, Devin) for non-programmer use cases? (raised by: concepts/claude-code, 2026-04-03)

Claude Code Skills

What is the actual content of the "Claude Agent Skills Explained" YouTube video — does it cover material beyond the official documentation? (answered: yes — adds 30–50 tokens/skill startup cost, explicit portability across Code/API/claude.ai, four-layer mental model, and multi-subagent skill sharing, 2026-04-07)
How widely have other AI tools (ChatGPT, Cursor, Gemini CLI) adopted the agentskills.io open standard? (raised by: concepts/claude-code-skills, 2026-04-07)
What is the practical limit on number of skills before description budget truncation becomes a problem in typical sessions? (raised by: concepts/claude-code-skills, 2026-04-07)
How do skills interact with CLAUDE.md — do skills override, extend, or co-exist with CLAUDE.md instructions? (answered: they co-exist; CLAUDE.md is the project foundation, skills are portable expertise layered on top — source: Claude_Agent_Skills_Explained transcript, 2026-04-07)

AI Inflection Point & Dark Factory

What does the quality gap look like between "most of the time works" and "all of the time works" — what's still missing? (raised by: concepts/ai-inflection-point, 2026-04-03)
How is the dark factory pattern adopted outside security-adjacent companies where testing is easier to simulate? (answered by: concepts/harness-engineering — OpenAI Codex team ran a general-purpose product with the same approach, 2026-04-03)
Will the prediction that 50% of engineers will write 95% of their code with AI by the end of 2026 materialize, and how do we measure it? (raised by: concepts/ai-inflection-point, 2026-04-03)
How does the inflection point expand to non-code knowledge work (law, medicine, journalism)? (raised by: concepts/ai-inflection-point, 2026-04-03)

Local Agent Stack

How does local model capability compare to frontier models for agentic tasks requiring complex multi-step reasoning? (raised by: guides/local-agent-stack, 2026-04-03)
What is the practical throughput ceiling for local agents — how many agent round-trips per hour on typical developer hardware? (raised by: guides/local-agent-stack, 2026-04-03)
How does LangGraph handle state persistence across sessions for long-running local agents? (raised by: guides/local-agent-stack, 2026-04-03)

LLM Knowledge Bases & PKM

At what scale (articles, words) does the index-based approach break down and require RAG or vector search? (raised by: concepts/llm-knowledge-base, 2026-04-03)
What does synthetic data generation + finetuning look like in practice for personal knowledge bases — what tools, what workflow? (raised by: concepts/llm-knowledge-base, 2026-04-03)
What product could formalize the "hacky collection of scripts" LLM knowledge base into a polished tool? (raised by: concepts/llm-knowledge-base, 2026-04-03)
How does the ephemeral wiki pattern (spawn, lint, report, discard) differ from persistent wiki maintenance — when is each appropriate? (raised by: concepts/llm-knowledge-base, 2026-04-03)
qmd: is it actively maintained, what are its exact system requirements (Node version, GGUF model size), and at what article count does it become worth the setup cost over a custom script? (raised by: concepts/llm-knowledge-base, 2026-04-08)
How do you define "actionable" when deciding which notes to turn into AI skills vs. leave as passive reference? (raised by: concepts/personal-knowledge-management, 2026-04-03)
What happens when AI skills built from different sources conflict — how do you reconcile competing frameworks? (raised by: concepts/personal-knowledge-management, 2026-04-03)
At what knowledge graph size does the system become unwieldy — is there an upper limit to useful PKM scale? (raised by: concepts/personal-knowledge-management, 2026-04-03)

Obsidian + Claude Code OS

What is the exact Obsidian CLI interface — is it an MCP server, a CLI tool, or an Obsidian plugin that exposes an API? (raised by: concepts/obsidian-claude-code-os, 2026-04-03) — Note: now that MCP is in the wiki (concepts/mcp-authentication), the MCP server hypothesis is worth investigating first
How do reference files stay up-to-date — is maintenance a manual discipline or can agents update them? (raised by: concepts/obsidian-claude-code-os, 2026-04-03)
At what vault size does the relationship graph become too large for Claude Code to traverse effectively? (raised by: concepts/obsidian-claude-code-os, 2026-04-03)
How do you handle multi-person vaults or shared team vaults — does the pattern transfer to teams? (raised by: concepts/obsidian-claude-code-os, 2026-04-03)

Agent Harness (General Concept)

How does the co-evolution principle affect portability — can a harness designed for Claude work well with GPT or Gemini, or is tight model-harness coupling the norm? (raised by: concepts/agent-harness, 2026-04-16)
What is the empirical cost/benefit of multi-agent vs. single-agent for real production systems — where does the ~10 tool threshold come from? (raised by: concepts/agent-harness, 2026-04-16)
How does ACON's 26–54% token reduction technique work in detail, and is it implementable in existing frameworks? (raised by: concepts/agent-harness, 2026-04-16)
What does the "Ralph Loop" look like in non-coding domains — can the Initializer/Coding Agent pattern generalize to research, writing, or data analysis? (raised by: concepts/agent-harness, 2026-04-16)

Harness Engineering (OpenAI Codex)

What is the full set of "golden principles" the OpenAI Codex team uses — can they be generalized to other codebases? (raised by: concepts/harness-engineering, 2026-04-03)
How does architectural coherence evolve over years in a fully agent-generated system — does drift compound despite garbage collection? (raised by: concepts/harness-engineering, 2026-04-03)
What is the Aardvark agent (mentioned in the OpenAI article) and what role does it play in agent-first development? (raised by: concepts/harness-engineering, 2026-04-03)
At what team or codebase size does the harness engineering model break down — does it scale to 50 or 500 engineers? (raised by: concepts/harness-engineering, 2026-04-03)

Agentic Engineering

What is the best way for mid-career engineers to develop agent-direction skill quickly? (raised by: concepts/agentic-engineering, 2026-04-03)
How do agentic engineering patterns transfer to non-code knowledge work (legal, medical, editorial)? (raised by: concepts/agentic-engineering, 2026-04-03)
What does "code review" look like at scale when the reviewer cannot read every line? What signals substitute for line-by-line review? (raised by: concepts/agentic-engineering, 2026-04-03)

MCP Authentication

How does MCP authentication compare to tool-calling auth in other protocols (e.g., OpenAPI/tool-use with API keys)? (raised by: concepts/mcp-authentication, 2026-04-04)
Is Entra's pre-registration-only limitation permanent, or is DCR support planned? (raised by: concepts/mcp-authentication, 2026-04-04)
How are token refresh cycles handled in long-running MCP sessions — does FastMCP manage this automatically? (raised by: concepts/mcp-authentication, 2026-04-04)
What happens when the OBO exchange fails mid-session (token expiry, revoked consent) — how does the server signal errors to the MCP client? (raised by: concepts/mcp-authentication, 2026-04-04)
Are there performance implications of per-request JWT verification at scale, and what caching strategies help? (raised by: concepts/mcp-authentication, 2026-04-04)

Agentic Workflows

When does an evaluator-optimizer loop converge vs. oscillate? What stopping criteria work in practice? (raised by: concepts/agentic-workflows, 2026-04-09)
Is there a principled way to choose among the five workflow patterns for a new problem, or is it always empirical? (raised by: concepts/agentic-workflows, 2026-04-09)
How do agentic workflow patterns map to formal CS constructs (DAGs, state machines, recursive descent)? (raised by: concepts/agentic-workflows, 2026-04-09)

AI Red Teaming

How do you probe for dangerous LLM capabilities like persuasion, deception, and autonomous replication? (raised by: concepts/ai-red-teaming, 2026-04-09)
Can AI red teaming practices be standardized so organizations can clearly communicate methods and findings? (raised by: concepts/ai-red-teaming, 2026-04-09)
How do red teaming practices adapt to non-Western linguistic and cultural contexts at scale? (raised by: concepts/ai-red-teaming, 2026-04-09)
At what point does the cost of finding a jailbreak rise to the level of finding a buffer overflow exploit, and what specific mitigations drive that transition? (raised by: concepts/ai-red-teaming, 2026-04-09)

Web Scraping

How does the four-tier system handle sites requiring login — is there a fifth tier for authenticated scraping? (raised by: guides/progressive-web-scraping, 2026-04-09)
How does Bright Data's pricing compare to alternatives (Apify, ScrapingBee, Oxylabs) at scale? (raised by: guides/progressive-web-scraping, 2026-04-09)
Is the PAI skill library actively maintained, and how does skill quality compare to official sources? (raised by: guides/progressive-web-scraping, 2026-04-09)

Frontier AI & Cyber Security

When will a frontier AI model complete the full 32-step enterprise attack scenario end-to-end, and what capability milestone drives it? (raised by: concepts/frontier-ai-cyber-capabilities, 2026-04-15)
How does AI-enabled attack cost (~£65/attempt) compare to traditional skilled-attacker costs — at what cost threshold does AI fundamentally change the economics of targeted attacks? (raised by: concepts/frontier-ai-cyber-capabilities, 2026-04-15)
What monitoring and detection tooling is most effective against AI-generated attack activity, given that current models generate noticeable alerts? (raised by: concepts/frontier-ai-cyber-capabilities, 2026-04-15)
As AI attack behavior becomes stealthier, how does the detectability advantage erode — and how quickly? (raised by: concepts/frontier-ai-cyber-capabilities, 2026-04-15)
How do the AISI evaluation results translate to real-world attacker operations — what is the gap between simulated scenario and live network exploitation? (raised by: concepts/frontier-ai-cyber-capabilities, 2026-04-15)
Does industrial control system (ICS) attack performance track enterprise network attack performance with a lag, and what does AI parity look like for critical infrastructure? (raised by: concepts/frontier-ai-cyber-capabilities, 2026-04-15)

LLM-Tier Security & Personal Computer Security

When will Mythos-tier exploit capability be available to malicious actors at low cost (~$1K), and what is the current price/capability trajectory? (raised by: concepts/llm-tier-security, 2026-04-21)
How will phone companies defend against increasingly AI-automated SIM-swap attacks? (raised by: concepts/llm-tier-security, 2026-04-21)
What are best-practice sandboxing setups for developer environments using npm/pip/cargo dependencies (container, VM, Flatpak, bubblewrap)? (raised by: concepts/llm-tier-security, 2026-04-21)
Will ACATS fraud protections standardize across major brokerages? Which currently offer account lock features? (raised by: concepts/llm-tier-security, 2026-04-21)
How does an outbound application firewall (OpenSnitch) interact with legitimate AI agents running locally on the same machine? (raised by: concepts/llm-tier-security, 2026-04-21)

Prompt Injection & Security

Is prompt injection fundamentally unsolvable, or is there a theoretical architecture that makes it provably safe? (raised by: concepts/prompt-injection, 2026-04-03)
When will the "Challenger disaster" of AI (a major high-profile prompt injection incident) occur, and what will it look like? (raised by: concepts/prompt-injection, 2026-04-03)
Has anyone deployed a production system using the CaMeL privileged/quarantined agent architecture? (raised by: concepts/prompt-injection, 2026-04-03)
How do prompt injection risks change as agents gain access to physical systems (robots, vehicles, IoT)? (raised by: concepts/prompt-injection, 2026-04-03)
Were there confirmed exploits of CVE-2026-33579 in the wild before the patch, given the two-day window between patch release and CVE publication? (raised by: concepts/openclaw-security, 2026-04-03)
Is the CaMeL privileged/quarantined agent architecture a viable path to making tools like OpenClaw safe by design? (raised by: concepts/openclaw-security, 2026-04-03)

AI Regulation & Policy

What specific provisions of SB24-205 triggered the DOJ's constitutional challenge — was it the Commerce Clause, Supremacy Clause preemption, or another doctrine? (raised by: concepts/ai-regulation, 2026-04-25)
Will the DOJ's action against Colorado deter other states from passing similar AI regulation bills, or will states continue to experiment? (raised by: concepts/ai-regulation, 2026-04-25)
Is there a federal AI law in progress that would provide clear preemption, or is the DOJ relying solely on existing constitutional doctrine? (raised by: concepts/ai-regulation, 2026-04-25)
How does the DOJ's state-preemption posture interact with the EU AI Act — does it increase pressure for federal AI legislation? (raised by: concepts/ai-regulation, 2026-04-25)
What happens to Colorado's existing algorithmic discrimination law — is it also vulnerable to DOJ challenge? (raised by: concepts/ai-regulation, 2026-04-25)
Does the DOJ's posture reflect a broader deregulatory stance toward AI, and is it consistent across administrations? (raised by: entities/harmeet-dhillon, 2026-04-25)

Multi-Agent Misalignment

Context Files (CLAUDE.md / AGENTS.md / DESIGN.md / SKILL.md)

Is there coordination between Anthropic, OpenAI, and Google Stitch to formalize a common context-file standard, or is convergence happening organically? (raised by: concepts/context-files, 2026-04-25)
How should projects handle conflicts between CLAUDE.md project context and DESIGN.md design rules — which takes precedence? (raised by: concepts/context-files, 2026-04-25)
Will DESIGN.md adoption spread beyond Google Stitch to other AI-assisted development tools (Cursor, Copilot, etc.)? (raised by: concepts/context-files, 2026-04-25)
Is Google Stitch a standalone Google product or part of a larger platform (Firebase, Vertex, Gemini)? (raised by: entities/google-stitch, 2026-04-25)
How do context files scale in very large repositories — is there a risk of context file bloat? (raised by: concepts/context-files, 2026-04-25)

Claude Code Best Practices

At what task complexity does the "break it down" principle need to be applied — is there a heuristic for when a task is too large for a single session? (raised by: concepts/claude-code, 2026-04-25)
How do custom slash commands interact with Claude Code skills — can they overlap, and which takes precedence? (raised by: concepts/claude-code, 2026-04-25)
What is the optimal CLAUDE.md length — at what point does it become too long to read on every session start? (raised by: concepts/claude-code, 2026-04-25)

AI Labor Displacement & Workforce

Will the "permanent underclass" scenario materialize, or will new job categories emerge as in prior automation waves (David Autor camp)? (raised by: concepts/ai-labor-displacement, 2026-05-01)
When does the integration-adoption lag close — what triggers widespread employee AI adoption after employer mandates? (raised by: concepts/ai-labor-displacement, 2026-05-01)
Will OpenAI or Anthropic actually lobby for the redistributive policies (public wealth fund, jobs guarantee) their leaders discuss publicly? (raised by: concepts/ai-labor-displacement, 2026-05-01)
Is the 7.9 hours/week of AI tool friction a temporary adoption curve or a structural tax on tool proliferation? (raised by: concepts/ai-labor-displacement, 2026-05-01)
How does AI's hollowing-out of junior roles compound over time — does cutting the career ladder bottom reduce the senior talent pipeline a decade later? (raised by: concepts/ai-labor-displacement, 2026-05-01)
How will GDPVal benchmark performance translate to actual employer hiring decisions — at what win-rate threshold do firms stop hiring for a given role? (raised by: concepts/ai-labor-displacement, 2026-05-01)

Compiler Paradigm & Verification

What does a 'formal specification layer for agents' concretely look like? (raised by: concepts/compiler-analysis, 2026-05-05)
How do formal specification cultures in hardware inform what could work for software agent specs? (raised by: concepts/compiler-analysis, 2026-05-05)
What is the minimum viable verification layer for agents: AI-checks-AI as CI or something more structural? (raised by: concepts/compiler-analysis, 2026-05-05)
How does Philip Su's 'No More Code Reviews' differ from Venturini's framework? (raised by: concepts/compiler-analysis, 2026-05-05)
Has SkipLabs shipped products beyond blog content? (raised by: entities/skip-labs, 2026-05-05)
Does the acceleration curve from entities/andrew-ng (frontend > backend > infra > research) map to specific tooling for each tier? (raised by: analyses/coding-agent-acceleration-curve, 2026-05-06)
Does the acceleration curve stabilize as models improve, or does the relative ordering shift? (raised by: analyses/coding-agent-acceleration-curve, 2026-05-06)
Does Ng's acceleration gradient predict displacement risk across job categories — i.e., will frontend roles be disrupted first, infrastructure last? (raised by: analyses/coding-agent-acceleration-curve, 2026-05-06)

Agent Infrastructure Debt

Does Port's product address all seven infrastructure blocks, or only a subset? (raised by: concepts/agent-infrastructure-debt, 2026-05-28)
At what agent count does infrastructure spending transition from manageable to the 50% capacity figure cited — what is the intermediate curve? (raised by: concepts/agent-infrastructure-debt, 2026-05-28)
How do organizations track tokens/costs per agent in practice? The article acknowledges the need but offers no solutions. (raised by: concepts/agent-infrastructure-debt, 2026-05-28)
What is a workable definition of "agent" that distinguishes it from automation workflows? The boundary between agent and automation is acknowledged but not resolved. (raised by: concepts/agent-infrastructure-debt, 2026-05-28)
The article claims agents are hard to run in production — what does the data show? Specifically, what percentage of locally built agents survive the transition to production? (raised by: concepts/agent-infrastructure-debt, 2026-05-28)
Google's 2015 ML infrastructure debt paper was a call to action that led to large-scale ML platforms (TFX, Kubeflow, etc.). What comparable infrastructure emerged from the 2026 agent infrastructure debt pattern, and has it been solved yet? (raised by: concepts/agent-infrastructure-debt, 2026-05-28)