AI coding assistants like Claude Code and GitHub Copilot are remarkably capable: they can refactor entire modules, debug complex issues, and generate test suites on demand. But most developers who use these tools have run into an inconsistency problem. Ask an AI assistant to review a pull request on Monday and again on Friday, and the results can differ. One review catches the SQL injection vulnerability. The other misses it entirely. The AI did not get worse between those two sessions. It is simply improvising every time.
This inconsistency reveals the gap between agents and skills. Agents represent the AI’s raw capability — autonomous, adaptive, and powerful. Skills are structured workflows that channel that capability into repeatable, reliable outcomes. Understanding when to use each — and how to build high-quality skills — is the difference between an AI assistant that sometimes helps and one that consistently delivers.
In the context of AI coding tools, an agent is an AI system that can autonomously plan, reason, and execute multi-step tasks. When a developer asks Claude Code to “add authentication to this API” or invokes Copilot’s Agent Mode to “refactor the payment module,” the agent does not simply generate a block of code and stop. It engages in a multi-phase process:
It reads the codebase to understand the existing architecture, dependencies, and patterns.
It plans an approach, determining which files to modify, in what order, and how the changes relate to each other.
It executes changes across multiple files, writing new code, modifying existing functions, and updating tests.
It self-corrects when something goes wrong, reacting to test failures, lint errors, or type mismatches.
It decides when the task is complete, evaluating whether the original request has been satisfied.
Four key characteristics define agents:
Autonomous — they decide what steps to take without being told each one.
Adaptive — they adjust their approach based on what they discover in the codebase.
Context-aware — they leverage the full codebase, documentation, and conversation history.
Non-deterministic — the same prompt can produce different results on different runs.
This combination is what makes agents so powerful, and it makes them indispensable for novel, complex work. Building a feature from scratch, investigating an unfamiliar bug, or exploring a codebase for the first time: these are tasks where the agent’s flexibility is precisely the point. No two situations are identical, and the agent’s ability to reason through ambiguity is what makes it valuable.
But that same flexibility becomes a liability for tasks that demand consistency. A code review should check for security vulnerabilities every time, not just when the AI happens to think of it. A deployment checklist should never skip steps. A debugging workflow should always start by reproducing the issue before jumping to a fix. This is where skills come in.
A skill is a structured, reusable workflow that guides an agent through a specific task the same way every time. The simplest analogy: an agent is a talented developer; a skill is the runbook that talented developer follows.
Rather than describe skills in the abstract, consider what one actually looks like:
---
name: code-review
description: Use when reviewing pull requests or code changes
---
# Code Review
## Checklist
Complete these steps in order:
1. Check for security vulnerabilities — SQL injection, XSS, auth bypass
2. Verify error handling — are edge cases covered?
3. Assess performance — N+1 queries, unnecessary allocations
4. Review readability — naming, structure, comments where needed
5. Run tests — confirm existing tests pass, suggest new ones
## Red Flags
If any of these are found, flag immediately before continuing:
- Hardcoded secrets or credentials
- Direct SQL string concatenation
- Missing input validation on public endpoints
This is a real skill file. It is just markdown — readable by humans, executable by AI. The frontmatter at the top tells the tool what the skill is called and when to activate it. The checklist ensures the agent covers every critical category in a specific order. The red flags section adds guardrails that interrupt the normal flow when something urgent is found. No code, no APIs, no complex tooling — just structured instructions that make the AI’s behavior repeatable.
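To make the "Direct SQL string concatenation" red flag concrete, here is a minimal Python sketch using the standard-library sqlite3 module. The table, its columns, and the helper function names are hypothetical and exist only for illustration. The sketch contrasts a query built by concatenating user input with one that passes the same input as a bound parameter.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "admin"), ("bob", "viewer")],
    )
    return conn


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # RED FLAG: user input is concatenated straight into the SQL text,
    # so input containing quotes can change the meaning of the query.
    query = "SELECT name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # The value is passed as a bound parameter and is never parsed as SQL.
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (name,)
    ).fetchall()


conn = make_demo_db()
malicious = "nobody' OR '1'='1"
unsafe_rows = find_user_unsafe(conn, malicious)  # the injected OR matches every row
safe_rows = find_user_safe(conn, malicious)      # no user has this literal name
```

A reviewer following the skill would flag `find_user_unsafe` as soon as it appears, before moving on to the rest of the checklist.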
Four key characteristics distinguish skills from raw agent behavior:
Deterministic — same task, same process, every time.
Composable — skills can invoke other skills, creating chains of workflows (e.g., a deployment skill that first invokes a testing skill).
Portable — a skill defined in one project can be shared across teams, repositories, or even tools. The SKILL.md format is an open standard (published at agentskills.io), which means a skill written for Claude Code works in GitHub Copilot, OpenAI Codex CLI, Google Gemini CLI, and over thirty other adopters without modification.
Auditable — anyone on the team can read exactly what the AI will do before it does it.
The key insight is this: skills do not replace agents — they enhance them. The agent’s reasoning ability is still doing the heavy lifting. It still needs to understand the code, interpret what it finds, and make judgment calls. The skill simply ensures that reasoning is applied consistently to the things that matter.
The choice between raw agent behavior and skill-guided behavior is not either/or — it is a question of what the task demands.
| | Agent (Raw) | Skill-Guided Agent |
| --- | --- | --- |
| Best for | Novel, exploratory work | Repeatable, quality-critical tasks |
| Consistency | Varies by run | Same process every time |
| Setup cost | Zero: just prompt | Upfront: define the workflow |
| Flexibility | Maximum | Constrained by design |
| Auditability | Inspect after the fact | Inspect the skill before it runs |
| Examples | Build a new feature, investigate an unfamiliar bug, explore a codebase | Code reviews, TDD, deployments, debugging workflows, onboarding checklists |
A simple decision heuristic makes this practical:
Will you do this task more than once? Use a skill.
Does quality depend on not skipping steps? Use a skill.
Is the task novel and heavily context-dependent? Use an agent.
Do you need multiple people (or multiple AI sessions) to do it the same way? Use a skill.
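The heuristic above can be written down as a tiny function. This is only a sketch: the `Task` fields and the `recommend` name are hypothetical, and the precedence (any "use a skill" answer wins, with the raw agent as the fallback for novel work) is an assumption the text does not spell out.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical task description, one flag per 'use a skill' question."""
    repeated: bool        # Will you do this task more than once?
    steps_critical: bool  # Does quality depend on not skipping steps?
    shared: bool          # Must several people or sessions do it the same way?


def recommend(task: Task) -> str:
    """Return "skill" if any skill question is answered yes, else "agent"."""
    if task.repeated or task.steps_critical or task.shared:
        return "skill"
    # Novel, heavily context-dependent work falls through to the raw agent.
    return "agent"


code_review = Task(repeated=True, steps_critical=True, shared=True)
one_off_spike = Task(repeated=False, steps_critical=False, shared=False)
```

Here `recommend(code_review)` yields "skill" and `recommend(one_off_spike)` yields "agent".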
In practice, most real work uses both. An agent executes the task; a skill ensures it does so thoroughly. The skill does not make the agent less capable — it makes the agent consistently capable.
Understanding the concept is the first step. Building a skill that actually works — one that improves your workflow rather than adding overhead — requires understanding the anatomy of a well-designed skill and the principles that separate useful skills from decorative ones.
The SKILL.md format is now an open standard, published by Anthropic in December 2025 at agentskills.io and adopted by over thirty tools within months, including Claude Code, GitHub Copilot, OpenAI Codex CLI, Google Gemini CLI, JetBrains Junie, AWS Kiro, and Cursor. Regardless of which tool you use, every skill is a SKILL.md file with YAML frontmatter, stored in its own subdirectory. Every effective skill shares three structural components:
Frontmatter / Metadata — This tells the tool what the skill is and when to use it. The YAML frontmatter requires a name field (lowercase, hyphens for spaces) and a description field. The description is particularly important — it controls when the tool determines the skill is relevant and injects it into the agent’s context. An optional license field documents the skill’s licensing terms.
The Process — This is the ordered sequence of steps the agent must follow. A well-designed process has three properties: it is sequential (steps build on each other), exhaustive (nothing important is left to chance), and verifiable (each step produces an observable output that confirms it was completed).
Guardrails — These define what the agent must not skip, must not do, or must flag immediately. Guardrails are the “rigid” parts of a skill — the non-negotiable rules that enforce discipline even when the agent might otherwise take shortcuts.
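The frontmatter rules described above (a required name in lowercase with hyphens, and a required description) are easy to check mechanically. The following is a minimal sketch, not any tool's actual loader: `split_skill` and `validate` are hypothetical names, the parser handles only top-level `key: value` pairs with indented continuation lines rather than full YAML, and allowing digits in the name is an assumption.

```python
import re

# Lowercase words separated by single hyphens; digits allowed (assumption).
NAME_PATTERN = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


def split_skill(text: str) -> tuple[dict[str, str], str]:
    """Split SKILL.md text into (frontmatter fields, markdown body).

    Simplified on purpose: only top-level `key: value` pairs and indented
    continuation lines are understood. A real tool would use a YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening '---' frontmatter delimiter")
    end = None
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            end = i
            break
    if end is None:
        raise ValueError("missing closing '---' frontmatter delimiter")

    fields: dict[str, str] = {}
    key = None
    for line in lines[1:end]:
        if not line.strip():
            continue
        if line[0].isspace() and key is not None:
            # Indented line: continuation of the previous value.
            fields[key] = (fields[key] + " " + line.strip()).strip()
        elif ":" in line:
            key, _, value = line.partition(":")
            key = key.strip()
            fields[key] = value.strip()
        else:
            raise ValueError(f"cannot parse frontmatter line: {line!r}")
    body = "\n".join(lines[end + 1:]).strip()
    return fields, body


def validate(fields: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the fields look valid."""
    problems = []
    name = fields.get("name", "")
    if not name:
        problems.append("missing required field: name")
    elif not NAME_PATTERN.match(name):
        problems.append("name must be lowercase with hyphens for spaces")
    if not fields.get("description"):
        problems.append("missing required field: description")
    return problems
```

Running a check like this in CI is one way to catch a malformed skill before a tool silently ignores it.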
In Claude Code, a skill takes exactly this form: a markdown file with YAML frontmatter. Because the description determines when the skill is considered relevant, a well-written description acts as a trigger condition:
---
name: code-review
description: Use when reviewing pull requests, completed features,
or code changes before merging. Guides a thorough, consistent review.
---
# Code Review
Review code changes systematically. Follow each phase in order.
## Checklist
1. **Understand the change** — Read the diff, identify the intent
2. **Check for security issues** — injection, auth bypass, XSS, secrets
3. **Verify error handling** — edge cases, failure modes, graceful degradation
4. **Assess performance** — N+1 queries, unnecessary allocations, blocking calls
5. **Evaluate readability** — naming, structure, dead code, comments
6. **Validate tests** — existing tests pass, new tests cover the change
7. **Summarize findings** — categorize as critical / warning / suggestion
## Red Flags
Stop and flag immediately if found:
- Hardcoded secrets or API keys
- SQL string concatenation (injection risk)
- Missing authentication checks on new endpoints
- Tests disabled or skipped without explanation
## Key Principles
- Review the code, not the developer
- Every finding needs a “why” — not just “this is wrong”
- Suggest fixes, not just problems
When a developer invokes this skill (via the /skill-name slash command or by describing a task that matches the skill’s description), Claude Code loads the full markdown content and follows it as structured guidance. The agent still reasons about the code — it does not mechanically execute a script — but it does so within the framework the skill defines. Every review hits the same categories in the same order.
Claude Code also supports several advanced skill features. Skills can use dynamic context injection — embedding live command output directly into the skill with the !`command` syntax, so a skill can automatically include the current git diff or test results before the agent even begins reasoning. Additional YAML frontmatter fields like allowed-tools (to pre-approve which tools the skill may use), disable-model-invocation (to prevent automatic triggering), and user-invocable (to control slash-command visibility) give authors fine-grained control over how a skill behaves. For complex skills, supporting files — scripts, templates, examples, and reference documentation — can be organized into subdirectories alongside the SKILL.md and are automatically discovered when the skill loads.
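To illustrate the idea behind dynamic context injection, here is a rough Python sketch of a preprocessor that replaces each !`command` placeholder with that command's output. It is not Claude Code's implementation: `expand_placeholders`, `run_shell`, and the regular expression are hypothetical, and the runner is injectable so the substitution logic can be exercised without touching a real shell.

```python
import re
import subprocess
from typing import Callable

# Matches the !`command` placeholder form described in the text.
PLACEHOLDER = re.compile(r"!`([^`]+)`")


def run_shell(command: str) -> str:
    """Run a shell command and return its standard output, stripped."""
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, check=False
    )
    return result.stdout.strip()


def expand_placeholders(
    text: str, runner: Callable[[str], str] = run_shell
) -> str:
    """Replace each !`command` with the output of running that command."""
    return PLACEHOLDER.sub(lambda match: runner(match.group(1)), text)


# Usage with a stand-in runner instead of a real shell:
canned = {"git diff --stat": "1 file changed, 3 insertions(+)"}
rendered = expand_placeholders(
    "Current changes: !`git diff --stat`", runner=lambda cmd: canned[cmd]
)
```

Because the default runner executes whatever the skill file contains, a mechanism like this should only ever be pointed at skills from a trusted source.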
GitHub Copilot uses the same SKILL.md format as Claude Code, implementing the Agent Skills open standard. Skills are folders of instructions, scripts, and resources stored in a recognized skills directory. Copilot supports multiple directory conventions — .github/skills/, .claude/skills/, or .agents/skills/ within a repository for project skills, and ~/.copilot/skills/ or ~/.agents/skills/ in the home directory for personal skills shared across projects.
Each skill lives in its own subdirectory (e.g., .github/skills/code-review/) and contains a SKILL.md file with the same YAML frontmatter and markdown body structure used across the ecosystem.
---
name: code-review
description: Use when reviewing pull requests, completed features,
or code changes before merging. Guides a thorough, consistent review.
allowed-tools:
- editFiles
- runTerminalCommand
---
# Code Review
Review code changes systematically. Follow each phase in order.
1. **Understand the change** — Read the diff, identify the intent
2. **Check for security issues** — injection, auth bypass, XSS, secrets
3. **Verify error handling** — edge cases, failure modes, graceful degradation
4. **Assess performance** — N+1 queries, unnecessary allocations, blocking calls
5. **Evaluate readability** — naming, structure, dead code, comments
6. **Validate tests** — existing tests pass, new tests cover the change
7. **Summarize findings** — categorize as critical / warning / suggestion
## Red Flags
Stop and flag immediately if found:
- Hardcoded secrets or API keys
- SQL string concatenation (injection risk)
- Missing auth checks on new endpoints
- Disabled or skipped tests without explanation
For each finding, explain WHY it is an issue and suggest a specific fix.
When Copilot encounters a task that matches a skill’s description, it injects the SKILL.md content into the agent’s context — the same progressive loading approach used by Claude Code. The agent first reads only the name and description of each available skill; the full instructions load only when the skill is triggered. Skills can also include scripts and supporting resources in the skill directory; Copilot automatically discovers all files in the directory and makes them available alongside the instructions.
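The two-stage loading described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Copilot's or Claude Code's actual loader: the function names are hypothetical, the directory list covers only the project-level paths mentioned earlier, the search order and "first definition wins" rule are assumptions, and the frontmatter reader understands only simple `key: value` pairs.

```python
from pathlib import Path

# Project-level skill directories named in the text; the order is an assumption.
PROJECT_SKILL_DIRS = (".github/skills", ".claude/skills", ".agents/skills")


def read_frontmatter(skill_file: Path) -> dict[str, str]:
    """Read only the `key: value` pairs between the two '---' lines."""
    fields: dict[str, str] = {}
    last_key = None
    with skill_file.open(encoding="utf-8") as handle:
        if handle.readline().strip() != "---":
            return fields
        for line in handle:
            if line.strip() == "---":
                break  # stop here, so the body is never read in stage 1
            if not line.strip():
                continue
            if line[0].isspace() and last_key:
                # Indented line: continuation of the previous value.
                fields[last_key] += " " + line.strip()
                continue
            key, sep, value = line.partition(":")
            if sep:
                last_key = key.strip()
                fields[last_key] = value.strip()
    return fields


def build_index(repo_root: Path) -> dict[str, dict]:
    """Stage 1: collect name, description, and path for every skill."""
    index: dict[str, dict] = {}
    for rel in PROJECT_SKILL_DIRS:
        base = repo_root / rel
        if not base.is_dir():
            continue
        for skill_file in sorted(base.glob("*/SKILL.md")):
            meta = read_frontmatter(skill_file)
            if "name" in meta and "description" in meta:
                # First definition wins if a name appears twice (assumption).
                index.setdefault(
                    meta["name"],
                    {"description": meta["description"], "path": skill_file},
                )
    return index


def load_skill(index: dict[str, dict], name: str) -> str:
    """Stage 2: read the full SKILL.md only once the skill is triggered."""
    return index[name]["path"].read_text(encoding="utf-8")
```

The point of the split is cost: the index holds one short description per skill, while the full instructions enter the context only for the skill that is actually used.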
Skills work across all of Copilot’s surfaces: VS Code (in both chat and agent mode), the GitHub Copilot CLI, and the Copilot cloud agent for automated coding tasks. In VS Code, skills appear in the / menu and can be invoked directly with a slash command (e.g., /code-review). The GitHub CLI also provides a dedicated gh skill command for discovering, installing, updating, and publishing skills from GitHub repositories — making it easy to browse community-contributed skills (such as those in the github/awesome-copilot collection) and install them with a single command. The allowed-tools frontmatter field, shown in the example above, lets skill authors pre-approve which tools Copilot may use without prompting the user for permission each time.
Note that Copilot also supports a separate, simpler concept called reusable prompt files (stored in .github/prompts/ and invoked manually with #prompt:name). Prompt files are useful for one-shot templates, but skills are the richer, auto-triggered system designed for repeatable workflows.
Whether building for Claude Code, Copilot, or any other tool that supports the Agent Skills standard, these five principles separate skills that genuinely improve workflows from those that just add noise:
Ordered, not random. Steps should build on each other in a deliberate sequence. Security checks come before readability feedback, because a beautifully formatted SQL injection is still a vulnerability. The order communicates priority.
Exhaustive where it matters. Do not leave critical checks to the AI’s discretion. If security is important, list the specific vulnerability categories to examine — injection, XSS, authentication bypass, hardcoded secrets. Vague instructions produce vague results.
Red flags stop the flow. Some findings are urgent enough to interrupt the normal process. A hardcoded API key should not wait until the “readability” phase to be flagged. Define these explicitly so the agent treats them as immediate priorities.
Actionable, not vague. “Check performance” is a weak instruction. “Check for N+1 queries, unnecessary memory allocations, and blocking I/O calls” is a strong one. The more specific the instruction, the more consistent the output.
Rigid where it counts, flexible where it does not. The checklist order and the categories to review are non-negotiable — these are the parts that make the skill reliable. How the AI phrases its feedback, how much detail it provides for each finding, how it organizes the summary — these can flex based on context. Knowing which parts to lock down and which to leave open is what makes a skill practical rather than brittle.
One practical guideline: keep the SKILL.md body concise — ideally under 2,000 words. Once a skill loads, its content stays in the agent’s context across turns, so every line is a recurring token cost. For skills that need extensive reference material, move the detailed content into a references/ subdirectory and link to it from the SKILL.md. The agent will load those files on demand, only when the task requires them.
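A size guideline like this is simple to enforce automatically. The sketch below counts whitespace-separated words in the body of a SKILL.md, skipping the frontmatter; the function names are hypothetical, and the 2,000-word default simply mirrors the guideline above rather than any limit imposed by a tool.

```python
def body_word_count(skill_text: str) -> int:
    """Count whitespace-separated words in the body, excluding frontmatter."""
    lines = skill_text.splitlines()
    if lines and lines[0].strip() == "---":
        for i, line in enumerate(lines[1:], start=1):
            if line.strip() == "---":
                lines = lines[i + 1:]  # keep only what follows the frontmatter
                break
    return len(" ".join(lines).split())


def over_budget(skill_text: str, limit: int = 2000) -> bool:
    """Return True when the body exceeds the word limit."""
    return body_word_count(skill_text) > limit


example = "---\nname: demo\ndescription: Example skill\n---\n# Demo\nStep one\nStep two\n"
example_words = body_word_count(example)
```

Markdown markers such as `#` count as words here, which slightly overestimates the total; for a rough budget check that is usually acceptable.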
The AI coding assistant is not the problem — the lack of structure is. Agents provide raw power: the ability to reason, plan, and execute across complex codebases. Skills provide consistency: the assurance that critical tasks are handled the same way every time. Together, they move an AI-assisted workflow from improvisation to orchestration.
The emergence of Agent Skills as an open standard has accelerated this shift. A skill written for one tool now works across the entire ecosystem — Claude Code, GitHub Copilot, OpenAI Codex CLI, Google Gemini CLI, and dozens more — making skills not just portable across projects and teams, but across the tools those teams choose. Marketplaces like skills.sh have made it possible to discover, install, and share skills with a single command, and the ecosystem now numbers tens of thousands of community-contributed skills.
As AI coding tools continue to mature, the developers who get the most value from them will not simply be the ones who write better prompts. They will be the ones who encode their team’s best practices into skills — structured, portable, auditable workflows that scale across projects, people, and platforms. The prompt gets the AI’s attention. The skill earns its reliability.
For more about how Spyglass MTG can help your team get the most from AI development tools, contact us today.