AI-Driven Development with Speckit: Real-World Case Studies from Storyie

At Storyie, we've developed Speckit, a custom slash command framework for Claude Code that enforces spec-driven development. Rather than explaining the theory, this post showcases how we actually use Speckit through three real feature implementations from our codebase.

What is Speckit?

Speckit is a collection of eight Claude Code slash commands (all eight are listed under Setting Up Speckit). Five of them form the core development pipeline:

/speckit.specify  →  /speckit.clarify  →  /speckit.plan  →  /speckit.tasks  →  /speckit.implement

Each command writes persistent artifacts to a specs/ directory, leaving a documented trail of decisions and requirements. Let's see how this works in practice.

Case Study 1: Diary View Count Feature

Feature Branch: 006-diary-view-count

This feature tracks and displays view counts for public diary entries. Here's the actual user request that started it all:

"I want to display view counts for diaries on the web diary list and detail screens. Since mobile users can only view their own diaries, mobile views should be excluded from the count."

Step 1: Specification (/speckit.specify)

The request was originally written in Japanese (it is shown translated above). Running /speckit.specify transformed it into a structured specification. Here's what was generated in specs/006-diary-view-count/spec.md:

# Feature Specification: Diary View Count Display

## User Story 1 - View Count Display in Diary List (Priority: P1)

As a diary author viewing my diary entries on the web application,
I want to see how many times each diary has been viewed
so that I can understand the total engagement in my content.

**Acceptance Scenarios**:

1. **Given** I am viewing the diary list on web,
   **When** I look at a diary entry,
   **Then** I see the total view count displayed alongside the diary

2. **Given** my diary has been viewed 5 times by other users on web,
   **When** I view the diary list,
   **Then** the view count shows "5"

The specification also identified edge cases that humans often miss:

### Edge Cases

- What happens when a user views their own diary on web?
  (Authors' own views are excluded)
- How does the system handle very high view counts (e.g., 1,000,000+)?
  (Display with abbreviated format like "1K", "1M")
- What happens if view tracking fails temporarily?
  (Counts are eventually consistent rather than guaranteed exact)
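
To make the high-view-count edge case concrete, here is a minimal sketch of an abbreviating formatter. The function name `formatViewCount`, the "B" suffix, and the choice to truncate to one decimal place are assumptions for illustration; the spec only asks for an abbreviated format like "1K" and "1M".

```typescript
// Hypothetical helper, not taken from the Storyie codebase.
// Abbreviates large view counts for display, e.g. 1000 -> "1K".
function formatViewCount(count: number): string {
  // Treat invalid or negative input as zero views (assumption).
  if (!Number.isFinite(count) || count < 0) return "0";
  const n = Math.floor(count);
  if (n < 1000) return String(n);

  // Largest unit first so the first match wins.
  const units: Array<[number, string]> = [
    [1000000000, "B"],
    [1000000, "M"],
    [1000, "K"],
  ];
  for (const [size, suffix] of units) {
    if (n >= size) {
      // Truncate (rather than round) to one decimal place so that
      // 999999 renders as "999.9K" and never as "1000K".
      const scaled = Math.floor((n * 10) / size) / 10;
      return `${scaled}${suffix}`;
    }
  }
  return String(n);
}
```

Truncating rather than rounding keeps the displayed value from overstating the count, which fits the spec's note that counts are eventually consistent rather than exact.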

Step 2: Clarification (/speckit.clarify)

During clarification, Speckit identified three ambiguities that needed resolution:

| Question | Options | Decision |
|----------|---------|----------|
| Counting Method | A: Total views, B: Unique viewers | A: Total views |
| Anonymous Views | A: Count them, B: Exclude them | B: Exclude |
| Author Self-Views | A: Count them, B: Exclude them | B: Exclude |

Each decision was encoded back into the spec and tracked in a checklist at specs/006-diary-view-count/checklists/requirements.md:

## Validation Summary

**Status**: PASSED - All validation criteria met

**Key Decisions Made**:
1. Total view counting (Option A): All views counted, including repeats
2. Authenticated users only (Option B): Anonymous views excluded
3. Author exclusion (Option B): Authors' own views excluded
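
The three decisions, together with the mobile exclusion from the original request, can be expressed as a single predicate. This is a sketch only: the `ViewAttempt` shape and the `shouldCountView` name are hypothetical and do not come from the actual viewService.ts.

```typescript
// Hypothetical types for illustration; field names are assumptions.
type Platform = "web" | "mobile";

interface ViewAttempt {
  diaryAuthorId: string;
  viewerId: string | null; // null represents an anonymous visitor
  platform: Platform;
}

// Returns true when a view should be logged under the clarified rules.
function shouldCountView(view: ViewAttempt): boolean {
  // Original request: mobile views are excluded from the count.
  if (view.platform !== "web") return false;
  // Decision 2: anonymous views are excluded.
  if (view.viewerId === null) return false;
  // Decision 3: authors' own views are excluded.
  if (view.viewerId === view.diaryAuthorId) return false;
  // Decision 1: total views, so repeat views by the same user all count.
  return true;
}
```

Because the rule is total views rather than unique viewers, the predicate needs no lookup of earlier views by the same user.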

Step 3: Planning (/speckit.plan)

The plan phase produced technical research with concrete decisions. Here's an excerpt from specs/006-diary-view-count/research.md:

### 1. High-Performance View Logging without Foreign Keys

**Decision**: Use denormalized `diary_view_logs` table with UUID text fields

**Rationale**:
- User explicitly requested "avoid foreign keys as much as possible" for DB performance
- Foreign keys add overhead on INSERT operations
- View logging is high-frequency write operation (1,000-5,000 daily)

**Implementation Details**:
CREATE TABLE diary_view_logs (
  id uuid PRIMARY KEY DEFAULT uuid_generate_v4(),
  diary_id text NOT NULL,        -- UUID as text (no FK)
  viewer_id text NOT NULL,       -- UUID as text (no FK)
  viewed_at timestamptz NOT NULL DEFAULT now(),
  platform text NOT NULL DEFAULT 'web'
);

The research document also captured SST cron configuration decisions:

### 2. SST Cron Configuration for OpenNext

**Decision**: Use `sst.aws.Cron` resource with EventBridge schedule

**Implementation Pattern**:
new sst.aws.Cron("ViewAggregator", {
  schedule: "cron(0 */4 * * ? *)",  // Every 4 hours
  job: {
    handler: "apps/web/scripts/views/aggregate.handler",
    runtime: "nodejs22.x",
    timeout: "5 minutes",
  },
});
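
The handler behind this cron job is not shown in the research document. As a rough sketch of the roll-up step such a job might perform, the function below counts log rows per diary in memory. The `ViewLogRow` shape mirrors the `diary_view_logs` columns in camelCase, and both it and `aggregateViewCounts` are assumptions; a real job would more likely aggregate inside the database.

```typescript
// Hypothetical row shape mirroring the diary_view_logs table above.
interface ViewLogRow {
  diaryId: string;
  viewerId: string;
  viewedAt: Date;
  platform: string;
}

// Rolls raw log rows up into a total view count per diary.
function aggregateViewCounts(rows: ViewLogRow[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const row of rows) {
    // Defensive filter (assumption): only web views contribute.
    if (row.platform !== "web") continue;
    counts.set(row.diaryId, (counts.get(row.diaryId) ?? 0) + 1);
  }
  return counts;
}
```

Keeping the counting logic as a pure function makes it easy to unit test independently of the scheduler and the database.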

The plan included a Constitution Check - validating the design against our project principles:

## Constitution Check

- [x] **Spec-Driven Delivery**: Following /speckit.specify → /speckit.plan → /speckit.tasks
- [x] **Localization Parity**: 10 languages scoped (en, ja, zh, es, ar, hi, pt, ru, de, fr)
- [x] **Design System Fidelity**: Using existing constants/Colors.ts tokens
- [x] **Test-First Quality Gates**: Coverage target ≥80% documented
- [x] **Monorepo & Data Discipline**: Migrations with rollback plans documented

Step 4: Task Generation (/speckit.tasks)

The tasks file broke down the implementation into 75 concrete tasks across 7 phases. Here's a sample:

## Phase 3: User Story 3 - Automatic View Tracking

**Goal**: Implement server-side view logging that tracks authenticated
web user views while excluding mobile, anonymous, and author self-views

- [ ] T030 [P] [US3] Create unit tests for view logging service
      in apps/web/tests/services/viewService.test.ts
- [ ] T031 [P] [US3] Create integration tests for view logging
      in apps/web/tests/integration/viewLogging.test.ts
- [ ] T032 [US3] Create view logging service with logDiaryView()
      in apps/web/services/viewService.ts
- [ ] T033 [US3] Integrate view logging into diary detail page
      in apps/web/app/[slug]/page.tsx

Key metrics from this feature:

| Metric | Value |
|--------|-------|
| Total Tasks | 75 |
| Parallel Opportunities | 35 tasks |
| Phases | 7 |
| Languages Localized | 10 |

What We Learned

The view count feature revealed several patterns that made Speckit valuable:

Japanese input worked seamlessly: The spec generation handled mixed-language requirements
Performance constraints were captured early: "avoid foreign keys as much as possible" became a formal decision
Constitution Check caught localization requirements: All 10 languages were scoped from the start

Case Study 2: Floating Actions Redesign

Feature Branch: 005-diary-detail-floating-actions

This feature redesigned the mobile app's floating action buttons from a single button to a grouped layout with Share, Lock/Unlock, and Browser actions.

The Original Request

"I want to change the button in the bottom right corner of the Expo diary detail screen to a button layout like 'Share', 'Lock/Unlock', and 'Browser'."

Generated Specification Highlights

The specification captured precise acceptance criteria for each action:

### User Story 2 - Toggle Diary Privacy (Priority: P2)

**Acceptance Scenarios**:

1. **Given** a user is viewing a private diary entry (locked state),
   **When** they tap the lock button,
   **Then** the diary becomes public and the icon changes to unlock

2. **Given** a visibility toggle fails,
   **When** the error occurs,
   **Then** the visibility state reverts to the original setting
   and an error message is shown

Functional Requirements Generated

Speckit automatically derived 14 functional requirements from the user stories. Here are four of them:

### Functional Requirements

- **FR-001**: System MUST display three floating action buttons:
              Share, Lock/Unlock, and Browser
- **FR-003**: System MUST use a lock/key icon for privacy toggle,
              displaying locked icon when private, unlocked when public
- **FR-009**: System MUST provide optimistic UI updates for lock/unlock,
              reverting changes if the operation fails
- **FR-013**: System MUST update translations for all button accessibility
              labels across 10 supported languages
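
FR-009 describes a common pattern: update the UI immediately, then roll back if the save fails. The sketch below shows that pattern in isolation. The `ToggleDeps` interface and all of its members are hypothetical stand-ins for whatever state and API layer the app actually uses.

```typescript
type Visibility = "private" | "public";

// Hypothetical dependencies, injected so the logic is testable.
interface ToggleDeps {
  getVisibility: () => Visibility;
  setVisibility: (value: Visibility) => void; // updates local UI state
  saveVisibility: (value: Visibility) => Promise<void>; // persists remotely
  showError: (message: string) => void;
}

// Flips visibility optimistically and reverts if persisting fails.
// Resolves to the visibility that is in effect afterwards.
async function toggleVisibilityOptimistically(
  deps: ToggleDeps
): Promise<Visibility> {
  const previous = deps.getVisibility();
  const next: Visibility = previous === "private" ? "public" : "private";

  // Optimistic update: the icon changes before the request completes.
  deps.setVisibility(next);
  try {
    await deps.saveVisibility(next);
    return next;
  } catch {
    // Failure: restore the original state and tell the user.
    deps.setVisibility(previous);
    deps.showError("Could not update visibility. Please try again.");
    return previous;
  }
}
```

This sketch does not guard against rapid successive taps, one of the edge cases the spec raises; a real implementation would likely disable the button or ignore taps while a save is in flight.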

Success Criteria

The spec defined measurable outcomes without implementation details:

### Measurable Outcomes

- **SC-001**: Users can share diary entries to at least one destination
              in under 3 seconds
- **SC-002**: Privacy toggle completes in under 2 seconds with visual feedback
- **SC-005**: Error rate for visibility toggle is below 5%
              under normal network conditions

Edge Cases Discovered

The specification process surfaced edge cases not in the original request:

### Edge Cases

- What happens when a user tries to share a diary entry that has no content?
- How does the system handle visibility toggle when network is unavailable?
- What happens if the browser button is tapped but no default browser is configured?
- How does the system handle rapid successive taps on any floating action button?
- What happens when a user shares a private diary entry - is there a warning?

Case Study 3: Default Visibility Setting

Feature Branch: 004-diary-default-visibility

This feature allows users to set whether new diary entries default to public or private.

Complete Workflow Artifacts

This feature demonstrates the full artifact chain Speckit produces:

specs/004-diary-default-visibility/
├── spec.md           # Feature specification (5 user scenarios)
├── plan.md           # Technical architecture
├── research.md       # Decision documentation
├── data-model.md     # Entity definitions
├── tasks.md          # Implementation breakdown
├── quickstart.md     # Manual testing guide
├── contracts/
│   └── README.md     # API specifications
└── checklists/
    └── requirements.md  # Quality validation

Specification Quality Validation

Before planning begins, Speckit validates the specification quality:

## Content Quality

- [x] No implementation details (languages, frameworks, APIs)
- [x] Focused on user value and business needs
- [x] Written for non-technical stakeholders
- [x] All mandatory sections completed

## Requirement Completeness

- [x] Requirements are testable and unambiguous
- [x] Success criteria are measurable
- [x] Edge cases are identified
- [x] Dependencies and assumptions identified

Edge Cases Discovered

Speckit's clarification process surfaced edge cases the original request didn't mention:

### Edge Cases

- What happens when device storage is cleared or corrupted?
  (Should default to private)
- How does the system handle migration for existing users?
  (Should initialize with private as default)
- What happens if the user has multiple devices?
  (Each device stores its own preference independently)

Functional Requirements

From a simple request, Speckit derived nine functional requirements, six of which are shown here:

### Functional Requirements

- **FR-001**: System MUST provide a settings option to select default visibility
- **FR-002**: System MUST persist the preference to device storage (not server-side)
- **FR-003**: System MUST initialize to "private" when no saved preference exists
- **FR-004**: System MUST apply the saved preference when creating new entries
- **FR-007**: System MUST fall back to "private" if storage read fails
- **FR-009**: System MUST update translations across all 10 supported languages
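
FR-002, FR-003, and FR-007 together define a small read-with-fallback routine. Here is a minimal sketch of it, assuming a generic async key-value interface. `KeyValueStore`, the storage key, and the function names are all hypothetical; on device the store could be backed by whichever storage library the app uses.

```typescript
type DefaultVisibility = "private" | "public";

// Hypothetical minimal storage interface (assumption).
interface KeyValueStore {
  getItem(key: string): Promise<string | null>;
  setItem(key: string, value: string): Promise<void>;
}

// Hypothetical storage key, chosen for illustration only.
const DEFAULT_VISIBILITY_KEY = "diary.defaultVisibility";

// Reads the saved preference, falling back to "private".
async function loadDefaultVisibility(
  store: KeyValueStore
): Promise<DefaultVisibility> {
  try {
    const raw = await store.getItem(DEFAULT_VISIBILITY_KEY);
    // FR-003: a missing value initializes to "private".
    // An unrecognized (corrupted) value is treated the same way.
    return raw === "public" ? "public" : "private";
  } catch {
    // FR-007: fall back to "private" if the storage read fails.
    return "private";
  }
}

// FR-002: persists the preference to device storage only.
async function saveDefaultVisibility(
  store: KeyValueStore,
  value: DefaultVisibility
): Promise<void> {
  await store.setItem(DEFAULT_VISIBILITY_KEY, value);
}
```

Every failure path resolves to "private", so a storage problem can never cause a new entry to be published by accident.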

The Constitution: Governing Principles

All features are validated against our Project Constitution at .specify/memory/constitution.md. Here are our five core principles:

I. Spec-Driven Delivery

All feature work MUST progress through /speckit.specify → /speckit.plan → /speckit.tasks before any implementation PR opens.

II. Localization Parity

Every user-visible string MUST use translation keys and bundle updates to all ten locale files in the same change set.

III. Design System Fidelity

UI work MUST source colors, typography, and spacing from shared tokens. No raw hex values without design approval.

IV. Test-First Quality Gates

Each user story MUST add or update failing tests before implementation. Touched packages MUST maintain ≥80% coverage.

V. Monorepo & Data Discipline

Database changes MUST ship with migrations and regenerated types via pnpm type:gen.

Every /speckit.plan run includes a Constitution Check that validates alignment with these principles before any code is written.

Artifacts Summary

Here's what our specs/ directory looks like after multiple features:

specs/
├── 001-fix-diary-autosave/
├── 001-optimize-public-pages/
├── 002-web/
├── 003-explore-1-20/
├── 004-diary-default-visibility/
├── 005-diary-detail-floating-actions/
├── 006-diary-view-count/
├── 007-diary-content-quality-seo/
├── 008-tech-blog/
└── 009-stripe-multitenancy/

Each directory contains the complete decision trail from idea to implementation.

Key Benefits We've Experienced

1. Comprehensive Edge Case Discovery

The view count feature surfaced six edge cases that could otherwise have turned into production bugs:

Author self-view handling
Anonymous user exclusion
Rapid page refresh behavior
High view count formatting
Platform detection accuracy
Aggregation failure recovery

2. Measurable Requirements

Every feature has quantified success criteria:

| Feature | Metric | Target |
|---------|--------|--------|
| View Count | Display latency | <1 second |
| Floating Actions | Toggle completion | <2 seconds |
| Default Visibility | Setting save | <30 seconds |

3. Localization by Default

Constitution principle II ensures every feature includes translations for 10 languages from the start, not as an afterthought.

4. Traceable Decisions

When we need to understand why a feature works a certain way, the answer is in research.md:

### Platform Detection

**Decision**: Use User-Agent header detection in Next.js server-side code

**Rationale**:
- Expo apps send distinct User-Agent patterns (contains "Expo" or "ReactNative")
- Server-side detection prevents client-side bypass

**Alternatives Considered**:
- Custom header from client: Bypassable; rejected
- API endpoint separation: More complex routing; rejected
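
As an illustration of the decision recorded above (this sketch is ours and not part of research.md), server-side detection can be a simple substring check on the User-Agent header. The function name `detectPlatform` and the choice to treat a missing header as web are assumptions.

```typescript
type DetectedPlatform = "web" | "mobile";

// Classifies a request by its User-Agent header value.
// Matches the two markers named in the research notes:
// "Expo" and "ReactNative" (case-sensitive substring match).
function detectPlatform(
  userAgent: string | null | undefined
): DetectedPlatform {
  // Assumption: a missing or empty header is treated as web.
  if (!userAgent) return "web";
  if (userAgent.includes("Expo") || userAgent.includes("ReactNative")) {
    return "mobile";
  }
  return "web";
}
```

In a Next.js server component or route handler, the header value would come from the incoming request; how it is read depends on the Next.js version in use.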

5. Parallel Task Identification

The tasks.md format identifies parallelizable work with [P] markers:

## Phase 2 (Foundational) - Parallel Opportunities

# All utility functions in parallel:
Task: T007, T008, T009

# All web locale files in parallel:
Task: T012, T013, T014, T015, T016, T017, T018, T019

The view count feature identified 35 tasks that could run concurrently.
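
Because the task lines follow a regular shape (checkbox, task ID, optional bracketed tags, description), the [P] markers are easy to extract with a script. The parser below is a sketch based only on the sample lines shown in this post; the `ParsedTask` shape and function names are hypothetical, and indented continuation lines (such as the file path on the line after a task) are simply skipped.

```typescript
// Hypothetical parsed representation of one tasks.md line.
interface ParsedTask {
  id: string;
  done: boolean;
  parallel: boolean;
  story: string | null;
  description: string;
}

// Matches lines like: "- [ ] T030 [P] [US3] Create unit tests ..."
const TASK_LINE = /^- \[( |x|X)\] (T\d+)((?: \[[^\]]+\])*) (.+)$/;

function parseTaskLine(line: string): ParsedTask | null {
  const match = TASK_LINE.exec(line.trim());
  if (match === null) return null;
  const check = match[1] ?? " ";
  const id = match[2] ?? "";
  // Collect bracketed tags such as "[P]" and "[US3]".
  const tags: string[] = (match[3] ?? "").match(/\[[^\]]+\]/g) ?? [];
  const description = match[4] ?? "";
  const storyTag = tags.find((tag) => /^\[US\d+\]$/.test(tag));
  return {
    id,
    done: check.toLowerCase() === "x",
    parallel: tags.some((tag) => tag === "[P]"),
    story: storyTag ? storyTag.slice(1, -1) : null,
    description,
  };
}

// Returns the IDs of unfinished tasks marked as parallelizable.
function parallelTaskIds(markdown: string): string[] {
  const ids: string[] = [];
  for (const line of markdown.split("\n")) {
    const task = parseTaskLine(line);
    if (task !== null && task.parallel && !task.done) ids.push(task.id);
  }
  return ids;
}
```

Run against the Phase 3 sample from Case Study 1, `parallelTaskIds` would return T030 and T031, the two test-writing tasks marked [P].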

Real Numbers from Our Features

| Feature | Tasks | Phases | Edge Cases | Languages |
|---------|-------|--------|------------|-----------|
| 006-diary-view-count | 75 | 7 | 6 | 10 |
| 005-diary-detail-floating-actions | 40+ | 5 | 5 | 10 |
| 004-diary-default-visibility | 30+ | 4 | 4 | 10 |

Setting Up Speckit

To adopt Speckit in your project:

1. Create Command Files

.claude/commands/
├── speckit.specify.md
├── speckit.clarify.md
├── speckit.plan.md
├── speckit.tasks.md
├── speckit.analyze.md
├── speckit.checklist.md
├── speckit.implement.md
└── speckit.constitution.md

2. Create Templates

.specify/templates/
├── spec-template.md
├── plan-template.md
├── tasks-template.md
└── checklist-template.md

3. Initialize Constitution

Create .specify/memory/constitution.md with your project's core principles.

4. Create Helper Scripts

.specify/scripts/bash/
├── create-new-feature.sh
├── check-prerequisites.sh
└── update-agent-context.sh

Conclusion

Speckit transforms how we build features at Storyie. Every feature in our specs/ directory tells the complete story from user request to production code:

006-diary-view-count: 75 tasks, 7 phases, 10 languages, ≥80% test coverage
005-diary-detail-floating-actions: 14 functional requirements, 6 success criteria
004-diary-default-visibility: 9 functional requirements, complete edge case coverage

The key insight is that AI assistants excel when given structure. Speckit provides that structure through slash commands that mirror proven software engineering practices, ensuring every feature is:

Well-specified with measurable acceptance criteria
Validated against project constitution
Documented for future reference
Testable with comprehensive edge case coverage

What's Next

Want to explore more of our development practices? Check out these related posts:

Monorepo Architecture: Learn how we structure our pnpm workspace
Cross-Platform Lexical Editor: Dive into our platform-agnostic editor design
Database Schema Design: Explore our Drizzle ORM patterns

---

Have questions about Speckit? Share your thoughts on Twitter @StoryieApp or reach out to our engineering team.