From 945d4b4572879b4b33d15c9e6617791c85624e57 Mon Sep 17 00:00:00 2001
From: Tobi Lutke <tobi@shopify.com>
Date: Sun, 21 Dec 2025 13:10:35 -0400
Subject: [PATCH] Add 6 synthetic evaluation documents
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Topics covered:
- API design principles
- Startup fundraising memo
- Distributed systems overview
- Product launch retrospective
- Machine learning primer
- Remote work policy

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---
 test/eval-docs/api-design-principles.md       |  73 ++++++++++
 .../eval-docs/distributed-systems-overview.md |  92 +++++++++++++
 test/eval-docs/machine-learning-primer.md     | 125 ++++++++++++++++++
 .../eval-docs/product-launch-retrospective.md |  77 +++++++++++
 test/eval-docs/remote-work-policy.md          | 123 +++++++++++++++++
 test/eval-docs/startup-fundraising-memo.md    |  86 ++++++++++++
 6 files changed, 576 insertions(+)
 create mode 100644 test/eval-docs/api-design-principles.md
 create mode 100644 test/eval-docs/distributed-systems-overview.md
 create mode 100644 test/eval-docs/machine-learning-primer.md
 create mode 100644 test/eval-docs/product-launch-retrospective.md
 create mode 100644 test/eval-docs/remote-work-policy.md
 create mode 100644 test/eval-docs/startup-fundraising-memo.md

diff --git a/test/eval-docs/api-design-principles.md b/test/eval-docs/api-design-principles.md
new file mode 100644
index 0000000..e628e4b
--- /dev/null
+++ b/test/eval-docs/api-design-principles.md
@@ -0,0 +1,73 @@
+# API Design Principles
+
+## Introduction
+
+Good API design is crucial for developer experience. This document outlines the core principles we follow when designing REST APIs.
+
+## Principle 1: Use Nouns, Not Verbs
+
+URLs should represent resources, not actions. Use HTTP methods to indicate the action.
+
+**Good:**
+- GET /users/123
+- POST /orders
+- DELETE /products/456
+
+**Bad:**
+- GET /getUser?id=123
+- POST /createOrder
+- GET /deleteProduct/456
+
+## Principle 2: Use Plural Nouns
+
+Always use plural nouns for consistency.
+
+- /users (not /user)
+- /orders (not /order)
+- /products (not /product)
+
+## Principle 3: Hierarchical Relationships
+
+Express relationships through URL hierarchy.
+
+- GET /users/123/orders - Get all orders for user 123
+- GET /users/123/orders/456 - Get specific order 456 for user 123
+
+## Principle 4: Filtering and Pagination
+
+Use query parameters for filtering, sorting, and pagination.
+
+- GET /products?category=electronics&sort=price&page=2&limit=20
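+
+As a rough worked example, assuming `page` is 1-based (the query above does not say), the offset into the sorted result set would be:
+
+$$
+\text{offset} = (\text{page} - 1) \times \text{limit} = (2 - 1) \times 20 = 20
+$$
+
+Under that assumption the response would contain items 21 through 40.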
+
+## Principle 5: Versioning
+
+Always version your APIs. We prefer URL versioning.
+
+- /v1/users
+- /v2/users
+
+## Principle 6: Error Handling
+
+Return consistent error responses with appropriate HTTP status codes.
+
+```json
+{
+  "error": {
+    "code": "VALIDATION_ERROR",
+    "message": "Email format is invalid",
+    "field": "email"
+  }
+}
+```
+
+## Principle 7: Rate Limiting
+
+Implement rate limiting and communicate limits via headers:
+
+- X-RateLimit-Limit: 1000
+- X-RateLimit-Remaining: 999
+- X-RateLimit-Reset: 1640000000 (Unix timestamp at which the limit resets)
+
+## Conclusion
+
+Following these principles leads to APIs that are intuitive, consistent, and easy to maintain. Remember: the best API is one that developers can use without reading documentation.
diff --git a/test/eval-docs/distributed-systems-overview.md b/test/eval-docs/distributed-systems-overview.md
new file mode 100644
index 0000000..b2073dd
--- /dev/null
+++ b/test/eval-docs/distributed-systems-overview.md
@@ -0,0 +1,92 @@
+# Distributed Systems: A Practical Overview
+
+## What Makes a System "Distributed"?
+
+A distributed system is a collection of independent computers that appears to users as a single coherent system. The key challenges arise from:
+
+1. **Partial failure** - Parts of the system can fail independently
+2. **Unreliable networks** - Messages can be lost, delayed, or duplicated
+3. **No global clock** - Different nodes have different views of time
+
+## The CAP Theorem
+
+Eric Brewer's CAP theorem states that a distributed system can provide at most two of the following three guarantees at the same time:
+
+- **Consistency**: All nodes see the same data at the same time
+- **Availability**: Every request to a non-failing node receives a non-error response
+- **Partition tolerance**: System continues operating despite network partitions
+
+Network partitions cannot be ruled out in practice, so the real choice is between consistency and availability while a partition lasts (CP vs. AP).
+
+### CP Systems (Consistency + Partition Tolerance)
+- Examples: ZooKeeper, etcd, Consul
+- Sacrifice availability during partitions
+- Good for: coordination, leader election, configuration
+
+### AP Systems (Availability + Partition Tolerance)
+- Examples: Cassandra, DynamoDB, CouchDB
+- Sacrifice consistency during partitions
+- Good for: high-throughput, always-on services
+
+## Consensus Algorithms
+
+When nodes need to agree on something, they use consensus algorithms.
+
+### Paxos
+- Original consensus algorithm by Leslie Lamport
+- Notoriously difficult to understand and implement
+- Foundation for many other algorithms
+
+### Raft
+- Designed to be understandable
+- Used in etcd, Consul, CockroachDB
+- Separates leader election from log replication
+
+### PBFT (Practical Byzantine Fault Tolerance)
+- Handles malicious nodes
+- Used in some permissioned blockchain systems
+- Higher overhead than crash-fault-tolerant algorithms
+
+## Replication Strategies
+
+### Single-Leader Replication
+- One node accepts writes
+- Followers replicate from leader
+- Simple, but the leader is a bottleneck
+
+### Multi-Leader Replication
+- Multiple nodes accept writes
+- Must handle write conflicts
+- Good for multi-datacenter deployments
+
+### Leaderless Replication
+- Any node accepts writes
+- Uses quorum reads/writes
+- Examples: Dynamo-style databases
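+
+A common way to state the quorum condition, assuming each key is stored on $N$ replicas, a write waits for $W$ acknowledgements, and a read queries $R$ replicas (these symbols are illustrative, not from the text above):
+
+$$
+W + R > N
+$$
+
+When this holds, every read set overlaps every write set in at least one replica. For example, $N = 3$, $W = 2$, $R = 2$ satisfies the condition because $2 + 2 > 3$.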
+
+## Consistency Models
+
+From strongest to weakest:
+
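+1. **Linearizability** - Each operation appears to take effect instantaneously at some point between its start and its completion
+2. **Sequential consistency** - All nodes observe operations in the same order, and that order respects each client's program order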
+3. **Causal consistency** - Causally related operations appear in order
+4. **Eventual consistency** - Given enough time, all replicas converge
+
+## Partitioning (Sharding)
+
+Distributing data across nodes:
+
+### Hash Partitioning
+- Hash key to determine partition
+- Even distribution
+- Range queries are inefficient
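+
+One simple scheme, sketched under the assumption of $N$ partitions and a hash function $h$ (both illustrative names):
+
+$$
+\text{partition}(k) = h(k) \bmod N
+$$
+
+Keys that are adjacent in sort order generally land on different partitions, which is why a range query has to contact every partition.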
+
+### Range Partitioning
+- Ranges of keys on different nodes
+- Good for range queries
+- Risk of hot spots
+
+## Conclusion
+
+Building distributed systems requires understanding these fundamental concepts. Start simple, add complexity only when needed, and always plan for failure.
diff --git a/test/eval-docs/machine-learning-primer.md b/test/eval-docs/machine-learning-primer.md
new file mode 100644
index 0000000..912a6a9
--- /dev/null
+++ b/test/eval-docs/machine-learning-primer.md
@@ -0,0 +1,125 @@
+# Machine Learning: A Beginner's Guide
+
+## What is Machine Learning?
+
+Machine learning is a subset of artificial intelligence where systems learn patterns from data rather than being explicitly programmed. Instead of writing rules, you provide examples and let the algorithm discover the rules.
+
+## Types of Machine Learning
+
+### Supervised Learning
+
+The algorithm learns from labeled examples.
+
+**Classification**: Predicting categories
+- Email spam detection
+- Image recognition
+- Medical diagnosis
+
+**Regression**: Predicting continuous values
+- House price prediction
+- Stock price forecasting
+- Temperature prediction
+
+Common algorithms:
+- Linear Regression
+- Logistic Regression
+- Decision Trees
+- Random Forests
+- Support Vector Machines (SVM)
+- Neural Networks
+
+### Unsupervised Learning
+
+The algorithm finds patterns in unlabeled data.
+
+**Clustering**: Grouping similar items
+- Customer segmentation
+- Document categorization
+- Anomaly detection
+
+**Dimensionality Reduction**: Simplifying data
+- Feature extraction
+- Visualization
+- Noise reduction
+
+Common algorithms:
+- K-Means Clustering
+- Hierarchical Clustering
+- Principal Component Analysis (PCA)
+- t-SNE
+
+### Reinforcement Learning
+
+The algorithm learns through trial and error, receiving rewards or penalties.
+
+Applications:
+- Game playing (AlphaGo, chess)
+- Robotics
+- Autonomous vehicles
+- Resource management
+
+## The Machine Learning Pipeline
+
+1. **Data Collection**: Gather relevant data
+2. **Data Cleaning**: Handle missing values, outliers
+3. **Feature Engineering**: Create useful features
+4. **Model Selection**: Choose appropriate algorithm
+5. **Training**: Fit model to training data
+6. **Evaluation**: Test on held-out data
+7. **Deployment**: Put model into production
+8. **Monitoring**: Track performance over time
+
+## Key Concepts
+
+### Overfitting vs Underfitting
+
+**Overfitting**: Model memorizes training data, performs poorly on new data
+- Solution: More data, regularization, simpler model
+
+**Underfitting**: Model too simple to capture patterns
+- Solution: More features, a more complex model, less regularization
+
+### Train/Test Split
+
+Never evaluate on training data. Common splits:
+- 80% training, 20% testing
+- 70% training, 15% validation, 15% testing
+
+### Cross-Validation
+
+K-fold cross-validation provides more robust evaluation:
+1. Split data into K folds
+2. Train on K-1 folds, test on remaining fold
+3. Repeat K times
+4. Average the results
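+
+The averaging step can be written as follows, where $s_k$ is an assumed name for the evaluation score measured on fold $k$:
+
+$$
+\text{CV score} = \frac{1}{K} \sum_{k=1}^{K} s_k
+$$
+
+With $K = 5$, for instance, each round trains on 80% of the data and tests on the remaining 20%.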
+
+### Bias-Variance Tradeoff
+
+- **High Bias**: Oversimplified model (underfitting)
+- **High Variance**: Overcomplicated model (overfitting)
+- Goal: Find the sweet spot
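+
+For squared-error loss the tradeoff has a standard decomposition. Assuming $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$, and a learned model $\hat{f}$ (notation introduced here for illustration):
+
+$$
+\mathbb{E}\big[(y - \hat{f}(x))^2\big] = \big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2 + \mathrm{Var}\big(\hat{f}(x)\big) + \sigma^2
+$$
+
+The three terms are squared bias, variance, and irreducible noise; the expectation is taken over training sets and noise.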
+
+## Evaluation Metrics
+
+### Classification
+- Accuracy: Correct predictions / Total predictions
+- Precision: True positives / Predicted positives
+- Recall: True positives / Actual positives
+- F1 Score: Harmonic mean of precision and recall
+- AUC-ROC: Area under the receiver operating characteristic (ROC) curve
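+
+In terms of confusion-matrix counts (TP, FP, FN, TN for true/false positives/negatives), the first four metrics are:
+
+$$
+\begin{aligned}
+\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN} \\
+\text{Precision} &= \frac{TP}{TP + FP} \\
+\text{Recall} &= \frac{TP}{TP + FN} \\
+F_1 &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
+\end{aligned}
+$$
+
+As a made-up example, with $TP = 40$, $FP = 10$, $FN = 20$, $TN = 30$: accuracy is $70/100 = 0.70$, precision is $40/50 = 0.80$, recall is $40/60 \approx 0.67$, and $F_1 = 80/110 \approx 0.73$.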
+
+### Regression
+- Mean Absolute Error (MAE)
+- Mean Squared Error (MSE)
+- Root Mean Squared Error (RMSE)
+- R-squared (R²)
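+
+For reference, with true values $y_i$, predictions $\hat{y}_i$, mean $\bar{y}$ of the true values, and $n$ samples (standard notation, not from the text above):
+
+$$
+\begin{aligned}
+\text{MAE} &= \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i| \\
+\text{MSE} &= \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \\
+\text{RMSE} &= \sqrt{\text{MSE}} \\
+R^2 &= 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
+\end{aligned}
+$$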
+
+## Getting Started
+
+1. Learn Python and libraries (NumPy, Pandas, Scikit-learn)
+2. Work through classic datasets (Iris, MNIST, Titanic)
+3. Take online courses (Coursera, fast.ai)
+4. Practice on Kaggle competitions
+5. Build projects with real-world data
+
+Remember: A common rule of thumb is that machine learning is 80% data preparation and 20% modeling. Start with clean data and simple models before adding complexity.
diff --git a/test/eval-docs/product-launch-retrospective.md b/test/eval-docs/product-launch-retrospective.md
new file mode 100644
index 0000000..8d7d394
--- /dev/null
+++ b/test/eval-docs/product-launch-retrospective.md
@@ -0,0 +1,77 @@
+# Product Launch Retrospective: Project Phoenix
+
+**Date:** November 2024
+**Facilitator:** Product Team
+**Attendees:** Engineering, Design, Marketing, Sales
+
+## Context
+
+Project Phoenix was our Q3 initiative to launch a new analytics dashboard. The feature shipped on September 15th after a 4-month development cycle.
+
+## What Went Well
+
+### 1. Cross-functional Collaboration
+The weekly sync between engineering, design, and product prevented misalignment. Design reviews caught issues early, saving significant rework.
+
+### 2. Beta Program
+Our 20-customer beta program identified 47 bugs before launch. Customer feedback directly shaped the final UI.
+
+### 3. Documentation
+Engineering wrote comprehensive API docs. The developer portal received positive feedback from partners.
+
+### 4. Launch Metrics
+- Day 1 adoption: 34% of active users
+- Week 1 retention: 67%
+- NPS from early users: +42
+
+## What Could Have Gone Better
+
+### 1. Timeline Pressure
+The original June deadline was unrealistic. We cut corners on test coverage (only 62% vs. our 80% target).
+
+### 2. Performance Issues
+Initial load time was 4.2 seconds. We had to hotfix performance optimizations in week 2.
+
+### 3. Mobile Experience
+Mobile was deprioritized. The responsive design has usability issues on smaller screens.
+
+### 4. Sales Enablement
+Sales team wasn't trained until launch day. Early deals had inconsistent positioning.
+
+## Key Metrics Post-Launch
+
+| Metric | Target | Actual | Status |
+|--------|--------|--------|--------|
+| MAU | 10,000 | 12,400 | Exceeded |
+| Avg Session Duration | 5 min | 7.2 min | Exceeded |
+| Error Rate | <0.1% | 0.3% | Missed |
+| Support Tickets | <50/week | 73/week | Missed |
+
+## Action Items
+
+1. **Testing**: Establish minimum 75% coverage for all new features
+   - Owner: Engineering Lead
+   - Due: December 1st
+
+2. **Performance Budget**: Add performance gates to CI/CD
+   - Owner: Platform Team
+   - Due: December 15th
+
+3. **Mobile-First**: Require mobile designs before development starts
+   - Owner: Design Lead
+   - Due: Immediate
+
+4. **Sales Training**: Build 2-week lead time for enablement
+   - Owner: Product Marketing
+   - Due: Next launch
+
+## Lessons Learned
+
+1. Beta programs are invaluable - expand to 30+ customers
+2. Performance testing should be part of the definition of done
+3. Cross-functional alignment works - keep the weekly syncs
+4. Documentation pays off - developers loved the API docs
+
+## Follow-up
+
+Schedule 30-day post-launch review for October 15th to assess long-term adoption patterns.
diff --git a/test/eval-docs/remote-work-policy.md b/test/eval-docs/remote-work-policy.md
new file mode 100644
index 0000000..b5c6f3d
--- /dev/null
+++ b/test/eval-docs/remote-work-policy.md
@@ -0,0 +1,123 @@
+# Remote Work Policy
+
+**Effective Date:** January 2024
+**Last Updated:** March 2024
+**Applies To:** All full-time employees
+
+## Overview
+
+We believe in flexibility and trust. This policy outlines expectations for remote work arrangements to ensure productivity while maintaining work-life balance.
+
+## Eligibility
+
+All full-time employees who have completed their 90-day probationary period are eligible for remote work. Some roles requiring physical presence (e.g., office management, hardware engineering) may have modified arrangements.
+
+## Work Arrangements
+
+### Fully Remote
+- Work from anywhere within approved time zones
+- No requirement to visit office
+- Must attend quarterly in-person gatherings
+
+### Hybrid
+- 2-3 days per week in office
+- Flexible scheduling with manager approval
+- Core collaboration days: Tuesday and Thursday
+
+### Office-Based
+- Primary work location is company office
+- Occasional remote days allowed with notice
+
+## Expectations
+
+### Availability
+- Be available during core hours: 10 AM - 3 PM local time
+- Respond to messages within 2 hours during work hours
+- Block focus time on calendar if needed
+
+### Communication
+- Camera on for team meetings
+- Update Slack status when away
+- Share working hours in calendar
+
+### Workspace Requirements
+- Reliable internet connection (minimum 25 Mbps)
+- Quiet space for video calls
+- Ergonomic setup (we provide $500 home office stipend)
+
+### Security
+- Use company VPN for all work
+- Lock computer when stepping away
+- No work on public WiFi without VPN
+- Report lost devices immediately
+
+## Equipment
+
+Company provides:
+- Laptop (MacBook Pro or equivalent)
+- Monitor (up to 27")
+- Keyboard and mouse
+- Headset for calls
+
+$500 annual stipend for:
+- Desk and chair
+- Lighting
+- Other ergonomic equipment
+
+## Time Zones
+
+### Approved Time Zones
+- Americas: UTC-8 to UTC-3
+- Europe: UTC-1 to UTC+3
+- Asia-Pacific: UTC+8 to UTC+12
+
+Work outside approved time zones requires VP approval.
+
+### Async First
+- Default to async communication
+- Document decisions in writing
+- Use Loom for complex explanations
+- Reserve meetings for collaboration, not status updates
+
+## International Remote Work
+
+Extended stays (>30 days) in another country require:
+1. HR approval
+2. Tax implications review
+3. Legal compliance check
+4. Updated work agreement
+
+Some countries are not permitted due to legal/tax complexity.
+
+## In-Person Requirements
+
+### Mandatory Events
+- Annual company retreat (1 week)
+- Quarterly team gatherings (2 days)
+- Onboarding week for new hires
+
+### Travel
+- Company covers all travel expenses
+- Book through approved travel platform
+- Submit expenses within 30 days
+
+## Performance
+
+Remote work is a privilege maintained through:
+- Meeting deadlines and commitments
+- Responsive communication
+- Quality of work output
+- Team collaboration
+
+Performance issues may result in modified arrangements.
+
+## Manager Responsibilities
+
+- Regular 1:1 meetings (weekly recommended)
+- Clear goal setting and feedback
+- Inclusive meeting scheduling across time zones
+- Address isolation or burnout proactively
+
+## Questions
+
+Contact HR or your manager for questions about this policy. We review and update this policy annually based on feedback.
diff --git a/test/eval-docs/startup-fundraising-memo.md b/test/eval-docs/startup-fundraising-memo.md
new file mode 100644
index 0000000..c334f3c
--- /dev/null
+++ b/test/eval-docs/startup-fundraising-memo.md
@@ -0,0 +1,86 @@
+# Series A Fundraising Strategy Memo
+
+**To:** Leadership Team
+**From:** CEO
+**Date:** March 2024
+**Subject:** Series A Planning and Timeline
+
+## Executive Summary
+
+We are targeting a $15M Series A raise at a $60M pre-money valuation. This memo outlines our strategy, timeline, and key milestones.
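+
+The implied figures, as back-of-the-envelope arithmetic that ignores any option pool expansion or convertible instruments (an assumption, since the memo does not mention them):
+
+$$
+\begin{aligned}
+\text{Post-money} &= \$60\text{M} + \$15\text{M} = \$75\text{M} \\
+\text{New investor ownership} &= \frac{\$15\text{M}}{\$75\text{M}} = 20\%
+\end{aligned}
+$$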
+
+## Current Metrics
+
+- ARR: $2.4M (growing 15% MoM)
+- Customers: 127 paying companies
+- Net Revenue Retention: 124%
+- Burn Rate: $350K/month
+- Runway: 14 months
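+
+Two derived figures, assuming the 15% monthly growth compounds for a full year and the burn rate stays flat (neither is stated above):
+
+$$
+\begin{aligned}
+\text{Annualized growth multiple} &= 1.15^{12} \approx 5.35 \\
+\text{Implied cash on hand} &= 14 \times \$350\text{K} = \$4.9\text{M}
+\end{aligned}
+$$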
+
+## Target Investors
+
+We're focusing on three tiers:
+
+### Tier 1 (Lead candidates)
+- Sequoia Capital - Strong enterprise SaaS focus
+- Andreessen Horowitz - Previous interest from partner
+- Index Ventures - European expansion thesis fits
+
+### Tier 2 (Co-investors)
+- First Round Capital
+- Founder Collective
+- SV Angel
+
+### Tier 3 (Strategic)
+- Salesforce Ventures
+- Google Ventures
+
+## Timeline
+
+**April 2024**
+- Finalize data room
+- Update financial model
+- Prepare pitch deck v2
+
+**May 2024**
+- Warm introductions begin
+- First partner meetings
+- Initial term sheets expected
+
+**June 2024**
+- Partner meetings continue
+- Negotiate terms
+- Select lead investor
+
+**July 2024**
+- Due diligence
+- Legal documentation
+- Close round
+
+## Use of Funds
+
+- Engineering (50%): Scale team from 8 to 20
+- Sales (30%): Build outbound motion, hire 5 AEs
+- Marketing (15%): Brand, content, events
+- G&A (5%): Operations infrastructure
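+
+Applied to the $15M target, and assuming the full amount is allocated with no fees or reserve, the split works out to:
+
+$$
+\begin{aligned}
+\text{Engineering} &= 0.50 \times \$15\text{M} = \$7.5\text{M} \\
+\text{Sales} &= 0.30 \times \$15\text{M} = \$4.5\text{M} \\
+\text{Marketing} &= 0.15 \times \$15\text{M} = \$2.25\text{M} \\
+\text{G\&A} &= 0.05 \times \$15\text{M} = \$0.75\text{M}
+\end{aligned}
+$$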
+
+## Key Risks
+
+1. **Market timing** - Enterprise budgets tightening
+2. **Competition** - Two well-funded competitors announced
+3. **Valuation expectations** - Market multiples compressed
+
+## Board Composition
+
+Post-Series A board will be:
+- 2 Founders
+- 1 Lead investor
+- 1 Independent (to be recruited)
+
+## Next Steps
+
+1. Schedule strategy session for April 5th
+2. CFO to update financial model by April 10th
+3. Begin investor outreach April 15th
+
+Questions? Let's discuss at our next all-hands.