The Ultimate Options Strategy Backtesting Checklist for 2026

You've built a strategy. The backtest results look great. But before you commit real capital, you need to verify that those results are trustworthy, not artifacts of overfitting, survivorship bias, or unrealistic assumptions.

This checklist is your final quality gate. Run through every item before moving from backtest to live trading. A strategy that passes all 20 checks isn't guaranteed to succeed, but one that fails multiple checks is almost guaranteed to disappoint.

Print this out. Tape it to your monitor. Never skip a step.

---

The 20-Point Backtesting Checklist

Data Quality & Scope

#### 1. Sufficient Historical Data (5+ Years Minimum)

Your backtest must cover at least 5 years of data—ideally 10 or more. Anything less doesn't include enough market regime variety to be meaningful.

  • Why it matters: A 2-year backtest might only cover a bull market. That tells you nothing about how the strategy handles a downturn.
  • How to check: Verify your date range in the backtest configuration. OptionsPilot provides 30+ years of SPY/SPX data, so there's no excuse for a thin dataset.
  • Pass threshold: Minimum 5 years, recommended 10+.

#### 2. Includes Major Stress Events

Your data range must include at least two major market crises. The non-negotiable events to capture:

  • 2008 Financial Crisis (SPY -57%, VIX 89)
  • 2020 COVID Crash (SPY -34% in 33 days, VIX 83)
  • 2022 Rate-Hike Bear (SPY -25%, sustained VIX elevation)

If your backtest only covers 2012–2019, you're testing in the longest bull market in history. The results will look amazing, and they'll be useless for predicting real-world performance.

  • How to check: Verify the date range includes at least 2008 and 2020. View the Data Explorer to confirm event coverage.
  • Pass threshold: Must include at least 2 major stress events.
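
Items 1 and 2 can be checked automatically from the backtest's start and end dates. Here's a minimal Python sketch, not an OptionsPilot feature: `STRESS_EVENTS` and `check_coverage` are hypothetical names, the windows are approximate S&P 500 peak-to-trough dates, and an event only counts if its whole window falls inside the backtest range.

```python
from datetime import date

# Approximate S&P 500 peak-to-trough windows (illustrative, not authoritative).
STRESS_EVENTS = {
    "2008 Financial Crisis": (date(2007, 10, 9), date(2009, 3, 9)),
    "2020 COVID Crash": (date(2020, 2, 19), date(2020, 3, 23)),
    "2022 Rate-Hike Bear": (date(2022, 1, 3), date(2022, 10, 12)),
}


def check_coverage(start, end, min_years=5.0, min_events=2):
    """Return (years covered, stress events covered, pass/fail) for a backtest range."""
    years = (end - start).days / 365.25
    # Count an event only if its entire window lies inside [start, end].
    covered = [
        name
        for name, (event_start, event_end) in STRESS_EVENTS.items()
        if start <= event_start and event_end <= end
    ]
    passed = years >= min_years and len(covered) >= min_events
    return years, covered, passed


# A 2012-2019 backtest: long enough, but it misses every listed crisis.
years, covered, ok = check_coverage(date(2012, 1, 1), date(2019, 12, 31))
print(f"{years:.1f} years, events: {covered}, pass: {ok}")  # 8.0 years, events: [], pass: False
```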

#### 3. No Survivorship Bias in Underlying Selection

If you're backtesting on a specific stock, ask: "Would I have selected this stock 10 years ago?" If you picked AAPL because you know it's been a winner, you've introduced survivorship bias.

  • How to avoid: Backtest on broad ETFs (SPY, QQQ) rather than individual stocks, or use a systematic stock selection process that could have been applied in the past.
  • Pass threshold: Underlying selection is systematic, not hindsight-based.

#### 4. Data Source Is Reliable

Options data is notoriously messy. Bad data leads to bad backtests. Verify:

  • Bid-ask spreads are realistic (no $0.01 wide spreads on deep OTM options)
  • Greeks are calculated correctly (delta, gamma, theta, vega)
  • Corporate actions (splits, dividends) are properly adjusted
  • No missing data days or anomalous prices

OptionsPilot uses institutional-grade CBOE data to avoid these issues.

---

Strategy Configuration

#### 5. Test Multiple DTE Ranges

Don't just test one DTE (days to expiration). Run the same strategy at 7, 14, 30, 45, and 60 DTE. The results will often surprise you: the "obvious" choice isn't always optimal.

  • How to check: Run at least 3 DTE variations on OptionsPilot and compare Sharpe ratios, not just raw returns.
  • Pass threshold: Tested at least 3 DTE values; selected DTE has the best risk-adjusted return (not just the highest raw return).
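
The selection rule can be written down as a small helper, assuming you've exported one summary (total return and Sharpe ratio) per DTE run from your backtester. `select_dte` is a hypothetical function and the numbers are placeholders, not real backtest results; the point is that the highest raw return and the best risk-adjusted return can point to different DTEs.

```python
def select_dte(runs, min_variations=3):
    """Pick the DTE with the best Sharpe ratio from a set of backtest summaries.

    runs maps DTE -> {"total_return": float, "sharpe": float}.
    Returns (best DTE by Sharpe, best DTE by raw return) for comparison.
    """
    if len(runs) < min_variations:
        raise ValueError(f"test at least {min_variations} DTE values, got {len(runs)}")
    best_risk_adjusted = max(runs, key=lambda dte: runs[dte]["sharpe"])
    best_raw = max(runs, key=lambda dte: runs[dte]["total_return"])
    return best_risk_adjusted, best_raw


# Placeholder numbers for illustration only, not real backtest results.
runs = {
    7: {"total_return": 0.42, "sharpe": 0.55},
    30: {"total_return": 0.31, "sharpe": 1.25},
    45: {"total_return": 0.28, "sharpe": 1.10},
}
chosen, highest_raw = select_dte(runs)
print(chosen, highest_raw)  # 30 7
```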

#### 6. Test Multiple Delta Selections

Similarly, test your strategy at multiple delta targets. For premium selling, test at least 0.10, 0.20, and 0.30. For buying strategies, test 0.30, 0.40, and 0.50.

  • Pass threshold: Tested at least 3 delta values, and performance is robust across the range. If changing delta from 0.28 to 0.32 flips the strategy from profitable to unprofitable, the result is fragile, and that's a red flag.
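
One way to make the fragility test concrete is to scan a delta sweep for unprofitable values and for sharp drops between neighboring deltas. This is a sketch under stated assumptions: `delta_robustness` is a hypothetical helper, it expects one Sharpe ratio per tested delta, and the 0.5 `min_ratio` cliff tolerance is an arbitrary illustrative choice rather than a standard.

```python
def delta_robustness(sharpe_by_delta, min_ratio=0.5):
    """Return a list of problems found in a delta sweep (empty list = robust).

    sharpe_by_delta maps each tested delta to the Sharpe ratio of that run.
    min_ratio is an assumed tolerance: neighboring deltas must keep at least
    this fraction of the better neighbor's Sharpe.
    """
    deltas = sorted(sharpe_by_delta)
    problems = []
    for d in deltas:
        # With a zero risk-free rate, a non-positive Sharpe means no average profit.
        if sharpe_by_delta[d] <= 0:
            problems.append(f"delta {d:.2f} is unprofitable")
    for left, right in zip(deltas, deltas[1:]):
        a, b = sharpe_by_delta[left], sharpe_by_delta[right]
        better = max(a, b)
        if better > 0 and min(a, b) < min_ratio * better:
            problems.append(f"cliff between delta {left:.2f} and {right:.2f}")
    return problems


# Hypothetical sweep results, for illustration only.
print(delta_robustness({0.10: 0.9, 0.20: 1.1, 0.30: 1.0}))  # []
print(delta_robustness({0.28: 1.1, 0.30: 1.2, 0.32: -0.2}))
```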

#### 7. Realistic Entry and Exit Assumptions

Your backtest must use realistic fill assumptions, with slippage always working against you:

  • Entry: Mid-price adjusted against you by a slippage allowance (typically $0.02–$0.05 for SPY options): sell below the mid, buy above it.
  • Exit: The same adjustment on the closing order. Buying back a short option fills above the mid; selling a long option fills below it.
  • No fills at unrealistic prices: If VIX is at 80 and your backtest shows a fill at the mid-price with no slippage, that's not realistic.
  • Pass threshold: Slippage is modeled at $0.02+ per leg for liquid underlyings, $0.05+ for less liquid names.
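
A minimal sketch of the fill rule, assuming a flat per-share slippage of $0.03 (an arbitrary point inside the $0.02–$0.05 range) and ignoring order type and queue position. `fill_price` is a hypothetical helper; it simply moves the mid-price in the direction that hurts you.

```python
def fill_price(bid, ask, side, slippage=0.03):
    """Model a fill at the mid-price, moved against you by `slippage` per share.

    side is "buy" or "sell"; prices are per share (one contract = 100 shares).
    """
    mid = (bid + ask) / 2
    if side == "buy":
        price = mid + slippage  # buyers pay more than the mid
    elif side == "sell":
        price = mid - slippage  # sellers receive less than the mid
    else:
        raise ValueError("side must be 'buy' or 'sell'")
    return round(price, 2)


# Sell a put to open, then buy it back to close.
entry = fill_price(1.50, 1.60, "sell")  # 1.52
exit_ = fill_price(0.40, 0.46, "buy")   # 0.46
print(round((entry - exit_) * 100, 2))  # 106.0 per contract
```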

#### 8. Account for Bid-Ask Spread Impact

The bid-ask spread is a hidden tax on every trade. For multi-leg strategies (iron condors, butterflies), the spread impact is multiplied by the number of legs:

| Strategy | Legs | Typical SPY Spread Cost |
|---|---|---|
| Covered call | 1 option leg | $2–$5 per trade |
| Vertical spread | 2 legs | $4–$10 per trade |
| Iron condor | 4 legs | $8–$20 per trade |
| Butterfly | 4 legs | $8–$20 per trade |

If your strategy generates $15 per trade on average and the spread cost is $12, your real edge is only $3, a fragile margin.

  • Pass threshold: Net profit per trade exceeds 2x the estimated spread cost.
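
Here's a rough way to estimate the spread cost and apply the 2x rule, assuming you cross half the quoted width on every leg once to open and once to close. Both function names are hypothetical. With $0.02- and $0.05-wide quotes on four legs, this arithmetic happens to land on the $8–$20 iron condor range in the table.

```python
def round_trip_spread_cost(leg_widths, contracts=1, multiplier=100):
    """Estimated bid-ask cost of one trade, in dollars.

    Assumes you pay half the quoted width on every leg once to open and once
    to close. Positions that expire worthless skip the closing cost, so this
    leans conservative.
    """
    half_spreads = sum(width / 2 for width in leg_widths)
    return round(half_spreads * 2 * multiplier * contracts, 2)


def passes_spread_check(net_profit_per_trade, spread_cost, factor=2.0):
    """Checklist rule: net profit per trade must exceed factor x spread cost."""
    return net_profit_per_trade > factor * spread_cost


# Iron condor with $0.02-wide and $0.05-wide quotes on all four legs.
print(round_trip_spread_cost([0.02] * 4))  # 8.0
print(round_trip_spread_cost([0.05] * 4))  # 20.0
print(passes_spread_check(15 - 12, 12))    # False: a $3 edge vs a $12 cost
```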

---

Risk Metrics

#### 9. Maximum Drawdown Is Acceptable

Maximum drawdown is the largest peak-to-trough decline of your equity curve. This is arguably the most important risk metric because it determines whether you'll psychologically survive the strategy.

  • Guidelines:
    - Conservative traders: Max drawdown < 15%
    - Moderate traders: Max drawdown < 25%
    - Aggressive traders: Max drawdown < 40%
    - If max drawdown exceeds 50%, the strategy is likely unsuitable for most traders

  • Pass threshold: Max drawdown is within your personal risk tolerance AND you could honestly endure it emotionally.
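
Maximum drawdown comes straight from the equity curve: track the running peak and record the worst percentage drop from it. A minimal sketch, assuming the curve is a list of positive account values in time order; `max_drawdown` is a hypothetical helper name.

```python
def max_drawdown(equity):
    """Largest peak-to-trough decline of an equity curve, as a fraction of the peak.

    Assumes every equity value is positive.
    """
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)                    # highest equity seen so far
        worst = max(worst, (peak - value) / peak)  # drop from that high
    return worst


curve = [100_000, 120_000, 90_000, 130_000, 104_000]
print(f"{max_drawdown(curve):.0%}")  # 25%
```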

#### 10. Sharpe Ratio Above 1.0

The Sharpe ratio measures risk-adjusted return: the average return in excess of the risk-free rate, divided by the standard deviation of returns. A ratio above 1.0 means you're earning more than 1 unit of excess return for every unit of volatility. Below 1.0, you're taking on too much risk relative to the reward.

  • Benchmarks:
    - < 0.5: Poor
    - 0.5–1.0: Mediocre
    - 1.0–1.5: Good
    - 1.5–2.0: Excellent
    - > 2.0: Exceptional (or possibly overfitted)

  • Pass threshold: Sharpe ratio ≥ 1.0.
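
If you want to recompute the ratio from exported daily returns, here's a minimal sketch assuming 252 trading days per year and an annual risk-free rate you supply (zero by default). Scaling by the square root of 252 is the common annualization convention, but it assumes returns are roughly independent from day to day. `sharpe_ratio` and `rate_sharpe` are hypothetical helpers; `rate_sharpe` assigns band edges to the higher band.

```python
import math
import statistics


def sharpe_ratio(daily_returns, annual_risk_free=0.0, periods_per_year=252):
    """Annualized Sharpe ratio: mean excess return / std dev, scaled by sqrt(periods)."""
    per_period_rf = annual_risk_free / periods_per_year
    excess = [r - per_period_rf for r in daily_returns]
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(periods_per_year)


def rate_sharpe(sharpe):
    """Map a Sharpe ratio onto the benchmark bands (band edges go to the higher band)."""
    if sharpe < 0.5:
        return "Poor"
    if sharpe < 1.0:
        return "Mediocre"
    if sharpe < 1.5:
        return "Good"
    if sharpe < 2.0:
        return "Excellent"
    return "Exceptional (or possibly overfitted)"


print(rate_sharpe(1.2))  # Good
```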

#### 11. Profit Factor Above 1.5

Profit factor = Gross profits / Gross losses. A profit factor of 1.5 means your winners are 50% larger than your losers in aggregate.

  • Benchmarks:
    - < 1.0: Losing strategy
    - 1.0–1.2: Barely profitable (fragile)
    - 1.2–1.5: Moderately profitable
    - 1.5–2.0: Solidly profitable
    - > 2.0: Very profitable (or possibly overfitted)

  • Pass threshold: Profit factor ≥ 1.5.
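
The formula translates directly into code, assuming you have a list of per-trade P&L values from the trade log. `profit_factor` is a hypothetical helper, and returning infinity when there are no losing trades is one convention among several.

```python
import math


def profit_factor(trade_pnls):
    """Gross profits divided by gross losses (losses taken as a positive number)."""
    gross_profit = sum(p for p in trade_pnls if p > 0)
    gross_loss = -sum(p for p in trade_pnls if p < 0)
    if gross_loss == 0:
        return math.inf  # no losing trades: ratio undefined, treated as infinite
    return gross_profit / gross_loss


trades = [120, 80, -60, 100, -140]  # P&L per trade in dollars
print(profit_factor(trades))        # 1.5 -> meets the 1.5 threshold
```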

#### 12. Win Rate Matches Strategy Type

Different strategies have different expected win rates. Verify yours is in the right range:

| Strategy Type | Expected Win Rate |
|---|---|
| Premium selling (puts, calls) | 65–80% |
| Iron condors | 60–75% |
| Vertical spreads | 55–70% |
| Long options (buying calls/puts) | 30–45% |
| Straddles/strangles (long) | 25–40% |

If your premium selling strategy shows a 95% win rate, you've probably set stops too wide or haven't accounted for the occasional catastrophic loss.

  • Pass threshold: Win rate is within the expected range for the strategy type.
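
A small check against the win-rate ranges in the table, assuming a winning trade is any trade with positive P&L. The dictionary keys and `win_rate_check` are hypothetical labels chosen for this sketch.

```python
# Expected ranges from the win-rate table; the keys are hypothetical labels.
EXPECTED_WIN_RATE = {
    "premium_selling": (0.65, 0.80),
    "iron_condor": (0.60, 0.75),
    "vertical_spread": (0.55, 0.70),
    "long_options": (0.30, 0.45),
    "long_straddle_strangle": (0.25, 0.40),
}


def win_rate_check(trade_pnls, strategy_type):
    """Return (win rate, whether it falls inside the expected range)."""
    wins = sum(1 for p in trade_pnls if p > 0)
    rate = wins / len(trade_pnls)
    low, high = EXPECTED_WIN_RATE[strategy_type]
    return rate, low <= rate <= high


# 19 small wins and one large loss: a 95% win rate is outside 65-80%.
rate, ok = win_rate_check([50] * 19 + [-900], "premium_selling")
print(f"{rate:.0%} in range: {ok}")  # 95% in range: False
```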

---

Stress Testing & Robustness

#### 13. Performance During High VIX Periods

Isolate your backtest results during periods when VIX exceeded 25. Many premium selling strategies look fantastic overall but hemorrhage money during volatility spikes.

  • How to check: Use OptionsPilot's VIX filter to segment performance by volatility regime.
  • Pass threshold: Strategy is either profitable during high VIX OR has explicit rules to reduce/eliminate exposure during high VIX.
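
If your backtester can't filter by regime, you can bucket the trade log yourself. This sketch assumes each trade is tagged with the VIX level on its entry day and uses the 25 threshold from the text. `pnl_by_vix_regime` is a hypothetical helper and the sample trades are made up.

```python
from collections import defaultdict


def pnl_by_vix_regime(trades, threshold=25.0):
    """Split trades into 'high_vix' and 'normal' buckets by VIX at entry.

    trades is an iterable of (vix_at_entry, pnl) pairs.
    """
    summary = defaultdict(lambda: {"trades": 0, "pnl": 0.0})
    for vix, pnl in trades:
        regime = "high_vix" if vix > threshold else "normal"
        summary[regime]["trades"] += 1
        summary[regime]["pnl"] += pnl
    return dict(summary)


# Made-up trades: (VIX at entry, P&L in dollars).
trades = [(14.2, 85), (16.8, 90), (31.5, -620), (27.0, 40), (12.9, 75)]
print(pnl_by_vix_regime(trades))
# {'normal': {'trades': 3, 'pnl': 250.0}, 'high_vix': {'trades': 2, 'pnl': -580.0}}
```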

#### 14. No Single Trade Accounts for >20% of Total Profit

If one trade generated 20%+ of your total backtest profit, the overall results are unreliable. That single trade might have been luck, an anomaly, or a data error.

  • How to check: Review the trade log and sort by P&L. Divide the top trade's P&L by the total net profit.
  • Pass threshold: No single trade accounts for more than 20% of total profit.
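
The concentration check is a single division once you have per-trade P&L. A minimal sketch, assuming "total profit" means total net profit across all trades; `top_trade_share` is a hypothetical helper, and it refuses to run on a losing strategy, where the ratio is meaningless.

```python
def top_trade_share(trade_pnls):
    """Fraction of total net profit contributed by the single best trade."""
    total = sum(trade_pnls)
    if total <= 0:
        raise ValueError("check only applies to a strategy with positive total profit")
    return max(trade_pnls) / total


trades = [100, 150, -50, 900, 200, -100]
share = top_trade_share(trades)
print(f"{share:.0%}", share <= 0.20)  # 75% False
```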

#### 15. Consistent Monthly Returns (No Feast-or-Famine Pattern)

Review the monthly returns heatmap. A good strategy shows relatively consistent returns month over month, not a pattern of 10 good months followed by 2 catastrophic months that wipe out the gains.

  • How to check: Open the monthly returns heatmap in the backtest results and look for consistency.
  • Pass threshold: Fewer than 15% of months show losses exceeding 2x the average monthly gain.
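
The threshold can be computed from the monthly return series behind the heatmap. One interpretation is assumed here: "average monthly gain" means the mean of the winning months. `large_loss_month_fraction` is a hypothetical helper.

```python
def large_loss_month_fraction(monthly_returns):
    """Share of months losing more than 2x the average winning month.

    "Average monthly gain" is interpreted here as the mean of the positive months.
    """
    gains = [r for r in monthly_returns if r > 0]
    if not gains:
        raise ValueError("no winning months to compare against")
    avg_gain = sum(gains) / len(gains)
    big_losses = [r for r in monthly_returns if r < -2 * avg_gain]
    return len(big_losses) / len(monthly_returns)


# Ten +2% months followed by two -8% months: the feast-or-famine pattern.
frac = large_loss_month_fraction([0.02] * 10 + [-0.08, -0.08])
print(f"{frac:.1%}", frac < 0.15)  # 16.7% False
```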

#### 16. Test With Position Sizing Limits

Run the backtest with realistic position sizing (e.g., max 5% of capital per trade for spreads, max 20% for covered calls). Unlimited position sizing hides the impact of consecutive losses.

  • Pass threshold: Results remain attractive with realistic position sizing.
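
Here's a minimal sketch of the 5% rule for defined-risk spreads, assuming "5% of capital" means the maximum possible loss of the position, not its premium. `max_contracts` and `streak_drawdown` are hypothetical helpers. The second one shows why consecutive losses matter: it assumes every trade loses its full risk budget and size is recomputed from the shrinking account.

```python
import math


def max_contracts(capital, max_loss_per_contract, risk_fraction=0.05):
    """Contracts you can open if one trade may lose at most risk_fraction of capital."""
    return math.floor(capital * risk_fraction / max_loss_per_contract)


def streak_drawdown(risk_fraction, consecutive_losses):
    """Drawdown after a losing streak when every trade loses its full risk budget
    and position size is recomputed from the shrinking account each time."""
    return 1 - (1 - risk_fraction) ** consecutive_losses


# $50,000 account, $5-wide put spread sold for $1.50: max loss $350 per contract.
print(max_contracts(50_000, 350))         # 7
print(f"{streak_drawdown(0.05, 5):.1%}")  # 22.6%
```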

---

Optimization Traps

#### 17. Out-of-Sample Validation

Split your data into two periods: an "in-sample" period for optimization (e.g., 2000–2015) and an "out-of-sample" period for validation (e.g., 2016–2025). Optimize parameters on the first period, then test on the second WITHOUT changing anything.

If out-of-sample results are dramatically worse than in-sample, you've overfit.

  • Pass threshold: Out-of-sample Sharpe ratio is within 30% of in-sample Sharpe ratio.
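
The split and the pass test can be sketched as two small helpers, assuming you have dated returns and can compute a Sharpe ratio for each half with your backtester. `split_by_date` and `oos_passes` are hypothetical names. "Within 30%" is read one-sided here: an out-of-sample Sharpe that beats the in-sample one is not penalized.

```python
from datetime import date


def split_by_date(dated_values, cutoff):
    """Split (date, value) pairs into in-sample (< cutoff) and out-of-sample (>= cutoff)."""
    in_sample = [v for d, v in dated_values if d < cutoff]
    out_of_sample = [v for d, v in dated_values if d >= cutoff]
    return in_sample, out_of_sample


def oos_passes(in_sample_sharpe, out_of_sample_sharpe, tolerance=0.30):
    """Pass if the out-of-sample Sharpe is no more than `tolerance` below in-sample."""
    return out_of_sample_sharpe >= in_sample_sharpe * (1 - tolerance)


# Fix the cutoff before optimizing: 2000-2015 in-sample, 2016 onward out-of-sample.
cutoff = date(2016, 1, 1)
monthly = [(date(2010, 1, 29), 0.012), (date(2015, 6, 30), -0.004), (date(2020, 3, 31), -0.031)]
in_s, out_s = split_by_date(monthly, cutoff)
print(len(in_s), len(out_s))   # 2 1
print(oos_passes(1.4, 1.05))   # True: 25% degradation
print(oos_passes(1.4, 0.6))    # False: 57% degradation
```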

#### 18. Parameters Are Not Hyper-Specific

If your optimal settings are "sell a 0.27-delta call at exactly 37 DTE, roll at 47.5% profit, but only on Tuesdays when VIX is between 16.2 and 18.7," you've curve-fitted. Robust parameters tend to be round, simple numbers that make intuitive sense.

  • Pass threshold: Parameters are round numbers (0.30 delta, 45 DTE, 50% profit target) and small changes don't dramatically alter results.

#### 19. Strategy Works on Multiple Underlyings

If possible, test the same strategy on SPY and QQQ (or SPX and NDX). A genuinely robust strategy should work on related underlyings, not just the one you optimized on.

  • Pass threshold: Strategy is profitable on at least 2 related underlyings.

#### 20. Results Survive Pessimistic Assumptions

Re-run the backtest with:

  • 2x normal slippage
  • 1 additional day of entry/exit delay
  • 10% worse fills on entries and exits

If the strategy is still profitable under these conservative assumptions, it has a genuine edge. If it flips to a loss, the edge is too thin.

  • Pass threshold: Strategy remains profitable (Sharpe > 0.7) under pessimistic assumptions.
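
Two of the three stress assumptions can be applied trade by trade without new market data. This sketch assumes a short-premium position (sell to open, buy to close), a base slippage of $0.03 per share per leg, and reads "10% worse fills" as 10% less credit and 10% more debit. The one-day delay is not modeled because it needs historical prices. `trade_pnl` is a hypothetical helper. Re-price every trade this way, then recompute the Sharpe ratio on the stressed P&L.

```python
def trade_pnl(mid_credit, mid_debit, legs, slippage=0.03,
              slippage_mult=1.0, fill_haircut=0.0, multiplier=100):
    """Dollar P&L of one short-premium trade (sell to open, buy to close).

    mid_credit / mid_debit are the per-share mid-prices of the whole position
    at entry and exit. Slippage is charged per leg on both sides; fill_haircut
    shrinks the credit and inflates the debit by that fraction.
    """
    slip = slippage * slippage_mult * legs
    credit = mid_credit * (1 - fill_haircut) - slip
    debit = mid_debit * (1 + fill_haircut) + slip
    return round((credit - debit) * multiplier, 2)


# Iron condor: $1.80 mid credit, closed for a $0.90 mid debit, 4 legs.
normal = trade_pnl(1.80, 0.90, legs=4)
stressed = trade_pnl(1.80, 0.90, legs=4, slippage_mult=2.0, fill_haircut=0.10)
print(normal, stressed)  # 66.0 15.0
```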

---

Scoring Your Strategy

Award 1 point for each check passed. Here's how to interpret your score:

| Score | Assessment | Action |
|---|---|---|
| 18–20 | Excellent | Proceed to paper trading with confidence |
| 15–17 | Good | Address the gaps before paper trading |
| 12–14 | Fair | Significant issues need resolution |
| 9–11 | Poor | Major rework needed |
| Below 9 | Fail | Strategy is not ready for any form of live testing |

---

FAQ

How long should a backtest take?

With OptionsPilot, a single backtest across 30 years runs in seconds. Running through this full checklist, including multiple parameter variations, regime analysis, and out-of-sample testing, typically takes 1–3 hours.

Do professional traders use checklists like this?

Yes. Institutional quant teams have far more elaborate validation processes, but the core principles are the same: sufficient data, stress testing, robustness checks, and out-of-sample validation. This checklist adapts those principles for individual traders.

What if my strategy fails several checklist items?

That's actually a good outcome: you've identified problems before risking real money. Go back, adjust the strategy, and re-test. Iteration is the entire point of backtesting.

Should I use this checklist for every strategy?

Yes, even for strategies you're confident about. Overconfidence is one of the biggest risks in trading, and a systematic checklist counteracts it.

---

Validate Your Strategy Today

Use this checklist with OptionsPilot's backtester to systematically validate any options strategy before committing capital.
