Backtesting a strategy: how to do it correctly
When a reader tells me his strategy gave ninety-five percent winners, I already know the story does not end well. A backtest is neither an oracle nor a promise of future money. It is a tool for falsifying your idea — for finding out whether your rules coped with various regimes or only described one happy episode. What follows: how to run the test honestly, where the worst traps lie, and why even a well-run backtest does not guarantee a profitable live account.
Why test a strategy on historical data at all?
Before risking real money, you check whether your rules would have made money over the past few years. If they produced positive expectancy across five years of mixed conditions, they probably carry some edge. That evidence is fragile, because markets evolve, central banks shift policy and liquidity migrates, but it is the best we have. A backtest filters out unworkable ideas and lets through those worth demo testing and, later, a small live position. Nothing more.
Rules must be unambiguous, or you are testing fiction
The first step trips up most newcomers. The strategy must be written so a stranger reading the rules would place identical trades. An entry like "I buy when I see a trend" is not a strategy; it is a feeling. A testable version: "buy on the daily close when the fifty-period EMA is above the two-hundred-period EMA, the fourteen-period RSI is below seventy, and price touches the twenty-period mean from above; stop at 1.5 ATR(14) below entry; target 2.5 ATR above; risk one percent of equity per trade." Only such rules can be tested honestly.
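To show what "unambiguous" means in practice, here is a minimal sketch of the example rule as code. The `Bar` fields and function names are hypothetical, and two readings are my assumptions rather than part of the rule: "touches the twenty-period mean from above" is taken as the day's low reaching the mean while the close stays at or above it, and position size assumes the account is denominated in the quote currency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bar:
    """Indicator snapshot at one daily close (hypothetical field names)."""
    close: float
    low: float
    ema50: float
    ema200: float
    rsi14: float
    mean20: float
    atr14: float


def long_signal(bar: Bar) -> bool:
    """Entry rule from the text; the 'touch' test is one possible interpretation."""
    trend_up = bar.ema50 > bar.ema200
    not_overbought = bar.rsi14 < 70
    touched_mean = bar.low <= bar.mean20 <= bar.close  # assumption, see lead-in
    return trend_up and not_overbought and touched_mean


def order_levels(bar: Bar, equity: float, risk_fraction: float = 0.01):
    """Stop 1.5 ATR below entry, target 2.5 ATR above, sized to risk 1% of equity."""
    entry = bar.close
    stop = entry - 1.5 * bar.atr14
    target = entry + 2.5 * bar.atr14
    units = (equity * risk_fraction) / (entry - stop)  # loss at the stop equals the risk budget
    return entry, stop, target, units


example = Bar(close=1.1000, low=1.0950, ema50=1.0980, ema200=1.0900,
              rsi14=55.0, mean20=1.0960, atr14=0.0080)
if long_signal(example):
    print(order_levels(example, equity=10_000))
```

If two people disagree about the `touched_mean` line, the rule was not yet unambiguous. Writing it down this way forces that argument before the test, not after.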
A related topic is how to discover a trading edge: if you do not know your edge, the backtest will tell you brutally and quickly. For broader context on testing within the working trader's routine, see the trader's workshop on ForexMechanics.com.
Where to get historical data and how many samples are enough?
Data is the second great source of error. Tick history from one broker will never match another tick for tick — different execution model, different liquidity source. For a tight-stop strategy that gap dictates the outcome. For intraday work reach for CME futures data — centralised and auditable. For swing on daily candles, broker history is usually adequate. The choice of tool also matters: a comparison of Forex Tester vs the built-in MT Strategy Tester explains when the dedicated application gives an advantage over the platform's own simulator.
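The second issue is sample size. A long-standing rule of thumb says at least one hundred trades; with fewer, the result can easily be a gift of luck. Thirty wonderful trades over six months tell you very little. One hundred is a practical floor rather than a formal threshold of statistical significance, because the uncertainty around a measured win rate shrinks only with the square root of the trade count; professionals aim for three hundred or more. As a guide to how much history that takes: swing trading on D1 needs about five years, day trading two, and scalping one year of tick history.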
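A rough way to see why thirty trades prove little is to put an error bar on the win rate. The sketch below uses the normal approximation to a binomial proportion and assumes trades are independent, which real trades only partly are. The 55 percent win rate is an illustrative input.

```python
import math


def win_rate_margin(win_rate: float, trades: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for an observed win rate (normal approximation)."""
    return z * math.sqrt(win_rate * (1.0 - win_rate) / trades)


for n in (30, 100, 300):
    margin = win_rate_margin(0.55, n)
    print(f"{n:>3} trades: 55% winners, give or take {margin * 100:.1f} points")
```

Under those assumptions the margin is roughly 18 points at thirty trades, 10 at one hundred and 6 at three hundred. With thirty trades a measured 55 percent cannot be told apart from a coin flip.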
Five years has another virtue: it covers several regimes. The last decade gave us trends (DXY 2014–2017), a volatility shock (March 2020), a tightening cycle (2022–2023) and consolidation through 2024. A strategy working only in one regime is not a strategy — it is an illusion fitted to one era.
Spread, commission and slippage — without them the backtest lies
The most common "miracle" backtest forgot to subtract costs. For swing on H4 with a 200-pip target, a spread of 0.8 pips is barely visible. For a scalper running 30 trades a day with a five-pip target, the same spread devours most of the edge. A realistic backtest must include the broker's spread, the per-lot commission and, for execution-sensitive strategies, slippage — the gap between the price you see when you click and the price you actually get.
My threshold: if average profit per trade is less than twice the average cost (spread, commission and assumed slippage combined), the strategy has no safety margin. Harsh, but it saves months of self-deception. A separate trap to watch for: indicators that repaint their historical values will look perfect in any backtest but behave completely differently in real-time trading.
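The threshold is simple enough to check in a few lines. In this sketch all pip values are hypothetical, commission is assumed to be already converted to pips, and "average profit per trade" is taken as the average result before costs, which is my reading of the rule.

```python
def round_trip_cost_pips(spread: float, commission: float, slippage: float) -> float:
    """Total cost of one trade in pips (commission already expressed in pips)."""
    return spread + commission + slippage


def has_safety_margin(avg_profit_pips: float, cost_pips: float, multiple: float = 2.0) -> bool:
    """True when average profit per trade is at least `multiple` times the average cost."""
    return avg_profit_pips >= multiple * cost_pips


cost = round_trip_cost_pips(spread=0.8, commission=0.3, slippage=0.2)
for name, avg_profit in (("scalper", 1.5), ("H4 swing", 40.0)):
    verdict = "margin ok" if has_safety_margin(avg_profit, cost) else "no safety margin"
    print(f"{name}: avg profit {avg_profit} pips vs cost {cost:.1f} pips -> {verdict}")
```

The same cost that is a rounding error for the swing example fails the scalping example outright.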
A curve fitted to history is not a strategy — it is a museum
The trap most self-taught traders fall into is over-fitting. You test thirty values of a parameter, pick the best result and announce the system returns forty percent a year. What you actually did was optimise to noise, not signal. The more parameters the optimiser touches, the higher the chance the result is coincidence. Robert Pardo, author of the classic on system evaluation, puts it bluntly:
"The out-of-sample test is the only honest measure of strategy quality. If a system fails to retain its edge on data it did not see during optimisation, the system has been fitted to history, not to the market." — Robert Pardo, The Evaluation and Optimization of Trading Strategies, Wiley, 2008.
Hence the split into in-sample tuning and out-of-sample confirmation, typically seventy / thirty. If the out-of-sample result is materially worse, you have over-fitting and the strategy is not fit for a live account. A more rigorous approach is walk-forward analysis, which alternates an optimisation window with a verification window, rolling through history. It is the best protection we have against false confidence in a backtest.
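Both schemes come down to slicing the history by index without ever letting tuning data sit after verification data. A minimal sketch follows; the function names and window lengths are illustrative assumptions, and the walk-forward variant shown rolls forward by one verification window at a time, which is only one of several conventions.

```python
def chronological_split(n_bars: int, in_sample_percent: int = 70):
    """Return (in_sample, out_of_sample) index ranges; the most recent part is held out."""
    cut = n_bars * in_sample_percent // 100
    return range(0, cut), range(cut, n_bars)


def walk_forward_windows(n_bars: int, train: int, test: int):
    """Yield (train_range, test_range) pairs, stepping forward by one test window."""
    start = 0
    while start + train + test <= n_bars:
        yield range(start, start + train), range(start + train, start + train + test)
        start += test


in_sample, out_of_sample = chronological_split(1250)
print(f"tune on bars {in_sample.start}..{in_sample.stop - 1}, "
      f"confirm on bars {out_of_sample.start}..{out_of_sample.stop - 1}")

for train_idx, test_idx in walk_forward_windows(1250, train=500, test=125):
    print(f"optimise {train_idx.start}..{train_idx.stop - 1}, "
          f"verify {test_idx.start}..{test_idx.stop - 1}")
```

Each verification range starts exactly where its optimisation range ends, so no parameter is ever judged on bars it was fitted to.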
A hypothetical example — how to read the numbers honestly
Imagine a swing strategy on EUR/USD D1 tested over 2019–2024 with the workflow from MT4 and MT5 backtesting in practice. Illustrative result: 147 trades, 54 percent winners, average reward-to-risk 2.3 to 1, profit factor 1.78, drawdown 14.5 percent, net return plus 87 percent over five years (about 13.3 percent compounded). Unspectacular but realistic — a template, not a real account. A useful next step is a Monte Carlo simulation, which reshuffles the trade order to show how equity might have evolved under different sequences.
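Headline figures like these can be cross-checked with two identities, and the reshuffling idea fits in a few more lines. The sketch below is an illustration under stated assumptions, not the workflow referenced above: the trade list is hypothetical (79 winners of +1.5 percent and 68 losers of -1 percent of equity) and is not meant to reproduce the quoted result. One thing the check shows: if the 2.3 to 1 figure were the realised average win to average loss, a 54 percent win rate would imply a profit factor near 2.7; the quoted 1.78 corresponds to a realised ratio near 1.5 to 1, so the 2.3 is presumably a planned target.

```python
import random


def annualised_return(total_return: float, years: float) -> float:
    """Compound annual growth rate from a total return, e.g. 0.87 over 5 years."""
    return (1.0 + total_return) ** (1.0 / years) - 1.0


def implied_win_loss_ratio(profit_factor: float, win_rate: float) -> float:
    """Average win / average loss implied by profit factor and win rate."""
    return profit_factor * (1.0 - win_rate) / win_rate


def max_drawdown(returns) -> float:
    """Largest peak-to-trough decline of compounded equity, as a fraction."""
    equity = peak = 1.0
    worst = 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst


def monte_carlo_drawdowns(returns, runs: int = 1000, seed: int = 42):
    """Shuffle the trade order `runs` times and return the sorted max drawdowns."""
    rng = random.Random(seed)
    shuffled = list(returns)
    results = []
    for _ in range(runs):
        rng.shuffle(shuffled)
        results.append(max_drawdown(shuffled))
    return sorted(results)


print(f"annualised: {annualised_return(0.87, 5):.1%}")
print(f"implied avg win / avg loss: {implied_win_loss_ratio(1.78, 0.54):.2f}")

trades = [0.015] * 79 + [-0.01] * 68  # hypothetical per-trade returns
drawdowns = monte_carlo_drawdowns(trades)
print(f"median drawdown {drawdowns[len(drawdowns) // 2]:.1%}, "
      f"95th percentile {drawdowns[int(0.95 * len(drawdowns))]:.1%}")
```

With compounded per-trade returns, reshuffling leaves the final equity unchanged; only the path, and therefore the drawdown, varies. That is why the drawdown distribution is the number worth reading from this exercise.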
What to do tomorrow
Knowledge about backtesting starts working only when you run one and compare the output to your journal. The five steps below take a few afternoons and spare you the most common errors of a developing trader.
- Write the strategy in a single text file, mechanically. Every entry, exit, stop loss and filter must be written so another person reading the rules would place identical trades. If you have to add "depends on the situation" anywhere, tighten the rule — a backtest does not read between the lines.
- Collect five years of data for swing or two for day trading, split up front. Reserve the first seventy percent for tuning, lock the last thirty until the strategy is frozen. Only then run the test on the reserved portion — your real examination.
- Inject realistic costs into every simulation. Add the broker's spread, the per-lot commission and an assumed slippage, with separate values for quiet windows and sessions around major releases. If the strategy loses more than twenty percent of its profit once costs are included, the answer is clear: no safety margin.
- Set a hard floor of one hundred trades. If the window produces fewer, extend the history, add instruments from the same family, or accept that the result is a hypothesis, not evidence. Note trade count, maximum drawdown and win rate: those three numbers tell more than the headline return.
- After a successful backtest, run demo for at least three months before going live. Compare demo with the backtest — if demo is markedly worse, suspect over-fitting, under-estimated costs, or a coding error. Return to the rules, not the hope that things improve on a real account. A good backtest never guarantees a good live result — it merely earns the right to try.
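The numeric parts of the checklist above can be turned into a small gate that runs before anything goes to demo. This is a sketch with a hypothetical function name and inputs; it encodes only the one-hundred-trade floor and the twenty-percent cost erosion limit from the list, and takes gross and net profit in any common unit.

```python
def pre_live_checks(trade_count: int, gross_profit: float, net_profit: float,
                    min_trades: int = 100, max_cost_share: float = 0.20):
    """Return a list of failed checks; an empty list means no numeric check failed."""
    failures = []
    if trade_count < min_trades:
        failures.append(f"only {trade_count} trades, floor is {min_trades}")
    if gross_profit <= 0:
        failures.append("no profit before costs")
    else:
        cost_share = (gross_profit - net_profit) / gross_profit
        if cost_share > max_cost_share:
            failures.append(f"costs consume {cost_share:.0%} of gross profit")
    return failures


print(pre_live_checks(trade_count=147, gross_profit=10_000, net_profit=8_700))
print(pre_live_checks(trade_count=60, gross_profit=10_000, net_profit=7_000))
```

An empty list is not a green light for a live account. It only means the backtest has not disqualified itself on these two counts.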
Sources & bibliography
- MetaQuotes, Strategy Testing in MetaTrader 5 · official Strategy Tester documentation (testing and optimisation on historical data) · www.metatrader5.com
- MQL5 Reference, Testing Trading Strategies · MQL5 developer documentation: tick generation modes, spread simulation, multi-currency tests · www.mql5.com
- Bank for International Settlements, OTC foreign exchange turnover in April 2022 · Triennial Central Bank Survey: data on FX market structure (context for backtesting retail instruments) · www.bis.org
Frequently asked
What is strategy over-fitting?
Over-fitting (also called curve-fitting) is the situation in which a strategy's parameters have been tuned so tightly to past quotes that they fail to handle new data. The classic symptom is a backtest with a ninety-five percent win rate and a live account with thirty. The reason is simple: a ninety-five percent win rate is rarely sustainable in forex over the long run unless winners are tiny relative to losers; strategies with a reward-to-risk ratio near two to one typically stay in the fifty to sixty percent range. If your backtest shows more than seventy percent winners, treat it as a warning and suspect over-fitting until proven otherwise.
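The selection effect behind over-fitting can be seen with no market data at all. The sketch below simulates thirty parameter "variants" that are pure coin flips with no edge, each over one hundred trades, and then reports the best one, exactly as an optimiser would. The variant count, trade count and seed are arbitrary assumptions.

```python
import random


def zero_edge_win_rate(trades: int, rng: random.Random) -> float:
    """Win rate of a coin-flip 'strategy' that has no edge by construction."""
    wins = sum(rng.random() < 0.5 for _ in range(trades))
    return wins / trades


def best_of_variants(variants: int = 30, trades: int = 100, seed: int = 7):
    """Return (best win rate, average win rate) across zero-edge variants."""
    rng = random.Random(seed)
    rates = [zero_edge_win_rate(trades, rng) for _ in range(variants)]
    return max(rates), sum(rates) / len(rates)


best, average = best_of_variants()
print(f"average variant: {average:.1%} winners, best variant: {best:.1%} winners")
```

The average sits near fifty percent, as it must, while the best variant typically lands noticeably higher. Nothing was discovered; one lucky draw was picked out of thirty.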
Which software should I use for backtesting?
For beginners the strongest choice is the built-in Strategy Tester in MetaTrader 5: it is free, supports multi-currency tests, real tick data and genetic optimisation. MetaTrader 4 is still used but is limited to one instrument and one timeframe. Forex Tester 5 costs around three hundred dollars and offers candle-by-candle manual testing — a good tool for traders who want visual intuition for the rules before automating them. Pine Script in TradingView is adequate for simple single-instrument tests. In practice, most serious testing belongs in MT5 or in a Python scripting environment with dedicated backtest libraries.
How much historical data do I need for an honest test?
For swing and position strategies the rule of thumb is at least five years of data, for day trading two years, and for scalping one year of genuine tick history. These windows are not chosen by calendar magic — they exist so that the test covers different regimes: trending, ranging and high-volatility. Independently of the timeframe there is a statistical condition: at least one hundred trades in the test, so that the result is not a fluke. Professionals aim for three hundred or more. If your history produces fewer trades, extend the window or add instruments from the same family — otherwise you are testing a hypothesis rather than a strategy.
What do realistic results from a good backtest look like?
Realistic numbers are a fifty to sixty percent win rate, an average reward-to-risk ratio of at least two to one, a profit factor in the 1.5 to 3.0 range and a maximum drawdown under twenty percent — with at least one hundred trades and the figure confirmed on an out-of-sample window. A Sharpe ratio above one indicates a reasonable return relative to volatility. Red flags are a win rate above eighty percent, a profit factor above five and a drawdown under five percent — that combination almost always signals over-fitting, not edge.
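For readers who keep their results as a list of per-trade outcomes in units of risk (R), the figures above can be computed and screened directly. This is a minimal sketch with hypothetical names; the thresholds are the ones quoted in the answer, and maximum drawdown is passed in separately as a fraction of equity.

```python
def backtest_metrics(r_multiples):
    """Trade count, win rate, profit factor and expectancy from results in R."""
    wins = [r for r in r_multiples if r > 0]
    losses = [-r for r in r_multiples if r < 0]
    gross_win, gross_loss = sum(wins), sum(losses)
    return {
        "trades": len(r_multiples),
        "win_rate": len(wins) / len(r_multiples),
        "profit_factor": gross_win / gross_loss if gross_loss else float("inf"),
        "expectancy_r": sum(r_multiples) / len(r_multiples),
    }


def red_flags(metrics, max_drawdown: float):
    """List which of the quoted warning thresholds a result crosses."""
    flags = []
    if metrics["trades"] < 100:
        flags.append("fewer than 100 trades")
    if metrics["win_rate"] > 0.80:
        flags.append("win rate above 80%")
    if metrics["profit_factor"] > 5.0:
        flags.append("profit factor above 5")
    if max_drawdown < 0.05:
        flags.append("drawdown under 5%")
    return flags


sample = [2.0] * 55 + [-1.0] * 45  # hypothetical: 55 winners of 2R, 45 losers of 1R
stats = backtest_metrics(sample)
print(stats, red_flags(stats, max_drawdown=0.12))
```

The function reports each flag on its own; it is the combination of a very high win rate, a very high profit factor and a tiny drawdown that should make you most suspicious.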