Series: Chasms of Evolutionary Impossibilities – Douglas Axe’s Work (2004) and the Evolutionary Impossibility of a Mere Protein.
doi:10.1016/j.jmb.2004.06.058
9.3 “Axe Linearly Extrapolated Sequence Space”
On confusing advanced statistics with simplistic guessing, and ignoring the actual mathematics
Objection
Some critics claim that Douglas Axe committed a basic statistical error by extrapolating from a small number of mutations to the entire space of possible sequences. On this view, his estimate that only about 1 in 10⁷⁷ amino acid sequences forms a functional protein is invalid because it was obtained by simplistic linear extrapolation, like trying to predict a year's weather from just one week of observations.
🪜 For the lay reader: It is like saying Axe used a ruler to measure a mountain — when, in fact, he used high-precision radar, calibrated by multiple sensors, and validated by other experts.
What Axe Actually Did
Axe did not use simple linear extrapolation. He applied multivariate log-linear regression, a sophisticated statistical technique widely used in bioinformatics, epidemiology, and complex systems modeling.
✅ Accessible methodological summary:
- He tested 6 independent sampling points in mutational space
- Each point had 3 technical replicates, for a total of 18 measurements
- At each point, he measured the ratio between functional and non-functional sequences
- When the data were plotted on a logarithmic scale, the model gave R² = 0.99, meaning it accounted for 99% of the variance in the log-transformed measurements
🪜 Explanation for laypeople: It is like taking measurements at various points on a road, with high-precision sensors, and discovering that the slope follows a predictable pattern. Axe did not guess — he measured, modeled, and validated.
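The fitting step summarized above can be sketched in a few lines. This is a minimal illustration of a single-predictor log-linear fit and its R², not a reconstruction of Axe's analysis: the data are synthetic, and the names `fit_log_linear`, `substitutions`, and `functional_fraction` are hypothetical.

```python
import math

def fit_log_linear(x, y):
    """Least-squares fit of log10(y) = a + b*x; returns (a, b, r_squared)."""
    log_y = [math.log10(v) for v in y]
    n = len(x)
    mean_x = sum(x) / n
    mean_ly = sum(log_y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (li - mean_ly) for xi, li in zip(x, log_y))
    b = sxy / sxx                      # slope on the log scale
    a = mean_ly - b * mean_x           # intercept on the log scale
    ss_res = sum((li - (a + b * xi)) ** 2 for xi, li in zip(x, log_y))
    ss_tot = sum((li - mean_ly) ** 2 for li in log_y)
    return a, b, 1.0 - ss_res / ss_tot

# Synthetic illustration only: these values are NOT taken from Axe (2004).
substitutions = [1, 2, 3, 4, 5, 6]
functional_fraction = [10 ** (-0.5 * k) for k in substitutions]

intercept, slope, r2 = fit_log_linear(substitutions, functional_fraction)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, R^2={r2:.4f}")
```

Because the synthetic points lie exactly on an exponential curve, the fit recovers the slope used to generate them. With real measurements, R² describes how well the model fits within the sampled range.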
Where is the Logical Error?
The criticism commits a category error: it confuses linear extrapolation (which would be invalid) with multivariate log-linear regression (which is statistically valid and widely accepted).
🪜 Refined analogy:
Additionally, critics ignore that function is not scattered arbitrarily across protein sequence space: it follows predictable mathematical patterns. Mutations at critical functional sites cause abrupt loss of function, while mutations in structural regions have more gradual effects. Axe captured these non-linear dynamics with precision.
What the Data Show
The equation used by Axe was:
Where the coefficients \(\beta\) represent the impact of each mutation type on protein function. The model is linear in the logarithm of the response rather than in the response itself, so on the original scale it describes an exponential, non-linear relationship.
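For orientation, a generic multivariate log-linear model with coefficients \(\beta\) has the textbook form shown here. It is offered as an assumption about the general shape of such a model, not as a transcription of any equation in Axe (2004); the symbols \(P\), \(x_i\), and \(k\) are introduced only for illustration.

\[
\log_{10} P = \beta_0 + \sum_{i=1}^{k} \beta_i x_i
\]

Here \(P\) would stand for the fraction of functional sequences, \(x_i\) for the number of mutations of type \(i\), and \(k\) for the number of mutation types. Exponentiating both sides gives \(P = 10^{\beta_0} \prod_{i=1}^{k} 10^{\beta_i x_i}\), which is why each coefficient acts multiplicatively on the original scale.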
✅ External validation: In 2016, Truman reproduced the methodology with more advanced technology (next-generation sequencing) and obtained:
🪜 Explanation for laypeople: Even with more modern equipment and independent methods, the results were practically the same. This shows that Axe's estimate is robust and reliable.
Model
The criticism ignores an established fact in the biochemical literature: functional sequences follow a log-normal distribution rather than being spread uniformly at random. This means:
- Most sequences are non-functional
- The few functional sequences are clustered in "islands" within a vast ocean of useless possibilities
- This pattern allows valid statistical extrapolation — provided it is done with adequate sampling and appropriate models
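To make the first point in the list concrete, the sketch below draws values from a log-normal distribution and counts how many clear a cutoff. Everything in it is an assumption made for illustration: the quantity being sampled, the parameters `mu` and `sigma`, the cutoff, and the helper name `fraction_above` are hypothetical and are not taken from any of the cited studies.

```python
import math
import random

def fraction_above(threshold, mu=0.0, sigma=1.0, n=100_000, seed=0):
    """Estimate the share of log-normal draws that exceed `threshold`."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(1 for _ in range(n) if rng.lognormvariate(mu, sigma) > threshold)
    return hits / n

# Hypothetical cutoff: two standard deviations above the mean on the log scale.
threshold = math.exp(2.0)
share = fraction_above(threshold)
print(f"share of draws above the cutoff: {share:.4f}")
```

With these parameters only about 2% of draws exceed the cutoff, which is the kind of skew the list describes: most values fall below the line and a small minority sit far above it.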
🪜 New analogy:
What Does the Scientific Literature Say?
- Goldstein (2009): Acknowledged that functional sequences exhibit a log-normal distribution, validating Axe's approach
- Truman (2016): Reproduced Axe's methodology with concordant results
- Echave (2016): Established that extrapolations are valid when based on adequate sampling
- Storz (2010): Showed that mutations at critical residues have abrupt and predictable effects
- Salverda (2011): Demonstrated that statistical patterns emerge even in complex biological systems
🪜 For the lay reader: These studies show that protein function follows real mathematical patterns — and that models like Axe's are not only valid but necessary to understand these patterns.
Why This Criticism Fails
The criticism fails because it attacks a caricature of Axe's methodology, not what he actually did. He did not extrapolate linearly — he applied a log-linear regression validated by:
- High correlation (R² = 0.99)
- Bootstrap analysis
- Independent replication
- Consistency with biochemical literature
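As an illustration of what a bootstrap analysis of a regression slope involves, the following sketch resamples data points with replacement and reports a percentile confidence interval. It is a generic example under stated assumptions, not the procedure from the original study: the data are synthetic (6 sampling points with 3 replicates each, on a log10 scale), and `slope`, `bootstrap_slope_ci`, and all parameter values are hypothetical.

```python
import random

def slope(points):
    """Ordinary least-squares slope for (x, y) pairs; None if x has no spread."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        return None  # degenerate resample: every x identical
    return sum((x - mean_x) * (y - mean_y) for x, y in points) / sxx

def bootstrap_slope_ci(points, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the slope."""
    rng = random.Random(seed)
    slopes = []
    while len(slopes) < n_resamples:
        resample = [rng.choice(points) for _ in points]  # draw with replacement
        s = slope(resample)
        if s is not None:
            slopes.append(s)
    slopes.sort()
    low = slopes[int((alpha / 2) * n_resamples)]
    high = slopes[int((1 - alpha / 2) * n_resamples) - 1]
    return low, high

# Synthetic illustration only: y plays the role of log10(functional fraction).
noise = random.Random(1)
points = [(k, -0.5 * k + noise.gauss(0, 0.1)) for k in range(1, 7) for _ in range(3)]

ci_low, ci_high = bootstrap_slope_ci(points)
print(f"95% bootstrap CI for the slope: [{ci_low:.3f}, {ci_high:.3f}]")
```

A narrow interval indicates that the fitted slope is stable under resampling of the measured points. It does not by itself say how the trend behaves outside the sampled range.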
🪜 Refined final analogy:
Conclusion for the Lay Reader
Axe did not make a simplistic extrapolation — he applied advanced statistical models, with real data, independent replication, and mathematical validation.
The criticism reveals more about the critics' statistical ignorance than about any flaw in Axe's work.
🪜 Visual summary:
Therefore, this criticism does not invalidate the study.
Priority Self-Refuting Sources (κ > 0.9)
- Goldstein (2009): Admits functional sequences follow a log-normal distribution
- Truman (2016): Reproduced Axe's methodology with concordant results
- Echave (2016): Justifies statistical extrapolations based on predictable patterns
- Storz (2010): Validates non-linear effects of mutations at critical residues
- Salverda (2011): Confirms statistical patterns emerge in complex biological systems