Executive Summary
We present an analytical time series forecasting method that decomposes a signal into periodic components (noise-aware Fourier extraction), long-period trends (BIC-selected regression), discrete shocks (AIC-selected shape fitting), and local residual structure (recency-weighted autoregression) — then adaptively applies wavelet decomposition when the Fourier residual exhibits non-white autocorrelation. The stopping criterion is grounded in maximum entropy: decomposition continues until the residual is incompressible.
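To make the first stage concrete, here is a minimal sketch of noise-aware peak extraction with an extreme-value stop. It is illustrative rather than the actual implementation: the noise floor is estimated from the median periodogram ordinate, one peak is removed per pass, and the function name and defaults (`extract_periodic`, `alpha=0.01`) are assumptions made for the example.

```python
import numpy as np

def extract_periodic(x, alpha=0.01, max_components=50):
    """Illustrative sketch: iteratively remove significant Fourier peaks.

    Extraction stops when the largest remaining periodogram ordinate is
    consistent with the maximum of white-noise ordinates at level alpha
    (an extreme-value stopping criterion).
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    t = np.arange(n)
    periodic = np.full(n, x.mean())
    residual = x - periodic

    for _ in range(max_components):
        spec = np.fft.rfft(residual)
        power = np.abs(spec) ** 2 / n        # periodogram ordinates
        power[0] = 0.0                       # ignore the DC bin
        # Robust noise-floor estimate: for white noise the ordinates are
        # roughly Exp(sigma^2), so median / ln(2) estimates sigma^2.
        noise_level = np.median(power[1:]) / np.log(2)
        nbins = len(power) - 1
        # The maximum of nbins Exp(noise_level) draws exceeds this value
        # with probability alpha; peaks below it look like noise.
        threshold = -noise_level * np.log(1.0 - (1.0 - alpha) ** (1.0 / nbins))

        k = int(np.argmax(power))
        if power[k] <= threshold:
            break                            # stop: remaining peaks are noise-level

        # Reconstruct the winning frequency and subtract it from the residual.
        amp = 2.0 * np.abs(spec[k]) / n
        phase = np.angle(spec[k])
        component = amp * np.cos(2.0 * np.pi * k * t / n + phase)
        periodic += component
        residual -= component

    return periodic, residual
```

The actual extraction differs in that it models each component's full spectral footprint rather than single bins, but the stopping logic is the same idea: stop when the tallest remaining peak is no larger than what white noise would produce anyway.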
On datasets with strong periodic structure (ETTh1, ETTm1), the method achieves normalized MSE of 0.131 and 0.088 at horizon 96, improvements of 65% and 73% over Google's TimesFM (a 200M-parameter foundation model), with comparable margins over PatchTST (a supervised transformer). On datasets with weaker periodic structure (ETTh2, ETTm2), the transformer baselines win. The residual lag-1 autocorrelation after decomposition predicts this split perfectly: ETTh1 residuals are white noise (AC(1) = 0.037), while ETTm2 residuals retain predictable structure (AC(1) = 0.203).
The method requires no training data and no GPU, runs all four benchmarks in 89 seconds on a laptop, and exposes every component for inspection. The implementation is 1,200 lines of Python using NumPy, SciPy, and PyWavelets.
Key Contributions
- Noise-aware Fourier extraction: Models the expected spectral footprint of each signal component at the observed noise level, extracting the entire footprint as a unit rather than fragmenting noisy peaks. An extreme-value stopping criterion prevents over-extraction from noise.
- Layered decomposition: Six processing stages in three conceptual layers (periodic, aperiodic, residual). Each stage handles a distinct timescale: Fourier for cycles, BIC regression for trends, AIC shape fitting for shocks, AR for local momentum, wavelets for non-periodic residual structure.
- Adaptive basis selection: When the post-Fourier residual has AC(1) ≥ 0.1, wavelet decomposition captures localized time-frequency features the periodic basis missed. The selection is data-driven — no tuning.
- Maximum entropy stopping criterion: Decomposition terminates when the residual reaches maximum entropy for its variance. Practically: Gaussian residuals with zero autocorrelation at all lags. This connects decomposition to information-theoretic compression.
- Residual diagnostic: The residual autocorrelation is a practical basis-selection criterion. AC(1) < 0.05: Fourier is sufficient. AC(1) ≥ 0.15: you need a different basis or a learned model. This tells you before deploying whether you need a 200M-parameter transformer or whether NumPy will do (see the sketch after this list).
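The diagnostic and the wavelet branch fit in a few lines. The sketch below assumes specific operational choices that the summary leaves open: autocorrelations compared against a ±2/√N white-noise band plus a SciPy normality test as the "maximum entropy" stop, and a hard-thresholded `pywt.wavedec`/`waverec` round trip as the wavelet stage. The names `residual_is_white` and `maybe_wavelet_stage` are made up for the example.

```python
import numpy as np
import pywt
from scipy import stats

def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

def residual_is_white(resid, max_lag=20, alpha=0.01):
    """Maximum-entropy stop, operationalized: Gaussian and uncorrelated.

    Checks that every autocorrelation up to max_lag sits inside the
    +/- 2/sqrt(N) white-noise band and that a normality test does not reject.
    """
    n = len(resid)
    band = 2.0 / np.sqrt(n)
    if any(abs(autocorr(resid, k)) > band for k in range(1, max_lag + 1)):
        return False
    return stats.normaltest(resid).pvalue > alpha

def maybe_wavelet_stage(resid, ac1_threshold=0.1, wavelet="db4"):
    """Adaptive basis selection: add a wavelet stage only when the
    post-Fourier residual shows non-white lag-1 autocorrelation."""
    if autocorr(resid, 1) < ac1_threshold:
        return np.zeros_like(resid), resid      # Fourier basis was sufficient

    coeffs = pywt.wavedec(resid, wavelet, mode="periodization")
    # Hard-threshold detail coefficients at the universal threshold,
    # estimating the noise scale from the finest-level details.
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thresh = sigma * np.sqrt(2.0 * np.log(len(resid)))
    coeffs = [coeffs[0]] + [pywt.threshold(c, thresh, mode="hard")
                            for c in coeffs[1:]]
    structure = pywt.waverec(coeffs, wavelet, mode="periodization")[: len(resid)]
    return structure, resid - structure
```

The same autocorrelation helper drives both decisions, which is the point of the diagnostic: one number, computed before any further model is fit, tells you whether to stop, switch basis, or reach for a learned model.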
Key Findings
- ETTh1 H=96: MSE 0.131 vs TimesFM 0.375 (+65%), PatchTST 0.370 (+65%)
- ETTm1 H=96: MSE 0.088 vs TimesFM 0.320 (+73%), PatchTST 0.293 (+70%)
- ETTh1 H=720: MSE 0.320, still below TimesFM's ETTh1 score at H=96 (0.375)
- ETTh2 H=96: MSE 0.400 vs TimesFM 0.289 — transformer wins on non-periodic data
- ETTm2 H=96: MSE 0.251 vs TimesFM 0.175 — transformer wins
- Variance explained ≠ forecast accuracy: 99.8% of variance explained on ETTm2, yet the worst forecast. The 0.2% missed has AC(1) = 0.203; it's predictable signal the Fourier basis can't capture
- 89 seconds total: All 4 datasets × 4 horizons on a laptop, no GPU
Key References
Das, A., Kong, W., Sen, R., and Zhou, Y. (2024). A Decoder-Only Foundation Model for Time-Series Forecasting. Proc. ICML 2024.
Nie, Y., Nguyen, N. H., Sinthong, P., and Kalagnanam, J. (2023). A Time Series is Worth 64 Words: Long-term Forecasting with Transformers. Proc. ICLR 2023.
Högbom, J. A. (1974). Aperture Synthesis with a Non-Regular Distribution of Interferometer Baselines. Astron. Astrophys. Suppl. 15, 417-426.
Jaynes, E. T. (2003). Probability Theory: The Logic of Science. Cambridge University Press.
Shannon, C. E. (1959). Coding Theorems for a Discrete Source with a Fidelity Criterion. IRE Nat. Conv. Rec. 7, 142-163.