Executive Summary
We present an analytical time series forecasting method that decomposes a signal into periodic components (noise-aware Fourier extraction), long-period trends (BIC-selected regression), discrete shocks (AIC-selected shape fitting), and local residual structure (recency-weighted autoregression) — then adaptively applies wavelet decomposition when the Fourier residual exhibits non-white autocorrelation. The stopping criterion is grounded in maximum entropy: decomposition continues until the residual is incompressible.
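To make the first stage concrete, here is a minimal sketch of noise-aware peak extraction with an extreme-value stop. It is illustrative rather than the actual implementation: the noise floor is estimated from the median periodogram ordinate, one peak is removed per pass, and the function name and defaults (`extract_periodic`, `alpha=0.01`) are assumptions made for the example.

```python
import numpy as np

def extract_periodic(x, alpha=0.01, max_components=50):
    """Illustrative sketch: iteratively remove significant Fourier peaks.

    Extraction stops when the largest remaining periodogram ordinate is
    consistent with the maximum of white-noise ordinates at level alpha
    (an extreme-value stopping criterion).
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    t = np.arange(n)
    periodic = np.full(n, x.mean())
    residual = x - periodic

    for _ in range(max_components):
        spec = np.fft.rfft(residual)
        power = np.abs(spec) ** 2 / n        # periodogram ordinates
        power[0] = 0.0                       # ignore the DC bin
        # Robust noise-floor estimate: for white noise the ordinates are
        # roughly Exp(sigma^2), so median / ln(2) estimates sigma^2.
        noise_level = np.median(power[1:]) / np.log(2)
        nbins = len(power) - 1
        # The maximum of nbins Exp(noise_level) draws exceeds this value
        # with probability alpha; peaks below it look like noise.
        threshold = -noise_level * np.log(1.0 - (1.0 - alpha) ** (1.0 / nbins))

        k = int(np.argmax(power))
        if power[k] <= threshold:
            break                            # stop: remaining peaks are noise-level

        # Reconstruct the winning frequency and subtract it from the residual.
        amp = 2.0 * np.abs(spec[k]) / n
        phase = np.angle(spec[k])
        component = amp * np.cos(2.0 * np.pi * k * t / n + phase)
        periodic += component
        residual -= component

    return periodic, residual
```

The actual extraction differs in that it models each component's full spectral footprint rather than single bins, but the stopping logic is the same idea: stop when the tallest remaining peak is no larger than what white noise would produce anyway.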
On datasets with strong periodic structure (ETTh1, ETTm1), the method achieves normalized MSE of 0.131 and 0.088 at horizon 96, improvements of 65% and 73% over Google's TimesFM (a 200M-parameter foundation model), with comparable margins over PatchTST (a supervised transformer). On datasets with weaker periodic structure (ETTh2, ETTm2), the transformer baselines win. The residual lag-1 autocorrelation after decomposition predicts this split perfectly: ETTh1 residuals are white noise (AC(1) = 0.037), while ETTm2 residuals retain predictable structure (AC(1) = 0.203).
The method requires no training data and no GPU, runs all four benchmarks in 89 seconds on a laptop, and exposes every component for inspection. The implementation is 1,200 lines of Python using NumPy, SciPy, and PyWavelets.
Key Contributions
- Noise-aware Fourier extraction: Models the expected spectral footprint of each signal component at the observed noise level, extracting the entire footprint as a unit rather than fragmenting noisy peaks. An extreme-value stopping criterion prevents over-extraction from noise.
- Layered decomposition: Six processing stages in three conceptual layers (periodic, aperiodic, residual). Each stage handles a distinct timescale: Fourier for cycles, BIC regression for trends, AIC shape fitting for shocks, AR for local momentum, wavelets for non-periodic residual structure.
- Adaptive basis selection: When the post-Fourier residual has AC(1) ≥ 0.1, wavelet decomposition captures localized time-frequency features the periodic basis missed. The selection is data-driven — no tuning.
- Maximum entropy stopping criterion: Decomposition terminates when the residual reaches maximum entropy for its variance. Practically: Gaussian residuals with zero autocorrelation at all lags. This connects decomposition to information-theoretic compression.
- Residual diagnostic: The residual autocorrelation is a practical basis-selection criterion. AC(1) < 0.05: Fourier is sufficient. AC(1) ≥ 0.15: you need a different basis or a learned model. This tells you before deploying whether you need a 200M-parameter transformer or whether NumPy will do (see the sketch after this list).
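The diagnostic and the wavelet branch fit in a few lines. The sketch below assumes specific operational choices that the summary leaves open: autocorrelations compared against a ±2/√N white-noise band plus a SciPy normality test as the "maximum entropy" stop, and a hard-thresholded `pywt.wavedec`/`waverec` round trip as the wavelet stage. The names `residual_is_white` and `maybe_wavelet_stage` are made up for the example.

```python
import numpy as np
import pywt
from scipy import stats

def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

def residual_is_white(resid, max_lag=20, alpha=0.01):
    """Maximum-entropy stop, operationalized: Gaussian and uncorrelated.

    Checks that every autocorrelation up to max_lag sits inside the
    +/- 2/sqrt(N) white-noise band and that a normality test does not reject.
    """
    n = len(resid)
    band = 2.0 / np.sqrt(n)
    if any(abs(autocorr(resid, k)) > band for k in range(1, max_lag + 1)):
        return False
    return stats.normaltest(resid).pvalue > alpha

def maybe_wavelet_stage(resid, ac1_threshold=0.1, wavelet="db4"):
    """Adaptive basis selection: add a wavelet stage only when the
    post-Fourier residual shows non-white lag-1 autocorrelation."""
    if autocorr(resid, 1) < ac1_threshold:
        return np.zeros_like(resid), resid      # Fourier basis was sufficient

    coeffs = pywt.wavedec(resid, wavelet, mode="periodization")
    # Hard-threshold detail coefficients at the universal threshold,
    # estimating the noise scale from the finest-level details.
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thresh = sigma * np.sqrt(2.0 * np.log(len(resid)))
    coeffs = [coeffs[0]] + [pywt.threshold(c, thresh, mode="hard")
                            for c in coeffs[1:]]
    structure = pywt.waverec(coeffs, wavelet, mode="periodization")[: len(resid)]
    return structure, resid - structure
```

The same autocorrelation helper drives both decisions, which is the point of the diagnostic: one number, computed before any further model is fit, tells you whether to stop, switch basis, or reach for a learned model.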
Key Findings
- ETTh1 H=96: MSE 0.131 vs TimesFM 0.375 (+65%), PatchTST 0.370 (+65%)
- ETTm1 H=96: MSE 0.088 vs TimesFM 0.320 (+73%), PatchTST 0.293 (+70%)
- ETTh1 H=720: MSE 0.320, still below TimesFM's ETTh1 score at H=96 (0.375)
- ETTh2 H=96: MSE 0.400 vs TimesFM 0.289 — transformer wins on non-periodic data
- ETTm2 H=96: MSE 0.251 vs TimesFM 0.175 — transformer wins
- Variance explained ≠ forecast accuracy: 99.8% of variance explained on ETTm2, yet the worst forecast. The 0.2% missed has AC(1) = 0.203; it's predictable signal the Fourier basis can't capture
- 89 seconds total: All 4 datasets × 4 horizons on a laptop, no GPU
Key References
Das, A., Kong, W., Sen, R., and Zhou, Y. (2024). A Decoder-Only Foundation Model for Time-Series Forecasting. Proc. ICML 2024.
Nie, Y., Nguyen, N. H., Sinthong, P., and Kalagnanam, J. (2023). A Time Series is Worth 64 Words: Long-term Forecasting with Transformers. Proc. ICLR 2023.
Högbom, J. A. (1974). Aperture Synthesis with a Non-Regular Distribution of Interferometer Baselines. Astron. Astrophys. Suppl. 15, 417-426.
Jaynes, E. T. (2003). Probability Theory: The Logic of Science. Cambridge University Press.
Shannon, C. E. (1959). Coding Theorems for a Discrete Source with a Fidelity Criterion. IRE Nat. Conv. Rec. 7, 142-163.