# TimesFM: Time Series Foundation Model

A decoder-only foundation model for time-series forecasting, by Google Research.
## Overview

TimesFM (Time Series Foundation Model) is a decoder-only architecture developed by Google Research for time-series forecasting. It is designed for efficient zero-shot forecasting across diverse domains.
## Paper

*A decoder-only foundation model for time-series forecasting* (Google Research)
## Key Features

- ✅ Decoder-only transformer architecture
- ✅ Efficient zero-shot forecasting
- ✅ Patch-based input processing
- ✅ Multiple quantile predictions
- ✅ Fast inference on GPU
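"Patch-based input processing" means the context window is split into fixed-length patches that become the transformer's input tokens. The patching step can be illustrated with a rough numpy sketch (shapes follow the default config used throughout this page; this is not the model's actual implementation):

```python
import numpy as np

context_len, input_patch_len = 512, 32

# A toy context window of 512 points
series = np.arange(context_len, dtype=np.float32)

# Split the context into non-overlapping patches; each patch becomes one token
patches = series.reshape(context_len // input_patch_len, input_patch_len)
print(patches.shape)  # (16, 32)
```

With the defaults, a 512-point context becomes 16 tokens of 32 points each, which is why the attention cost grows with `context_len / input_patch_len` rather than `context_len`.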
## Quick Start

```python
from samay.model import TimesfmModel
from samay.dataset import TimesfmDataset

# Model configuration
repo = "google/timesfm-1.0-200m-pytorch"
config = {
    "context_len": 512,
    "horizon_len": 192,
    "backend": "gpu",
    "per_core_batch_size": 32,
    "input_patch_len": 32,
    "output_patch_len": 128,
    "num_layers": 20,
    "model_dims": 1280,
    "quantiles": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
}

# Load model
tfm = TimesfmModel(config=config, repo=repo)

# Load dataset
train_dataset = TimesfmDataset(
    name="ett",
    datetime_col='date',
    path='data/ETTh1.csv',
    mode='train',
    context_len=config["context_len"],
    horizon_len=config["horizon_len"]
)

# Evaluate (zero-shot)
avg_loss, trues, preds, histories = tfm.evaluate(train_dataset)
print(f"Average Loss: {avg_loss}")
```
## Model Variants

TimesFM comes in multiple sizes:

| Model | Parameters | Repository |
|---|---|---|
| TimesFM 1.0 | 200M | `google/timesfm-1.0-200m-pytorch` |
| TimesFM 2.0 | 500M | `google/timesfm-2.0-500m-pytorch` |
### Choosing a Model

```python
# Smaller, faster model
repo = "google/timesfm-1.0-200m-pytorch"

# Larger, more accurate model
repo = "google/timesfm-2.0-500m-pytorch"
```
## Configuration Parameters

### Model Configuration

| Parameter | Type | Default | Description |
|---|---|---|---|
| `context_len` | int | 512 | Length of historical context |
| `horizon_len` | int | 192 | Forecast horizon |
| `backend` | str | `"gpu"` | Backend: `"gpu"` or `"cpu"` |
| `per_core_batch_size` | int | 32 | Batch size per core |
| `input_patch_len` | int | 32 | Length of input patches |
| `output_patch_len` | int | 128 | Length of output patches |
| `num_layers` | int | 20 | Number of transformer layers |
| `model_dims` | int | 1280 | Model dimension |
| `quantiles` | list | `[0.1, ..., 0.9]` | Quantiles for prediction intervals |
## Example Configurations

### Standard Configuration (200M Model)

```python
config = {
    "context_len": 512,
    "horizon_len": 192,
    "backend": "gpu",
    "per_core_batch_size": 32,
    "input_patch_len": 32,
    "output_patch_len": 128,
    "num_layers": 20,
    "model_dims": 1280,
    "quantiles": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
}
```

### Larger Model (500M)

```python
config = {
    "context_len": 512,
    "horizon_len": 192,
    "backend": "gpu",
    "per_core_batch_size": 32,
    "input_patch_len": 32,
    "output_patch_len": 128,
    "num_layers": 50,  # More layers
    "model_dims": 1280,
    "quantiles": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
}
```

### CPU Inference

```python
config = {
    "context_len": 512,
    "horizon_len": 96,
    "backend": "cpu",          # Use CPU
    "per_core_batch_size": 8,  # Smaller batch
    # ... other configs
}
```
## Dataset

### TimesfmDataset Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `name` | str | `None` | Dataset name |
| `datetime_col` | str | `"ds"` | Name of datetime column |
| `path` | str | Required | Path to CSV file |
| `mode` | str | `"train"` | `"train"` or `"test"` |
| `context_len` | int | 128 | Length of input context |
| `horizon_len` | int | 32 | Forecast horizon |
| `freq` | str | `"h"` | Frequency: `"h"`, `"d"`, `"w"`, etc. |
| `normalize` | bool | `False` | Whether to normalize data |
| `stride` | int | 10 | Stride for sliding window |
| `batchsize` | int | 4 | Batch size |
### Data Format

CSV file with a datetime column and one or more value columns:

```csv
date,HUFL,HULL,MUFL,MULL,LUFL,LULL,OT
2016-07-01 00:00:00,5.827,2.009,1.599,0.462,5.677,2.009,6.082
2016-07-01 01:00:00,5.693,2.076,1.492,0.426,5.485,1.942,5.947
...
```
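To sanity-check a file against this format before constructing a `TimesfmDataset`, a minimal pandas check might look like the following (the column names mirror the ETTh1 sample above; the file is replaced here by an in-memory string so the snippet is self-contained):

```python
import io

import pandas as pd

# A small in-memory sample in the expected format (datetime column + value columns)
csv_text = """date,HUFL,OT
2016-07-01 00:00:00,5.827,6.082
2016-07-01 01:00:00,5.693,5.947
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# Basic sanity checks: datetime column parses, values are numeric, timestamps are ordered
assert pd.api.types.is_datetime64_any_dtype(df["date"])
assert df.drop(columns=["date"]).apply(pd.api.types.is_numeric_dtype).all()
assert df["date"].is_monotonic_increasing
print(df.dtypes)
```

For a real file, replace `io.StringIO(csv_text)` with the path you pass to `TimesfmDataset`.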
## Zero-Shot Forecasting

TimesFM excels at zero-shot forecasting:

```python
from samay.model import TimesfmModel
from samay.dataset import TimesfmDataset

# Load model
repo = "google/timesfm-1.0-200m-pytorch"
config = {
    "context_len": 512,
    "horizon_len": 192,
    "backend": "gpu",
    "per_core_batch_size": 32,
    "input_patch_len": 32,
    "output_patch_len": 128,
    "num_layers": 20,
    "model_dims": 1280,
    "quantiles": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
}
tfm = TimesfmModel(config=config, repo=repo)

# Load test data
test_dataset = TimesfmDataset(
    name="ett",
    datetime_col='date',
    path='data/ETTh1.csv',
    mode='test',
    context_len=config["context_len"],
    horizon_len=config["horizon_len"]
)

# Zero-shot evaluation (no training!)
avg_loss, trues, preds, histories = tfm.evaluate(test_dataset)
print(f"Zero-shot Loss: {avg_loss}")
```
## Evaluation

### Basic Evaluation

```python
test_dataset = TimesfmDataset(
    name="ett",
    datetime_col='date',
    path='data/ETTh1.csv',
    mode='test',
    context_len=512,
    horizon_len=192
)

avg_loss, trues, preds, histories = tfm.evaluate(test_dataset)
```
### With Custom Metrics

```python
import numpy as np

from samay.metric import mse, mae, mape

avg_loss, trues, preds, histories = tfm.evaluate(test_dataset)

trues = np.array(trues)
preds = np.array(preds)

print(f"MSE: {mse(trues, preds):.4f}")
print(f"MAE: {mae(trues, preds):.4f}")
print(f"MAPE: {mape(trues, preds):.4f}")
```
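For reference, these metrics typically follow the standard definitions below. This is a hedged numpy sketch of those definitions, not the `samay.metric` source; the library's exact conventions (e.g. whether MAPE is scaled to percent) may differ:

```python
import numpy as np

def mse(trues: np.ndarray, preds: np.ndarray) -> float:
    """Mean squared error."""
    return float(np.mean((trues - preds) ** 2))

def mae(trues: np.ndarray, preds: np.ndarray) -> float:
    """Mean absolute error."""
    return float(np.mean(np.abs(trues - preds)))

def mape(trues: np.ndarray, preds: np.ndarray, eps: float = 1e-8) -> float:
    """Mean absolute percentage error, as a fraction (eps avoids division by zero)."""
    return float(np.mean(np.abs((trues - preds) / (np.abs(trues) + eps))))

trues = np.array([1.0, 2.0, 4.0])
preds = np.array([1.0, 1.0, 5.0])
print(mse(trues, preds))  # ≈ 0.667
print(mae(trues, preds))  # ≈ 0.667
```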
## Quantile Predictions

TimesFM provides prediction intervals via quantiles:

```python
config = {
    # ... other configs
    "quantiles": [0.1, 0.25, 0.5, 0.75, 0.9],  # 10%, 25%, median, 75%, 90%
}

tfm = TimesfmModel(config=config, repo=repo)

# The model will output predictions for each quantile
avg_loss, trues, preds, histories = tfm.evaluate(test_dataset)
# preds shape: (num_samples, num_channels, horizon_len, num_quantiles)
```
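Assuming the 4-D layout in the comment above, extracting the median and an 80% interval is a matter of indexing the last axis. A self-contained sketch with a synthetic stand-in for `preds` (sorted along the quantile axis so the quantiles are monotone):

```python
import numpy as np

quantiles = [0.1, 0.25, 0.5, 0.75, 0.9]
num_samples, num_channels, horizon_len = 2, 3, 8

# Synthetic stand-in shaped like the preds described above
rng = np.random.default_rng(0)
preds = np.sort(
    rng.normal(size=(num_samples, num_channels, horizon_len, len(quantiles))),
    axis=-1,
)

median = preds[..., quantiles.index(0.5)]  # (num_samples, num_channels, horizon_len)
lower = preds[..., quantiles.index(0.1)]   # 10% quantile
upper = preds[..., quantiles.index(0.9)]   # 90% quantile

print(median.shape)  # (2, 3, 8)
assert np.all(lower <= median) and np.all(median <= upper)
```

Looking up indices via `quantiles.index(...)` keeps the slicing correct if you change the configured quantile list.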
### Visualizing Prediction Intervals

```python
import matplotlib.pyplot as plt
import numpy as np

# Assuming preds has shape (num_samples, num_channels, horizon_len, num_quantiles)
# with quantiles=[0.1, 0.25, 0.5, 0.75, 0.9] as configured above
median_idx = 2  # Index of 0.5 quantile
lower_idx = 0   # Index of 0.1 quantile
upper_idx = 4   # Index of 0.9 quantile

sample_idx = 0
channel_idx = 0

history = histories[sample_idx, channel_idx, :]
true = trues[sample_idx, channel_idx, :]
pred_median = preds[sample_idx, channel_idx, :, median_idx]
pred_lower = preds[sample_idx, channel_idx, :, lower_idx]
pred_upper = preds[sample_idx, channel_idx, :, upper_idx]

t_forecast = range(len(history), len(history) + len(pred_median))

plt.figure(figsize=(14, 5))
plt.plot(range(len(history)), history, label="History", linewidth=2)
plt.plot(t_forecast, true, label="Ground Truth", linestyle="--", linewidth=2)
plt.plot(t_forecast, pred_median, label="Prediction (Median)", linewidth=2)
plt.fill_between(t_forecast, pred_lower, pred_upper, alpha=0.2, label="10%-90% Interval")
plt.legend()
plt.title("TimesFM Forecasting with Prediction Intervals")
plt.grid(alpha=0.3)
plt.show()
```
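A quick sanity check on quantile forecasts is empirical coverage: the fraction of ground-truth points falling inside a predicted interval should roughly match its nominal level (0.8 for a 10%-90% band). A minimal numpy sketch using synthetic arrays rather than the Samay API:

```python
import numpy as np

rng = np.random.default_rng(42)
horizon_len = 200

# Synthetic ground truth and a synthetic 10%-90% band around a noisy forecast
true = rng.normal(size=horizon_len)
noise = rng.normal(size=horizon_len)
pred_lower = true + noise - 1.3
pred_upper = true + noise + 1.3

# Fraction of ground-truth points inside the predicted interval;
# for a well-calibrated 10%-90% band this should be close to 0.8
coverage = np.mean((true >= pred_lower) & (true <= pred_upper))
print(f"Empirical coverage: {coverage:.2f}")
```

With real model output, substitute `true`, `pred_lower`, and `pred_upper` with slices of `trues` and `preds` like those in the plotting example above.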
## Handling Different Frequencies

TimesFM supports various time frequencies:

```python
# Hourly data
dataset = TimesfmDataset(
    datetime_col='date',
    path='data/hourly.csv',
    freq='h',
    # ...
)

# Daily data
dataset = TimesfmDataset(
    datetime_col='date',
    path='data/daily.csv',
    freq='d',
    # ...
)

# Weekly data
dataset = TimesfmDataset(
    datetime_col='date',
    path='data/weekly.csv',
    freq='w',
    # ...
)

# Monthly data
dataset = TimesfmDataset(
    datetime_col='date',
    path='data/monthly.csv',
    freq='m',
    # ...
)
```
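If you are unsure which frequency string a file needs, pandas can infer it from the timestamps. The helper below is hypothetical (not part of Samay), and the mapping from pandas offset aliases (`"h"`/`"H"`, `"D"`, `"W-SUN"`, ...) to the short codes above is an assumption:

```python
import pandas as pd

def infer_timesfm_freq(timestamps: pd.DatetimeIndex) -> str:
    """Hypothetical helper: map a regular DatetimeIndex to a short freq code."""
    alias = pd.infer_freq(timestamps)  # e.g. "h" (or "H"), "D", "W-SUN", "MS"
    if alias is None:
        raise ValueError("Could not infer a regular frequency from the timestamps")
    # Keep only the leading letter, lowercased: "W-SUN" -> "w", "D" -> "d"
    return alias[0].lower()

idx = pd.date_range("2016-07-01", periods=48, freq="h")
print(infer_timesfm_freq(idx))  # "h"
```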
## Normalization

TimesFM can optionally normalize data:

```python
# With normalization
train_dataset = TimesfmDataset(
    name="ett",
    datetime_col='date',
    path='data/ETTh1.csv',
    mode='train',
    context_len=512,
    horizon_len=192,
    normalize=True,  # Enable normalization
)

# Denormalize predictions
avg_loss, trues, preds, histories = tfm.evaluate(train_dataset)
denormalized_preds = train_dataset._denormalize_data(preds)
```
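Normalization of this kind is commonly a per-channel z-score. The sketch below shows the round trip under that assumption; the actual `TimesfmDataset` internals may use different statistics:

```python
import numpy as np

rng = np.random.default_rng(7)
series = rng.normal(loc=50.0, scale=5.0, size=(2, 512))  # (channels, time)

# Per-channel z-score normalization
mean = series.mean(axis=-1, keepdims=True)
std = series.std(axis=-1, keepdims=True)
normalized = (series - mean) / std

# Denormalization restores the original scale exactly
restored = normalized * std + mean
assert np.allclose(restored, series)
print(normalized.mean(axis=-1))  # ≈ 0 per channel
```

The key point is that the statistics (`mean`, `std`) must come from the same data split used for normalization, which is why denormalization goes through the dataset object.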
## Advanced Usage

### Custom Context Lengths

```python
# Short context for fast inference
config = {
    "context_len": 128,
    "horizon_len": 64,
    # ...
}

# Long context for better accuracy
config = {
    "context_len": 1024,
    "horizon_len": 256,
    # ...
}
```

### Batch Processing

```python
# Larger batches for throughput
config = {
    "per_core_batch_size": 64,
    # ...
}

# Smaller batches for memory efficiency
config = {
    "per_core_batch_size": 8,
    # ...
}
```
## Visualization

```python
import matplotlib.pyplot as plt
import numpy as np

avg_loss, trues, preds, histories = tfm.evaluate(test_dataset)

trues = np.array(trues)
preds = np.array(preds)
histories = np.array(histories)

# Plot multiple channels
fig, axes = plt.subplots(2, 2, figsize=(16, 10))
axes = axes.flatten()

for i in range(4):
    ax = axes[i]
    history = histories[0, i, :]
    true = trues[0, i, :]
    pred = preds[0, i, :]

    ax.plot(range(len(history)), history, label="History", alpha=0.7)
    ax.plot(
        range(len(history), len(history) + len(true)),
        true,
        label="Ground Truth",
        linestyle="--"
    )
    ax.plot(
        range(len(history), len(history) + len(pred)),
        pred,
        label="Prediction"
    )
    ax.set_title(f"Channel {i}")
    ax.legend()
    ax.grid(alpha=0.3)

plt.tight_layout()
plt.show()
```
## Tips and Best Practices

### 1. Model Selection

- Use the 200M model for faster inference
- Use the 500M model for higher accuracy

### 2. Context Length

- Longer context (512-1024) for complex patterns
- Shorter context (128-256) for simpler patterns and speed

### 3. Zero-Shot vs Fine-Tuning

- TimesFM is designed for zero-shot forecasting
- Fine-tuning is not typically required

### 4. GPU Memory

- Reduce `per_core_batch_size` if you hit out-of-memory errors
- Use the CPU backend for very limited memory
## Common Issues

### CUDA Out of Memory

```python
# Reduce batch size
config = {
    "per_core_batch_size": 8,  # Lower value
    # ...
}

# Or use CPU
config = {
    "backend": "cpu",
    # ...
}
```

### Slow Inference

```python
# Use smaller model
repo = "google/timesfm-1.0-200m-pytorch"

# Reduce context length
config = {
    "context_len": 256,  # Instead of 512
    # ...
}
```
## API Reference

For detailed API documentation, see:
## Examples

See the Examples page for complete working examples.