TinyTimeMixer: Fast Pre-trained Models for Time Series¶
Overview¶
TinyTimeMixer (TTM) is a compact and efficient time-series forecasting model designed for fast inference and low memory footprint. It uses a mixer-based architecture that balances performance with computational efficiency, making it ideal for resource-constrained environments.
Paper¶
TinyTimeMixer: Fast Pre-trained Models for Time Series
Key Features¶
- ✅ Lightweight architecture (compact model size)
- ✅ Fast inference speed
- ✅ Low memory footprint
- ✅ Competitive forecasting accuracy
- ✅ Efficient training and fine-tuning
- ✅ Multivariate time-series support
Quick Start¶
from samay.model import TinyTimeMixerModel
from samay.dataset import TinyTimeMixerDataset
# Model configuration
config = {
    "context_len": 512,
    "horizon_len": 96,
    "model_size": "tiny",
}
# Load model
model = TinyTimeMixerModel(config)
# Load dataset
train_dataset = TinyTimeMixerDataset(
    name="ett",
    datetime_col="date",
    path="./data/ETTh1.csv",
    mode="train",
    context_len=config["context_len"],
    horizon_len=config["horizon_len"],
)
# Fine-tune
finetuned_model = model.finetune(train_dataset, epochs=10)
# Evaluate
test_dataset = TinyTimeMixerDataset(
    name="ett",
    datetime_col="date",
    path="./data/ETTh1.csv",
    mode="test",
    context_len=config["context_len"],
    horizon_len=config["horizon_len"],
)
avg_loss, trues, preds, histories = model.evaluate(test_dataset)
print(f"Average Loss: {avg_loss}")
Model Variants¶
TinyTimeMixer comes in different sizes:
| Variant | Parameters | Speed | Accuracy | 
|---|---|---|---|
| Tiny | ~1M | Fastest | Good | 
| Small | ~5M | Fast | Better | 
| Base | ~15M | Moderate | Best | 
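The approximate parameter counts above translate directly into memory footprint. A back-of-the-envelope sketch (illustrative arithmetic only, using the rough counts from the table and 4 bytes per float32 weight):

```python
# Rough fp32 weight-memory estimate per variant, from the approximate
# parameter counts in the table above (illustrative arithmetic only).
PARAM_COUNTS = {"tiny": 1e6, "small": 5e6, "base": 15e6}

def fp32_megabytes(params: float) -> float:
    """Approximate weight size in MB at 4 bytes per float32 parameter."""
    return params * 4 / 1e6

for name, n in PARAM_COUNTS.items():
    print(f"{name}: ~{fp32_megabytes(n):.0f} MB of weights")
```

Even the Base variant fits comfortably in memory on most edge hardware.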
Configuration Parameters¶
Model Configuration¶
| Parameter | Type | Default | Description | 
|---|---|---|---|
| context_len | int | 512 | Length of input context | 
| horizon_len | int | 96 | Forecast horizon | 
| model_size | str | "tiny" | Model size: "tiny", "small", or "base" | 
| d_model | int | 64 | Model dimension | 
| n_heads | int | 4 | Number of attention heads | 
| n_layers | int | 4 | Number of mixer layers | 
| dropout | float | 0.1 | Dropout rate | 
Example Configurations¶
Tiny Model (Fast Inference)¶
config = {
    "context_len": 512,
    "horizon_len": 96,
    "model_size": "tiny",
    "d_model": 64,
    "n_layers": 4,
}
Small Model (Balanced)¶
config = {
    "context_len": 512,
    "horizon_len": 96,
    "model_size": "small",
    "d_model": 128,
    "n_layers": 6,
}
Base Model (High Accuracy)¶
config = {
    "context_len": 512,
    "horizon_len": 192,
    "model_size": "base",
    "d_model": 256,
    "n_layers": 8,
}
Dataset¶
TinyTimeMixerDataset Parameters¶
| Parameter | Type | Default | Description | 
|---|---|---|---|
| name | str | None | Dataset name | 
| datetime_col | str | "ds" | Name of datetime column | 
| path | str | Required | Path to CSV file | 
| mode | str | None | "train", "val", or "test" | 
| context_len | int | 512 | Length of input context | 
| horizon_len | int | 64 | Forecast horizon | 
| batch_size | int | 128 | Batch size | 
| boundaries | list | [0, 0, 0] | Custom train/val/test split boundaries ([0, 0, 0] uses the default split) | 
| stride | int | 10 | Stride for sliding window | 
Data Format¶
CSV file with datetime and value columns:
date,HUFL,HULL,MUFL,MULL,LUFL,LULL,OT
2016-07-01 00:00:00,5.827,2.009,1.599,0.462,5.677,2.009,6.082
2016-07-01 01:00:00,5.693,2.076,1.492,0.426,5.485,1.942,5.947
...
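Before loading a file into `TinyTimeMixerDataset`, it can help to sanity-check that the datetime column parses and every other column is numeric. A minimal sketch using only the standard library (the column names here match the ETTh1 sample above):

```python
# Sanity-check the expected CSV layout: one datetime column, the rest numeric.
import csv
import io
from datetime import datetime

sample = """date,HUFL,HULL,MUFL,MULL,LUFL,LULL,OT
2016-07-01 00:00:00,5.827,2.009,1.599,0.462,5.677,2.009,6.082
2016-07-01 01:00:00,5.693,2.076,1.492,0.426,5.485,1.942,5.947
"""

rows = list(csv.DictReader(io.StringIO(sample)))
for row in rows:
    # The datetime column must parse; every other column must be numeric.
    datetime.strptime(row["date"], "%Y-%m-%d %H:%M:%S")
    _ = [float(v) for k, v in row.items() if k != "date"]
print(f"{len(rows)} rows, {len(rows[0]) - 1} value channels")
```

Swap `io.StringIO(sample)` for `open("./data/ETTh1.csv")` to check a real file.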
Training¶
Basic Training¶
from samay.model import TinyTimeMixerModel
from samay.dataset import TinyTimeMixerDataset
# Configure model
config = {
    "context_len": 512,
    "horizon_len": 96,
    "model_size": "tiny",
}
model = TinyTimeMixerModel(config)
# Load training data
train_dataset = TinyTimeMixerDataset(
    datetime_col="date",
    path="./data/ETTh1.csv",
    mode="train",
    context_len=512,
    horizon_len=96,
    batch_size=128,
)
# Fine-tune
finetuned_model = model.finetune(
    train_dataset,
    epochs=20,
    learning_rate=1e-3,
)
Training with Validation¶
# Training dataset
train_dataset = TinyTimeMixerDataset(
    datetime_col="date",
    path="./data/ETTh1.csv",
    mode="train",
    context_len=512,
    horizon_len=96,
    boundaries=[0, 10000, 15000],  # Custom split
)
# Validation dataset
val_dataset = TinyTimeMixerDataset(
    datetime_col="date",
    path="./data/ETTh1.csv",
    mode="val",
    context_len=512,
    horizon_len=96,
    boundaries=[0, 10000, 15000],
)
# Fine-tune with validation
finetuned_model = model.finetune(
    train_dataset,
    val_dataset=val_dataset,
    epochs=20,
    learning_rate=1e-3,
)
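One plausible reading of `boundaries=[0, 10000, 15000]` (an assumption for illustration, not taken from the library source) is that the entries are row indices marking where each split begins: train covers rows [0, 10000), validation [10000, 15000), and test [15000, end). A sketch of that partition:

```python
# Illustrative split under the assumed semantics of boundaries:
# each entry is the starting row index of train, val, and test.
def split_by_boundaries(n_rows: int, boundaries: list):
    train_start, val_start, test_start = boundaries
    return (
        range(train_start, val_start),   # training rows
        range(val_start, test_start),    # validation rows
        range(test_start, n_rows),       # test rows
    )

# 17420 is just an example series length.
train_idx, val_idx, test_idx = split_by_boundaries(17420, [0, 10000, 15000])
print(len(train_idx), len(val_idx), len(test_idx))  # → 10000 5000 2420
```

Check the dataset source if the exact semantics matter for your split.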
Evaluation¶
Basic Evaluation¶
test_dataset = TinyTimeMixerDataset(
    datetime_col="date",
    path="./data/ETTh1.csv",
    mode="test",
    context_len=512,
    horizon_len=96,
)
avg_loss, trues, preds, histories = model.evaluate(test_dataset)
print(f"Average Test Loss: {avg_loss}")
With Custom Metrics¶
from samay.metric import mse, mae, mape, rmse
import numpy as np
avg_loss, trues, preds, histories = model.evaluate(test_dataset)
trues = np.array(trues)
preds = np.array(preds)
print(f"MSE:  {mse(trues, preds):.4f}")
print(f"MAE:  {mae(trues, preds):.4f}")
print(f"RMSE: {rmse(trues, preds):.4f}")
print(f"MAPE: {mape(trues, preds):.4f}%")
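For reference, the metrics above follow their standard definitions. A plain-NumPy sketch (the `samay.metric` implementations may differ in details such as epsilon handling in MAPE):

```python
import numpy as np

# Standard definitions of the four metrics, in plain NumPy.
def mse(y, yhat):
    return float(np.mean((np.asarray(y) - np.asarray(yhat)) ** 2))

def mae(y, yhat):
    return float(np.mean(np.abs(np.asarray(y) - np.asarray(yhat))))

def rmse(y, yhat):
    return float(np.sqrt(mse(y, yhat)))

def mape(y, yhat):
    y, yhat = np.asarray(y), np.asarray(yhat)
    # Percentage error relative to the true values.
    return float(np.mean(np.abs((y - yhat) / y)) * 100)

y_true, y_pred = [1.0, 2.0, 4.0], [1.0, 2.5, 3.0]
print(mse(y_true, y_pred), mae(y_true, y_pred))
```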
Zero-Shot Forecasting¶
TinyTimeMixer supports zero-shot forecasting:
# Load pre-trained model
config = {
    "context_len": 512,
    "horizon_len": 96,
    "model_size": "tiny",
}
model = TinyTimeMixerModel(config)
# Test on new data without training
test_dataset = TinyTimeMixerDataset(
    datetime_col="date",
    path="./data/new_domain.csv",
    mode="test",
    context_len=512,
    horizon_len=96,
)
# Zero-shot evaluation
avg_loss, trues, preds, histories = model.evaluate(test_dataset)
Multivariate Forecasting¶
TinyTimeMixer handles multivariate data efficiently:
# Your CSV with multiple value columns
dataset = TinyTimeMixerDataset(
    datetime_col="date",
    path="./data/multivariate.csv",  # Multiple columns
    mode="train",
    context_len=512,
    horizon_len=96,
)
# Model forecasts all channels simultaneously
avg_loss, trues, preds, histories = model.evaluate(dataset)
# Convert the returned lists to arrays before inspecting shapes
import numpy as np
preds = np.array(preds)
# Results shape: (num_windows, num_channels, horizon_len)
print(f"Predictions shape: {preds.shape}")
Advanced Usage¶
Custom Context Lengths¶
# Short context for simple patterns
config = {
    "context_len": 256,
    "horizon_len": 64,
    "model_size": "tiny",
}
# Long context for complex patterns
config = {
    "context_len": 1024,
    "horizon_len": 192,
    "model_size": "small",
}
Batch Size Tuning¶
# Large batch for faster training (if memory allows)
dataset = TinyTimeMixerDataset(
    # ...
    batch_size=256,
)
# Small batch for memory efficiency
dataset = TinyTimeMixerDataset(
    # ...
    batch_size=32,
)
Stride Configuration¶
# Smaller stride for more training samples
dataset = TinyTimeMixerDataset(
    # ...
    stride=1,  # Overlapping windows
)
# Larger stride for faster iteration
dataset = TinyTimeMixerDataset(
    # ...
    stride=96,  # Non-overlapping windows
)
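The trade-off between the two stride settings can be quantified with the usual sliding-window count. The exact count inside `TinyTimeMixerDataset` may differ by an off-by-one, but the scaling is the point:

```python
# Sliding-window count: (n - context_len - horizon_len) // stride + 1.
def num_windows(n_rows: int, context_len: int, horizon_len: int, stride: int) -> int:
    usable = n_rows - context_len - horizon_len
    return usable // stride + 1 if usable >= 0 else 0

n = 10000
print(num_windows(n, 512, 96, stride=1))   # overlapping windows → 9393
print(num_windows(n, 512, 96, stride=96))  # non-overlapping forecasts → 98
```

stride=1 yields roughly two orders of magnitude more training samples than stride=96 on the same series.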
Visualization¶
Single Channel Forecast¶
import matplotlib.pyplot as plt
import numpy as np
avg_loss, trues, preds, histories = model.evaluate(test_dataset)
trues = np.array(trues)
preds = np.array(preds)
histories = np.array(histories)
# Plot first window, first channel
window_idx = 0
channel_idx = 0
history = histories[window_idx, channel_idx, :]
true = trues[window_idx, channel_idx, :]
pred = preds[window_idx, channel_idx, :]
plt.figure(figsize=(14, 5))
plt.plot(range(len(history)), history, label="History (512 steps)", linewidth=2)
plt.plot(
    range(len(history), len(history) + len(true)),
    true,
    label="Ground Truth (96 steps)",
    linestyle="--",
    linewidth=2
)
plt.plot(
    range(len(history), len(history) + len(pred)),
    pred,
    label="TinyTimeMixer Prediction",
    linewidth=2
)
plt.axvline(x=len(history), color='gray', linestyle=':', alpha=0.5)
plt.legend()
plt.title("TinyTimeMixer Time Series Forecasting")
plt.xlabel("Time Step")
plt.ylabel("Value")
plt.grid(alpha=0.3)
plt.tight_layout()
plt.show()
Multiple Channels¶
fig, axes = plt.subplots(2, 2, figsize=(16, 10))
axes = axes.flatten()
for i in range(min(4, trues.shape[1])):
    ax = axes[i]
    history = histories[0, i, :]
    true = trues[0, i, :]
    pred = preds[0, i, :]
    ax.plot(range(len(history)), history, label="History", alpha=0.7)
    ax.plot(
        range(len(history), len(history) + len(true)),
        true,
        label="Ground Truth",
        linestyle="--"
    )
    ax.plot(
        range(len(history), len(history) + len(pred)),
        pred,
        label="Prediction"
    )
    ax.set_title(f"Channel {i}")
    ax.legend()
    ax.grid(alpha=0.3)
plt.suptitle("TinyTimeMixer Multi-Channel Forecasting")
plt.tight_layout()
plt.show()
Error Distribution¶
import matplotlib.pyplot as plt
import numpy as np
avg_loss, trues, preds, histories = model.evaluate(test_dataset)
trues = np.array(trues)
preds = np.array(preds)
# Calculate errors
errors = trues - preds
plt.figure(figsize=(12, 5))
# Error distribution
plt.subplot(1, 2, 1)
plt.hist(errors.flatten(), bins=50, alpha=0.7, edgecolor='black')
plt.xlabel("Prediction Error")
plt.ylabel("Frequency")
plt.title("Error Distribution")
plt.grid(alpha=0.3)
# Error over time
plt.subplot(1, 2, 2)
mean_abs_errors = np.mean(np.abs(errors), axis=(0, 1))
plt.plot(mean_abs_errors)
plt.xlabel("Time Step")
plt.ylabel("Mean Absolute Error")
plt.title("Error Over Forecast Horizon")
plt.grid(alpha=0.3)
plt.tight_layout()
plt.show()
Performance Comparison¶
Speed Benchmark¶
import time
models = [
    ("Tiny", {"model_size": "tiny", "d_model": 64}),
    ("Small", {"model_size": "small", "d_model": 128}),
    ("Base", {"model_size": "base", "d_model": 256}),
]
for name, model_config in models:
    config = {
        "context_len": 512,
        "horizon_len": 96,
        **model_config
    }
    model = TinyTimeMixerModel(config)
    # Measure inference time
    start_time = time.time()
    avg_loss, trues, preds, histories = model.evaluate(test_dataset)
    elapsed_time = time.time() - start_time
    print(f"{name} Model:")
    print(f"  Loss: {avg_loss:.4f}")
    print(f"  Time: {elapsed_time:.2f}s")
    print()
Tips and Best Practices¶
1. Model Selection¶
- Use Tiny for edge devices and real-time applications
- Use Small for balanced performance and speed
- Use Base when accuracy is more important than speed
2. Context Length¶
- Longer context captures more patterns but is slower
- Match context to your data's seasonal patterns
- Start with 512 and adjust based on results
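One illustrative heuristic for matching context to seasonality (an assumption, not a library recommendation): cover a few full seasonal cycles, then round up to a power of two:

```python
# Pick a context_len covering n_cycles seasonal periods, rounded up
# to the next power of two (illustrative heuristic only).
def suggest_context_len(period: int, n_cycles: int = 3) -> int:
    target = period * n_cycles
    context = 1
    while context < target:
        context *= 2
    return context

# Hourly data with a daily cycle (period = 24):
print(suggest_context_len(24))    # → 128
# Hourly data with a weekly cycle (period = 168):
print(suggest_context_len(168))   # → 512
```

This recovers the default of 512 for hourly data with weekly seasonality, such as the ETT datasets.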
3. Batch Size¶
- TinyTimeMixer supports large batch sizes (128-256)
- Larger batches = faster training
- Reduce batch size if OOM errors occur
4. Training Duration¶
- TinyTimeMixer trains quickly (10-20 epochs often sufficient)
- Monitor validation loss to avoid overfitting
- Early stopping is recommended
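A minimal early-stopping tracker can implement the last two tips. This is a generic sketch; whether `finetune` exposes a per-epoch validation hook is an assumption, so the tracker is written to work with any loop that observes a validation loss each epoch:

```python
# Stop training once validation loss fails to improve for `patience` epochs.
class EarlyStopping:
    def __init__(self, patience: int = 3, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True to stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

stopper = EarlyStopping(patience=2)
for epoch, loss in enumerate([0.9, 0.7, 0.71, 0.72, 0.5]):
    if stopper.step(loss):
        print(f"stopping after epoch {epoch}")  # → stopping after epoch 3
        break
```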
Common Issues¶
CUDA Out of Memory¶
# Use smaller model
config = {
    "model_size": "tiny",
    "d_model": 64,
    # ...
}
# Reduce batch size
dataset = TinyTimeMixerDataset(
    batch_size=32,  # Instead of 128
    # ...
)
# Reduce context/horizon
config = {
    "context_len": 256,  # Instead of 512
    "horizon_len": 48,   # Instead of 96
}
Slow Training¶
# Increase batch size (if memory allows)
dataset = TinyTimeMixerDataset(
    batch_size=256,  # Larger batch
    # ...
)
# Reduce model size
config = {
    "model_size": "tiny",  # Smaller model
    # ...
}
Poor Accuracy¶
# Use larger model
config = {
    "model_size": "base",  # Larger model
    "d_model": 256,
    "n_layers": 8,
}
# Increase context length
config = {
    "context_len": 1024,  # More context
    # ...
}
# Train longer
model.finetune(train_dataset, epochs=50)  # More epochs
Efficient Deployment¶
CPU Inference¶
TinyTimeMixer is efficient on CPU:
import torch
# Force CPU usage
device = torch.device("cpu")
model = TinyTimeMixerModel(config).to(device)
# Inference is still fast!
avg_loss, trues, preds, histories = model.evaluate(test_dataset)
Model Export¶
Export for production deployment:
# Save model
model.save("tinytimemixer_model.pt")
# Load model
loaded_model = TinyTimeMixerModel.load("tinytimemixer_model.pt")
Quantization (for even faster inference)¶
import torch
# Quantize the model's linear layers to int8 for faster CPU inference.
# Note: quantize_dynamic expects an nn.Module; if TinyTimeMixerModel wraps
# the network, pass the underlying module instead.
quantized_model = torch.quantization.quantize_dynamic(
    model,
    {torch.nn.Linear},
    dtype=torch.qint8
)
# Use quantized model
avg_loss, trues, preds, histories = quantized_model.evaluate(test_dataset)
API Reference¶
For detailed API documentation, see the API reference for `TinyTimeMixerModel` and `TinyTimeMixerDataset`.
Examples¶
See the Examples page for complete working examples.
Comparison with Other Models¶
| Feature | TinyTimeMixer | LPTM | TimesFM | MOMENT | 
|---|---|---|---|---|
| Speed | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ | 
| Memory | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐ | 
| Accuracy | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | 
| Edge Deployment | ⭐⭐⭐⭐⭐ | ⭐⭐ | ⭐⭐ | ⭐⭐⭐ | 
Use TinyTimeMixer when:
- You need fast inference
- Memory is limited
- You are deploying on edge devices
- Real-time forecasting is required