MLX Training
Local LoRA fine-tuning on Apple Silicon — how adapters work, Llama + Ollama example
the-brain's Deep Layer uses Apple MLX for zero-cost, fully private LoRA training.
⚠️ LoRA adapters are permanently bound to their base model. An adapter trained on
mlx-community/Llama-3.2-1B-Instruct-4bitonly works with that exact model — same architecture, same tokenizer. It's not a standalone model, it's a set of weights that bias one specific model toward your style. If you switch base models, you need to retrain. Think of it like a tailored suit — it only fits one person.
Concrete Example: Llama 3.2 + Ollama
Here's a full workflow using Meta's Llama 3.2 1B as the base model:
1. Configure the base model
{
"mlx": {
"enabled": true,
"modelPath": "mlx-community/Llama-3.2-1B-Instruct-4bit",
"loraOutputDir": "~/.the-brain/lora-checkpoints",
"schedule": "0 2 * * *"
}
}2. What happens
- During the day: You work with Cursor, Claude Code, or any AI tool. the-brain harvests your interactions — corrections ("no, use
useCallback"), preferences ("arrow functions"), patterns ("always Jest + Testing Library"). - At 2 AM: SPM has promoted the surprising, high-value memories to the DEEP layer. MLX training runs — ~200 iterations on your consolidated memories — producing
adapter.safetensors. - Result: A 2-5 MB adapter file that, when loaded with Llama 3.2, makes the model consistently follow your patterns. It will default to arrow functions, suggest Jest over Vitest, and remember your preferred architecture style.
3. Loading the adapter in Ollama
# Pull the base model
ollama pull llama3.2:1b
# Create a custom model with the adapter
ollama create the-brain-model --from llama3.2:1b --adapter ~/.the-brain/lora-checkpoints/adapter.safetensorsThen point the-brain's LLM config at it:
{
"llm": {
"default": "ollama-adapted",
"backends": {
"ollama-adapted": {
"provider": "ollama",
"baseUrl": "http://localhost:11434/v1",
"defaultModel": "the-brain-model"
}
}
}
}The model now behaves as if it's been working with you for months. Same Llama 3.2 — just biased toward your style.
4. Loading in LM Studio
- Download
Llama-3.2-1B-Instructfrom LM Studio - In model settings → "LoRA Adapter" → browse to
~/.the-brain/lora-checkpoints/adapter.safetensors - All subsequent generations use the adapted model
What changes in practice
Before adapter:
"Write a React login form" → Generates a generic form with
function LoginForm(), usesuseState, doesn't handle errors explicitly
After adapter (3 weeks of corrections):
"Write a React login form" → Uses arrow functions,
useCallbackfor handlers, React Hook Form + Zod, explicittry-catchon submit, named exports
The adapter learned that you write React a specific way — without you ever configuring it manually.
Prerequisites
- macOS with Apple Silicon (M1/M2/M3/M4)
- Python 3.11+ and
uv uv run --with mlx-lm python3 -c "import mlx.core; print('MLX ready')"
Configuration
{
"mlx": {
"enabled": true,
"modelPath": "mlx-community/Llama-3.2-1B-Instruct-4bit",
"loraOutputDir": "~/.the-brain/lora-checkpoints",
"schedule": "0 2 * * *"
}
}Training Flow
- Day: Harvest interactions → SPM evaluates → promote to DEEP
- Night (2 AM): Load DEEP memories → filter noise → MLX LoRA training
Manual Training
the-brain train # Train on DEEP memories
the-brain train --dry-run # Preview
the-brain train --iterations 200
# Switch base model (useful for 16GB Macs):
THE_BRAIN_MLX_MODEL=mlx-community/Llama-3.2-1B-Instruct-4bit the-brain train --global
# Add extra Python dependencies (semicolon-separated):
THE_BRAIN_MLX_DEPS="mlx-audio" the-brain train --globalTroubleshooting
Metal out of memory (OOM)
Symptom: Training fails with messages about Metal memory pressure, out-of-memory, or the process exits during the first training steps.
Cause: The selected base model, batch size, or concurrent apps exceed the available unified memory on your Mac.
Fix: Close other memory-heavy apps, retry with a smaller base model such as mlx-community/Llama-3.2-1B-Instruct-4bit, and keep other local AI workloads stopped while training. If you changed training settings, reduce them before retrying.
Model not found / failed to resolve model path
Symptom: the-brain train fails because the model cannot be found, downloaded, or loaded from modelPath.
Cause: mlx.modelPath points to a typo, a model that is not available in MLX format, or a local path that does not exist.
Fix: Verify the exact value in your config and use a known-working MLX model ID such as mlx-community/Llama-3.2-1B-Instruct-4bit. If you are using a local path, confirm the directory exists and is readable.
LoRA config mismatch / adapter will not load
Symptom: Training completes, but loading the adapter fails or generations look broken after switching to another base model.
Cause: LoRA adapters are tied to the exact base model and tokenizer they were trained with. Reusing the adapter with a different model family or variant will not work.
Fix: Load the adapter only with the same base model used during training. If you change mlx.modelPath, retrain a fresh adapter for that model instead of reusing the old checkpoint.
Python / virtual environment issues
Symptom: Commands fail with missing-module errors, Python version conflicts, or MLX imports fail even though the CLI is installed.
Cause: The training command is running outside the expected Python environment, or the environment does not include the MLX dependencies.
Fix: Use Python 3.11+ and run the readiness check from the same environment you will use for training: uv run --with mlx-lm python3 -c "import mlx.core; print('MLX ready')". If that fails, fix the environment first before retrying the-brain train.
Missing tokenizer / tokenizer config errors
Symptom: Training or adapter loading fails with errors about tokenizer.json, tokenizer config, or missing tokenizer assets.
Cause: The selected model files are incomplete, corrupted, or do not match the base model expected by the adapter.
Fix: Re-download or re-resolve the base model, then confirm the tokenizer assets come from the same model as the weights. If you already trained an adapter against a different tokenizer, retrain after correcting the base model.
Parameters
| Parameter | Default | Description |
|---|---|---|
learningRate | 1e-4 | Learning rate |
loraRank | 16 | LoRA rank |
loraAlpha | 32 | Scaling factor |
batchSize | 4 | Batch size |
iterations | 200 | Steps per run |
minFragments | 3 | Min memories to trigger |
Output
~/.the-brain/lora-checkpoints/
├── adapter.safetensors # LoRA weights (~2-5 MB)
├── training_config.json # Run metadata
└── training_data.jsonl # Input dataUsing the Adapter
# LM Studio: add adapter path in model settings
# CLI inference
uv run --with mlx-lm --with mlx-vlm python3 -c "
from mlx_lm import load, generate
model, tokenizer = load('mlx-community/Llama-3.2-1B-Instruct-4bit',
adapter_path='~/.the-brain/lora-checkpoints')
print(generate(model, tokenizer, prompt='Write a React component'))
"