MLX Training

Local LoRA fine-tuning on Apple Silicon — how adapters work, Llama + Ollama example

the-brain's Deep Layer uses Apple MLX for zero-cost, fully private LoRA training.

⚠️ LoRA adapters are permanently bound to their base model. An adapter trained on mlx-community/Llama-3.2-1B-Instruct-4bit only works with that exact model — same architecture, same tokenizer. It's not a standalone model, it's a set of weights that bias one specific model toward your style. If you switch base models, you need to retrain. Think of it like a tailored suit — it only fits one person.

Concrete Example: Llama 3.2 + Ollama

Here's a full workflow using Meta's Llama 3.2 1B as the base model:

1. Configure the base model

{
  "mlx": {
    "enabled": true,
    "modelPath": "mlx-community/Llama-3.2-1B-Instruct-4bit",
    "loraOutputDir": "~/.the-brain/lora-checkpoints",
    "schedule": "0 2 * * *"
  }
}

2. What happens

During the day: You work with Cursor, Claude Code, or any AI tool. the-brain harvests your interactions — corrections ("no, use useCallback"), preferences ("arrow functions"), patterns ("always Jest + Testing Library").
At 2 AM: SPM has promoted the surprising, high-value memories to the DEEP layer. MLX training runs — ~200 iterations on your consolidated memories — producing adapter.safetensors.
Result: A 2-5 MB adapter file that, when loaded with Llama 3.2, makes the model consistently follow your patterns. It will default to arrow functions, suggest Jest over Vitest, and remember your preferred architecture style.

3. Loading the adapter in Ollama

# Pull the base model
ollama pull llama3.2:1b

# Create a custom model with the adapter
ollama create the-brain-model --from llama3.2:1b --adapter ~/.the-brain/lora-checkpoints/adapter.safetensors

Then point the-brain's LLM config at it:

{
  "llm": {
    "default": "ollama-adapted",
    "backends": {
      "ollama-adapted": {
        "provider": "ollama",
        "baseUrl": "http://localhost:11434/v1",
        "defaultModel": "the-brain-model"
      }
    }
  }
}

The model now behaves as if it's been working with you for months. Same Llama 3.2 — just biased toward your style.

4. Loading in LM Studio

Download Llama-3.2-1B-Instruct from LM Studio
In model settings → "LoRA Adapter" → browse to ~/.the-brain/lora-checkpoints/adapter.safetensors
All subsequent generations use the adapted model

What changes in practice

Before adapter:

"Write a React login form" → Generates a generic form with function LoginForm(), uses useState, doesn't handle errors explicitly

After adapter (3 weeks of corrections):

"Write a React login form" → Uses arrow functions, useCallback for handlers, React Hook Form + Zod, explicit try-catch on submit, named exports

The adapter learned that you write React a specific way — without you ever configuring it manually.

Prerequisites

macOS with Apple Silicon (M1/M2/M3/M4)
Python 3.11+ and uv
uv run --with mlx-lm python3 -c "import mlx.core; print('MLX ready')"

Configuration

{
  "mlx": {
    "enabled": true,
    "modelPath": "mlx-community/Llama-3.2-1B-Instruct-4bit",
    "loraOutputDir": "~/.the-brain/lora-checkpoints",
    "schedule": "0 2 * * *"
  }
}

Training Flow

Day: Harvest interactions → SPM evaluates → promote to DEEP
Night (2 AM): Load DEEP memories → filter noise → MLX LoRA training

Manual Training

the-brain train              # Train on DEEP memories
the-brain train --dry-run    # Preview
the-brain train --iterations 200

# Switch base model (useful for 16GB Macs):
THE_BRAIN_MLX_MODEL=mlx-community/Llama-3.2-1B-Instruct-4bit the-brain train --global

# Add extra Python dependencies (semicolon-separated):
THE_BRAIN_MLX_DEPS="mlx-audio" the-brain train --global

Troubleshooting

Metal out of memory (OOM)

Symptom: Training fails with messages about Metal memory pressure, out-of-memory, or the process exits during the first training steps.

Cause: The selected base model, batch size, or concurrent apps exceed the available unified memory on your Mac.

Fix: Close other memory-heavy apps, retry with a smaller base model such as mlx-community/Llama-3.2-1B-Instruct-4bit, and keep other local AI workloads stopped while training. If you changed training settings, reduce them before retrying.

Model not found / failed to resolve model path

Symptom: the-brain train fails because the model cannot be found, downloaded, or loaded from modelPath.

Cause: mlx.modelPath points to a typo, a model that is not available in MLX format, or a local path that does not exist.

Fix: Verify the exact value in your config and use a known-working MLX model ID such as mlx-community/Llama-3.2-1B-Instruct-4bit. If you are using a local path, confirm the directory exists and is readable.

LoRA config mismatch / adapter will not load

Symptom: Training completes, but loading the adapter fails or generations look broken after switching to another base model.

Cause: LoRA adapters are tied to the exact base model and tokenizer they were trained with. Reusing the adapter with a different model family or variant will not work.

Fix: Load the adapter only with the same base model used during training. If you change mlx.modelPath, retrain a fresh adapter for that model instead of reusing the old checkpoint.

Python / virtual environment issues

Symptom: Commands fail with missing-module errors, Python version conflicts, or MLX imports fail even though the CLI is installed.

Cause: The training command is running outside the expected Python environment, or the environment does not include the MLX dependencies.

Fix: Use Python 3.11+ and run the readiness check from the same environment you will use for training: uv run --with mlx-lm python3 -c "import mlx.core; print('MLX ready')". If that fails, fix the environment first before retrying the-brain train.

Missing tokenizer / tokenizer config errors

Symptom: Training or adapter loading fails with errors about tokenizer.json, tokenizer config, or missing tokenizer assets.

Cause: The selected model files are incomplete, corrupted, or do not match the base model expected by the adapter.

Fix: Re-download or re-resolve the base model, then confirm the tokenizer assets come from the same model as the weights. If you already trained an adapter against a different tokenizer, retrain after correcting the base model.

Parameters

Parameter	Default	Description
`learningRate`	1e-4	Learning rate
`loraRank`	16	LoRA rank
`loraAlpha`	32	Scaling factor
`batchSize`	4	Batch size
`iterations`	200	Steps per run
`minFragments`	3	Min memories to trigger

Output

~/.the-brain/lora-checkpoints/
├── adapter.safetensors    # LoRA weights (~2-5 MB)
├── training_config.json   # Run metadata
└── training_data.jsonl    # Input data

Using the Adapter

# LM Studio: add adapter path in model settings

# CLI inference
uv run --with mlx-lm --with mlx-vlm python3 -c "
from mlx_lm import load, generate
model, tokenizer = load('mlx-community/Llama-3.2-1B-Instruct-4bit',
                         adapter_path='~/.the-brain/lora-checkpoints')
print(generate(model, tokenizer, prompt='Write a React component'))
"

On this page