NVIDIA releases a full pipeline to fine-tune embedding models on proprietary data using synthetic training data, achieving 10–26% retrieval gains on a single GPU.
NVIDIA published a step-by-step tutorial and open pipeline for fine-tuning the Llama-Nemotron-Embed-1B-v2 embedding model on domain-specific documents without manual labeling. The pipeline uses synthetic query generation, hard negative mining, and multi-hop question generation to create training data automatically. Atlassian applied this recipe to their JIRA dataset and lifted Recall@60 from 0.751 to 0.951, a 26% relative improvement. The pipeline runs on a single A100 or H100 (80GB) GPU in under a day, and NVIDIA released a ready-made synthetic dataset built from its own documentation so teams can start immediately.
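The hard-negative-mining step in that recipe boils down to a simple idea: embed the synthetic queries and the document pool, then for each query take the highest-scoring documents that are *not* its gold document as training negatives. A minimal sketch with toy vectors (this is the standard ranking approach, not NVIDIA's exact implementation; all names here are illustrative):

```python
import numpy as np

def mine_hard_negatives(query_embs, doc_embs, gold_ids, k=2):
    """For each query, return the k highest-scoring document indices
    that are NOT its gold document (cosine similarity ranking)."""
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    scores = q @ d.T                      # (num_queries, num_docs)
    negatives = []
    for i, gold in enumerate(gold_ids):
        ranked = np.argsort(-scores[i])   # best-first document ordering
        negatives.append([int(j) for j in ranked if j != gold][:k])
    return negatives

# Toy example: 2 synthetic queries, 4 documents, 3-dim embeddings.
rng = np.random.default_rng(0)
docs = rng.normal(size=(4, 3))
queries = docs[[0, 1]] + 0.05 * rng.normal(size=(2, 3))  # queries near docs 0 and 1
hard_negs = mine_hard_negatives(queries, docs, gold_ids=[0, 1], k=2)
print(hard_negs)  # two non-gold doc ids per query, hardest first
```

These near-miss documents are what make contrastive fine-tuning effective: random negatives are trivially separable, while hard negatives force the model to learn domain-specific distinctions.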
This pipeline targets the biggest bottleneck in production RAG: poor retrieval caused by general-purpose embeddings that don't understand your domain vocabulary. The synthetic data generation step means you don't need a single labeled query-document pair; your existing docs are the training set. The Llama-Nemotron-Embed-1B-v2 base model is small enough to fine-tune on one A100 in hours, and the pipeline outputs a drop-in replacement for whatever embedding model you're using today.
Run the nvidia/Retrieval-Synthetic-NVDocs-v1 dataset through the NeMo Automodel fine-tuning recipe this week, benchmark Recall@10 against your current embedding model on a held-out test set, and replace only if you see >5% lift — use that as your deployment gate.
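The >5% lift gate above is easy to compute once you have ranked results from both models on the held-out set. A minimal Recall@k evaluator, with hypothetical rankings standing in for real model output (function and variable names are illustrative):

```python
def recall_at_k(ranked_ids_per_query, gold_ids, k=10):
    """Fraction of queries whose gold document appears in the top-k results."""
    hits = sum(gold in ranked[:k]
               for ranked, gold in zip(ranked_ids_per_query, gold_ids))
    return hits / len(gold_ids)

# Hypothetical held-out rankings from the current and fine-tuned models.
gold = [7, 3, 42, 9]
baseline_ranked  = [[7, 1, 2], [5, 6, 8], [42, 0, 1], [2, 4, 6]]   # 2/4 hits
finetuned_ranked = [[7, 1, 2], [3, 6, 8], [42, 0, 1], [2, 9, 6]]   # 4/4 hits

base = recall_at_k(baseline_ranked, gold, k=3)    # 0.5
tuned = recall_at_k(finetuned_ranked, gold, k=3)  # 1.0
lift = (tuned - base) / base
print(f"Recall@3 relative lift: {lift:.0%}")      # deploy only if > 5%
```

Measuring relative lift on a held-out set, rather than on the synthetic training pairs themselves, is what makes this a trustworthy deployment gate.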
Go to huggingface.co/nvidia/Retrieval-Synthetic-NVDocs-v1 and inspect 10 rows of the synthetic (query, document) pairs. Paste one query into your current RAG system and check if the top-1 retrieved chunk matches the expected document. If it doesn't, you have a concrete baseline failure to benchmark against after fine-tuning.
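That top-1 spot check is just a nearest-neighbor lookup over your chunk embeddings. A toy sketch with placeholder vectors in place of real model embeddings (any embedding model would supply the actual vectors; names are illustrative):

```python
import numpy as np

def top1_chunk(query_emb, chunk_embs):
    """Return the index of the chunk with the highest cosine similarity."""
    q = query_emb / np.linalg.norm(query_emb)
    c = chunk_embs / np.linalg.norm(chunk_embs, axis=1, keepdims=True)
    return int(np.argmax(c @ q))

# Placeholder embeddings: chunk 2 is deliberately closest to the query.
chunks = np.eye(3)                  # three orthogonal "chunk" vectors
query = np.array([0.1, 0.2, 0.9])   # most similar to chunk 2
expected_chunk = 2

hit = top1_chunk(query, chunks) == expected_chunk
print("baseline top-1 match:", hit)  # a False here is your concrete failure case
```

If the real system returns the wrong chunk for a synthetic query, save that (query, expected document) pair; it becomes the before/after exhibit for the fine-tuning run.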