Updated for 2026 · Practical & Free

How to Become an AI Engineer in 2026

Go from Python basics to shipping real AI products in 16 focused weeks: agents, RAG pipelines, local inference, and a portfolio you can show. Everything here is free or open-source.

16 Weeks · 9 Learning Tracks · 63 Projects (53 + 10) · 50+ Free Resources

Learning Tracks

The Complete 2026 Roadmap

Nine practical tracks, from the basics to production. Work through them and you'll be a confident AI engineer.

🐍

Foundations

Python, JavaScript, math (vectors, probability, stats), Git, and REST APIs. The basics every AI engineer leans on every day.

CLI data app + FastAPI endpoint
🧠

LLM Basics

Transformers, embeddings, prompting, multimodal models, tool calling, and reasoning patterns across major model families.

Chat app with tool calling
📚

RAG & Search

Chunking, retrieval, vector databases, reranking, GraphRAG, and repeatable evaluation. This is how your app answers from real sources.

Document Q&A with citations
🤖

AI Agents

LangGraph graphs, CrewAI teams, AutoGen conversations, OpenAI Agents SDK, memory, retries, and approval gates.

Research agent with memory
💻

Coding Agents

Claude Code, OpenAI Codex, Kimi Code, Cursor, Windsurf, Cline. Ship real features with AI help, and read every diff before you keep it.

Feature shipped with tests

Local Inference

Run 4-bit and 8-bit quantized models on a 16 GB GPU such as a T4, or on an Apple M-series machine with 16 GB of unified memory. Text, image, and video generation with Ollama, vLLM, ComfyUI, and AUTOMATIC1111.

Text + image + video pipeline on local GPU
🎨

UI & Product

Streamlit, Chainlit, Next.js, Lovable, v0, Bolt. Prototype fast with vibe-coding tools, then refactor into clean code.

Polished AI demo + landing page
📊

LLMOps

Tracing, evals, safety, monitoring, cost controls. LangSmith, Phoenix, DeepEval, Ragas. Know when your app breaks.

Evaluated app with dashboards
☁️

Cloud AI & Deployment

Managed AI on Azure, AWS, and GCP: Azure AI Foundry, Bedrock, Vertex AI, managed RAG and AutoML, plus deploy, monitoring, and cost control.

App deployed on a managed cloud
Stay Current

Frontier Topics You Can't Skip in 2026

The roadmap gets you job-ready. These are the models and ideas shaping AI engineering right now. Skim them, then dig into whatever your projects need.

🌍

Future of LLMs: JEPA & World Models

Beyond next-token · world models · self-supervised

Yann LeCun argues that text-only LLMs are hitting a ceiling, and that the next step is models that learn a world model from video by predicting in an abstract representation space rather than in raw pixels (JEPA). Worth knowing where the field may head next.

Yann LeCun's $1B bet against LLMs: the case for JEPA (Welch Labs)
Yann LeCun's $1B bet against LLMs, Part 2 (Welch Labs)
I-JEPA: joint-embedding predictive architecture (paper)
V-JEPA 2: Meta's open video world model
🚀

Forward Deployed Engineer

Career path · TC $230k–$550k+

The hottest, least-crowded AI job of 2026: an engineer who works inside a client's org to ship production AI (RAG, agents, evals), not slides. It's exactly what this roadmap trains you for.

📋 Our FDE skillset & study path: full guide
What is an FDE? The role OpenAI, Anthropic & Google are hiring for
Palantir AI FDE: the company that invented the model
Forward Deployed AI Engineer: a live job spec
🧩

Reasoning Models

Test-time compute · math · code

Models that "think" before they answer. They're strongest on math, science, and hard code. Learn when the extra time and cost is worth it over a fast standard model.

Top 10 open-source reasoning models (2026)
DeepSeek-R1: open-weight frontier reasoning
Qwen QwQ-32B: a 32B that competes with giants
🌐

Omni / Any-to-Any Models

Text · image · audio · video: one call

One model that accepts and produces every modality: text, image, audio, and video. It's becoming the go-to for voice assistants and multimodal apps, including real-time speech-to-speech.

Qwen2.5-Omni 7B: open omni model you can run locally
Gemini Omni: Google's native any-to-any model
GPT-4o Realtime API: speech-to-speech
🐤

Best Small Language Models

Sub-10B · runs on a laptop or 16 GB GPU

Small models now beat 2024-era GPT-4 on many tasks while running on your own device: cheaper, private, and fast. Try one before you reach for a frontier API.

Qwen3-4B: best all-round small model
Gemma 3 4B: 128K context, multimodal
Phi-4: best small-model reasoning & math
Best open SLMs in 2026: comparison guide
🪙

1-bit & Ternary LLMs

1.58-bit / true 1-bit · phone-ready

Weights are stored as {-1, 0, 1} (about 1.58 bits each) or as a true single bit. That makes them roughly 10–16× smaller than 16-bit weights, small enough to run on a phone with no GPU. This is the cutting edge of on-device AI.

BitNet b1.58 2B4T: Microsoft's official 1.58-bit model
The Era of 1-bit LLMs: the foundational paper
Bonsai 8B: first commercially viable true 1-bit LLM
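Where the "1.58-bit" and "10–16×" figures come from can be checked with a few lines of arithmetic. This is a rough sketch that assumes a 16-bit baseline and a hypothetical 2-billion-parameter model, and it ignores packing overhead and any layers kept at higher precision.

```python
import math

# Information needed to store one weight, in bits.
bits_ternary = math.log2(3)   # three states {-1, 0, 1}: about 1.585 bits
bits_binary = 1.0             # two states: exactly 1 bit
bits_fp16 = 16.0              # assumed baseline: 16-bit weights


def size_gb(params: float, bits_per_weight: float) -> float:
    """Weight storage in gigabytes (10**9 bytes), ignoring packing overhead."""
    return params * bits_per_weight / 8 / 1e9


params = 2e9  # hypothetical 2-billion-parameter model
for name, bits in [("fp16", bits_fp16), ("ternary", bits_ternary), ("1-bit", bits_binary)]:
    ratio = bits_fp16 / bits
    print(f"{name:8s} {bits:6.3f} bits/weight  {size_gb(params, bits):5.2f} GB  {ratio:4.1f}x smaller")
```

Real checkpoints usually land a little above these numbers, because embeddings and some layers often stay at higher precision.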
📉

Quantization & TurboQuant

Shrink weights & KV cache to 2–4 bits

Compress models with little quality loss so they fit your GPU. TurboQuant (ICLR 2026) squeezes the KV cache 5–6×, which is what makes long-context, low-memory serving possible.

TurboQuant: near-optimal KV-cache quantization
llama.cpp / GGUF: run 4-bit & 8-bit models anywhere
AWQ: activation-aware weight quantization
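To get a feel for what "shrinking weights to a few bits" costs in accuracy, here is a minimal sketch of plain absmax (symmetric) quantization. It is a generic textbook scheme, not the method used by TurboQuant, AWQ, or GGUF, and the weight values are made up.

```python
def quantize_absmax(weights: list[float], bits: int) -> tuple[list[int], float]:
    """Map floats to signed integers in [-(2**(bits-1) - 1), 2**(bits-1) - 1]."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in weights) / qmax or 1.0  # fall back to 1.0 if all weights are zero
    return [round(w / scale) for w in weights], scale


def dequantize(q: list[int], scale: float) -> list[float]:
    """Recover approximate floats from the stored integers and the shared scale."""
    return [v * scale for v in q]


weights = [0.12, -0.50, 0.33, 0.07, -0.21]  # made-up example values
for bits in (8, 4):
    q, scale = quantize_absmax(weights, bits)
    restored = dequantize(q, scale)
    max_err = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"{bits}-bit: ints={q} max error={max_err:.4f}")
```

The rounding error is bounded by half the scale, so it grows as the bit width drops. Production methods add per-group scales and calibration data to keep that error small.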

Speculative Decoding

2–6× faster · identical output

A small "draft" model suggests tokens and the big model checks them in parallel. You get 2–6× faster inference with no drop in quality. It's built into vLLM, so you can turn it on today.

Speculative decoding in vLLM: up to 2.8×
EAGLE-3: current state-of-the-art draft method
Medusa: parallel decoding heads (paper)
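The draft-then-verify loop is easier to trust once you see why the output cannot change. Below is a toy sketch for greedy decoding only. Both "models" are made-up deterministic functions, not real LLMs and not vLLM's API. Real systems verify all drafted positions in one batched forward pass and use a rejection-sampling rule when sampling.

```python
from typing import Callable, List, Tuple

Model = Callable[[List[int]], int]  # maps a token prefix to the next token


def target_model(prefix: List[int]) -> int:
    """Stand-in for the big model: a deterministic toy rule."""
    return (sum(prefix) * 7 + len(prefix)) % 10


def draft_model(prefix: List[int]) -> int:
    """Stand-in for the small model: wrong whenever the prefix length is a multiple of 4."""
    guess = target_model(prefix)
    return (guess + 1) % 10 if len(prefix) % 4 == 0 else guess


def greedy_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       n_new: int, k: int = 4) -> Tuple[List[int], int]:
    tokens, passes = list(prompt), 0
    goal = len(prompt) + n_new
    while len(tokens) < goal:
        # 1. The draft model proposes up to k tokens, one after another.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(tokens))):
            proposal.append(draft(tokens + proposal))
        # 2. The target checks each proposed position. A real server scores all
        #    positions in one batched forward pass, so count this as one pass.
        passes += 1
        for guess in proposal:
            correct = target(tokens)   # tokens holds only target-approved tokens
            tokens.append(correct)     # always keep the target's own choice
            if guess != correct:
                break                  # discard the rest of the proposal
    return tokens, passes


prompt = [1, 2, 3]
baseline = greedy_decode(target_model, prompt, 12)
fast, passes = speculative_decode(target_model, draft_model, prompt, 12)
print("same output:", fast == baseline, "| target passes:", passes, "vs 12 sequential steps")
```

Every token that lands in the output is the target's own greedy choice, which is why quality is unchanged. The speedup depends entirely on how often the draft agrees with the target.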
📄

OCR & Document AI

VLM-based · scans · tables · handwriting

Vision-language models now read messy documents better than the older engines do. They sit behind almost every real-world RAG and data pipeline.

olmOCR: open, high-throughput PDF → text
Mistral OCR: strong on handwriting & tables
Docling: IBM, documents → clean markdown
Qwen2.5-VL: open VLM, 90+ languages
🎨

Image & Video Generation

Diffusion · local via ComfyUI or hosted API

Generate and edit images and short video. Run open models locally on a 16 GB GPU, or call hosted APIs. Great for product, marketing, and creative features.

FLUX.1: top open image model
Stable Diffusion 3.5: open, high quality
ComfyUI: node workflow for image & video
Lights, Camera, Diffusion

Video Generation in 2026

Text-to-video went from research demo to a real working tool in about a year, and by 2026 most top models generate synced audio natively. Use a hosted API for the best quality, or run an open model on your own GPU.

🎬

Hosted APIs: Best Quality

Closed models · native audio · 4K

Reach for these when output quality matters most. Heads-up: OpenAI is retiring the Sora API in Sept 2026, so build on Veo, Kling, Runway, or Seedance instead.

Google Veo 3.1: best all-rounder, native audio + 4K
Kling 3.0: multi-shot storyboard, ~$0.10/sec
Runway Gen-4.5: pro control over camera moves and motion brush
Seedance 2.0: ByteDance, tops the video arena
Luma Dream Machine: fast and affordable
Pika: quick, social-ready clips & effects
🖥️

Open / Local Models

Run on your own GPU · free & private

Self-host for zero per-second cost, full privacy, and LoRA fine-tuning. Wan 2.2 leads on all-round quality; LTX-Video runs on as little as 12 GB VRAM.

Wan 2.2 (MoE): best open model, commercial-safe
HunyuanVideo 13B: cinematic, great for human subjects
LTX-Video: fastest, runs on 12 GB VRAM
CogVideoX-5B: ~10 GB, solid 6-sec clips
Mochi 1 (10B): high fidelity & prompt adherence
Open-Sora: fully open text-to-video pipeline
🛠️

Run, Serve & Compare

Workflow tools & benchmarks

Drive open models with a node workflow, rent a GPU per second, and check the live leaderboard before you commit to a model.

ComfyUI: node workflow for every video model
Replicate: run any video model via API, pay per second
Video Arena: live text-to-video quality leaderboard
Best open-source video models: 2026 comparison
Start Here

Free AI Platforms to Try First

Explore these tools before writing a single line of code. Get a feel for the AI landscape.

🤖

ChatGPT

Coding, analysis, image gen, agents

Claude

Writing, coding, long-context reasoning

💎

Gemini

Google ecosystem, multimodal

🌙

Kimi

Long-context chat and code

🔵

DeepSeek

Reasoning and coding models

🟣

Qwen Chat

Alibaba models, vision, long-context

𝕏

Grok

xAI assistant and API ecosystem

GroqCloud

Ultra-fast hosted inference

🔍

Perplexity

Research with cited sources

🌐

Poe

Many model providers, one place

Managed Cloud AI

Cloud AI Platforms & AutoML

Production-grade AI services you can call from an API or build on without managing infrastructure.

🔷

Azure AI Foundry

Microsoft's unified AI platform: models, agents, evaluations, fine-tuning

🤖

Azure AutoML

Auto-trains and deploys ML models without writing training code

🔗

Azure AI Agent Service

Deploy and manage agents with tools, memory, and code interpreter

🧠

Azure OpenAI Service

GPT, reasoning, and embedding models on Azure with enterprise security

🔎

Azure AI Search

Vector and hybrid search, the retrieval layer for RAG on Azure

🗄️

Azure Cosmos DB

Global database with built-in vector search for storing embeddings

📄

Azure AI Document Intelligence

Extract text, tables, and fields from forms, scans, and PDFs

🛡️

Azure AI Content Safety

Moderation and guardrails for text and image input and output

🎙️

Azure AI Speech

Speech-to-text and text-to-speech for voice agents

🌿

AWS Bedrock

Managed API for Claude, Titan, Llama, Mistral: no infra to manage

🛠️

AWS SageMaker

End-to-end ML platform: AutoPilot handles training, tuning, deployment

🔮

Google Vertex AI

Gemini API, AutoML, model garden, agent builder: full Google AI stack

🤗

HF Inference Endpoints

Deploy any Hugging Face model as a dedicated API in one click

Together AI

Fast inference for open models: Llama, Qwen, Mistral, FLUX

🔁

Replicate

Run any open-source model (text, image, video) via API: pay per second

🚀

Modal

Serverless GPU functions: deploy inference endpoints in pure Python

🎆

Fireworks AI

Production-grade fast inference for open models with custom fine-tuning

🖥️

RunPod

Rent GPU pods by the hour: ideal for fine-tuning and heavy inference

Before You Start

Is This Realistic for You?

16 weeks at ~10 hrs/week = ~160 hours total. Here's who can actually finish it.

🎓
Fully Feasible

Student / Career Break

Dedicated full-time learner with no major commitments. 4 hrs/day, 5 days a week, gets you done in about 8 weeks.

💼
Feasible

Developer, 10 hrs/week

2 focused hours on weekdays. Consistent pace, no burnout. The plan was built for this profile.

🌙
Stretch

Full-Time Job, Evenings

Realistic budget is 1–2 hrs/day max. At 5–8 hrs/week, plan for 20–32 weeks instead.

📅
Possible

Weekends Only

8 hrs/weekend = ~20 weeks. Works if weekends are genuinely free and you protect the time.

👶
Difficult

Full-Time Job + Family

Budget 4–6 hrs/week realistically. Aim for 8 months, not 4. Quality over speed.

🔰
Add More Time

Complete Beginner, No Python

Add 4–6 weeks for Python and Git fundamentals before week 1. Plan for 20–22 weeks total.
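The week counts in the profiles above all follow from the same ~160-hour budget. A quick sketch to plug in your own pace (the profile labels and paces below are illustrative picks from the ranges above):

```python
import math

TOTAL_HOURS = 16 * 10  # 16 weeks at about 10 hrs/week


def weeks_needed(hours_per_week: float, total_hours: float = TOTAL_HOURS) -> int:
    """Whole weeks needed to cover total_hours at a steady weekly pace."""
    return math.ceil(total_hours / hours_per_week)


for label, pace in [("Fast track", 20), ("Developer", 10), ("Weekends only", 8),
                    ("Evenings, low end", 5), ("Job + family, low end", 4)]:
    print(f"{label:22s} {pace:>2} hrs/week -> {weeks_needed(pace)} weeks")
```

If your honest weekly number gives a finish date that feels too far away, trim the scope (for example, one capstone instead of three) rather than inflating the hours.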

Two paces that work: pick your deadline

Fast track: 4 hrs per day, 5 days/week
20 hrs/week · finishes in ~8 weeks · realistic for full-time learners · keep 2 rest days a week to stay sharp and avoid burnout.

The honest pattern most self-learners follow

  • Week 1–2: motivated, hits the target hours every day
  • Week 3–4: fatigue sets in, average drops to 2–3 hrs/day
  • Week 5–6: life interrupts, some days are zero, backlog builds
  • Week 7+: burnout risk if over-scheduled, many people quit here

Consistency beats intensity for self-learning.

The person who does 2 focused hours every day usually outpaces the person doing 4 hours some days and 0 hours on others. Protect your daily minimum: even 90 minutes counts.
Interactive Tracker

Your 16-Week Study Plan

Check off weeks as you complete them. Progress saves in your browser. Filter by track to focus. Two tips: pick one cloud and stick with it, and treat the linked courses as references to pull from, not full watch-throughs.

W5 RAG Baseline

Parse documents, chunk content, embed text, retrieve relevant sources, and answer questions with citations and retrieval metrics.

10–12 hrs
RAG
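Before you pick a vector database, it helps to see the whole W5 loop in one screen. This is a minimal offline sketch: the "embedding" is just word counts, the documents are made up, and all function names are placeholders. A real baseline would swap in an embedding model and a vector store.

```python
import math
import re
from collections import Counter


def chunk(text: str, size: int = 12, overlap: int = 4) -> list[str]:
    """Split text into overlapping windows of `size` words."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts. Swap in a real embedding model later."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, index: list[tuple[str, str, Counter]], k: int = 2):
    """Return the k best (score, source_id, chunk_text) tuples for the question."""
    q = embed(question)
    scored = [(cosine(q, vec), source, text) for source, text, vec in index]
    return sorted(scored, key=lambda row: row[0], reverse=True)[:k]


# Made-up documents for illustration only.
docs = {
    "refunds.md": "Refunds are issued within 14 days of purchase if the item is unused.",
    "shipping.md": "Standard shipping takes 3 to 5 business days within the country.",
}
index = [(f"{name}#{i}", c, embed(c)) for name, text in docs.items() for i, c in enumerate(chunk(text))]

for score, source, text in retrieve("How many days do refunds take?", index, k=1):
    print(f"[{source}] score={score:.2f}: {text}")
```

Keeping the source id next to every chunk is what makes citations possible later: the answer step only needs to echo the ids of the chunks it was given.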
W6 Advanced RAG: Rerank & GraphRAG

Go past basic RAG: add reranking and hybrid search, build a GraphRAG index for multi-hop questions, and measure retrieval quality with a repeatable eval set so you can prove improvements.

10–12 hrs
RAG
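"Prove improvements" in W6 comes down to scoring two retrievers on the same fixed questions. Here is a small sketch of two common retrieval metrics, recall@k and mean reciprocal rank. The eval set and both ranked outputs are invented for illustration.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant chunk ids that appear in the top k results."""
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant) if relevant else 0.0


def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


# Hypothetical eval set: question -> ids of chunks that contain the answer.
eval_set = {
    "q1": {"a"},
    "q2": {"b", "c"},
}
# Hypothetical ranked outputs from two retrievers on the same questions.
baseline = {"q1": ["x", "a", "y"], "q2": ["b", "x", "y"]}
reranked = {"q1": ["a", "x", "y"], "q2": ["b", "c", "x"]}


def score(run: dict[str, list[str]], k: int = 2) -> tuple[float, float]:
    """Mean recall@k and mean reciprocal rank over the whole eval set."""
    recalls = [recall_at_k(run[q], rel, k) for q, rel in eval_set.items()]
    rrs = [reciprocal_rank(run[q], rel) for q, rel in eval_set.items()]
    return sum(recalls) / len(recalls), sum(rrs) / len(rrs)


print("baseline recall@2, MRR:", score(baseline))
print("reranked recall@2, MRR:", score(reranked))
```

Freeze the eval set in a file and rerun it after every retrieval change, so a gain on one question cannot hide a regression on another.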
W7 LangGraph Agents

Build a planner-search-writer-critic graph with shared state, automatic retries, and a human-in-the-loop approval checkpoint.

10–12 hrs
Agents
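The ideas behind the W7 graph (shared state, routing, retries, an approval gate) fit in plain Python, which makes them easier to reason about before you reach for a framework. This sketch deliberately does not use the LangGraph API. Every node, field, and the flaky search tool are hypothetical.

```python
from typing import Callable, Dict

State = Dict[str, object]


def planner(state: State) -> str:
    state["plan"] = f"answer: {state['question']}"
    return "search"


def search(state: State) -> str:
    state["attempts"] = int(state.get("attempts", 0)) + 1
    # Simulated flaky tool: the first call returns nothing.
    state["notes"] = [] if state["attempts"] == 1 else ["stub search result"]
    return "writer"


def writer(state: State) -> str:
    state["draft"] = f"{state['plan']} | sources: {len(state['notes'])}"
    return "critic"


def critic(state: State) -> str:
    # Retry the search when the draft has no sources, up to max_attempts.
    if not state["notes"] and state["attempts"] < state["max_attempts"]:
        return "search"
    return "approval"


def approval(state: State) -> str:
    # Human-in-the-loop checkpoint: `approve` is a callback supplied by the caller.
    state["approved"] = bool(state["approve"](state["draft"]))
    return "end"


NODES: Dict[str, Callable[[State], str]] = {
    "planner": planner, "search": search, "writer": writer,
    "critic": critic, "approval": approval,
}


def run(question: str, approve: Callable[[str], bool], max_attempts: int = 3) -> State:
    state: State = {"question": question, "approve": approve, "max_attempts": max_attempts}
    node = "planner"
    while node != "end":
        node = NODES[node](state)  # each node updates shared state and names the next node
    return state


result = run("What is RAG?", approve=lambda draft: "sources: 0" not in draft)
print(result["draft"], "| approved:", result["approved"], "| search attempts:", result["attempts"])
```

In an interactive app the `approve` callback would pause and wait for a person. Here a lambda stands in so the sketch runs unattended.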
W8 Advanced Agent Frameworks

Compare CrewAI, AutoGen, OpenHands, and OpenAI Agents SDK. Build a multi-agent team that handles a real research workflow.

12–14 hrs
Agents
W9 Coding Agents in Practice

Ship one real feature with Claude Code, Codex, or Kimi Code. Review every diff yourself. Write tests. Understand security risks.

8–10 hrs
Coding
W10 Vibe-Coded UI & Prototyping

Prototype in Lovable, v0, or Bolt in under an hour. Then refactor into a clean app with proper states, accessibility, and maintainable code.

8–10 hrs
Product
W11 Local Inference & Serving

Run 4-bit and 8-bit quantized models on a 16 GB GPU such as a T4, or on an Apple M-series machine with 16 GB of unified memory. Try Qwen3-4B, Gemma 3 4B, or Llama 3.1 8B for text. Generate images with SD3.5 Medium or SDXL via ComfyUI. Run CogVideoX-2B for video. Serve with vLLM and benchmark tokens/sec.

10–12 hrs
Inference
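For the tokens/sec benchmark, a tiny harness that takes any "generate" function keeps your numbers comparable across Ollama, vLLM, and other servers. This is a sketch: `fake_generate` is a hypothetical stand-in that sleeps instead of calling a model, so replace it with your own client call.

```python
import time
from typing import Callable, List


def benchmark(generate: Callable[[str], List[str]], prompts: List[str], runs: int = 3) -> float:
    """Return generated tokens per second, averaged over all prompts and runs."""
    total_tokens, total_seconds = 0, 0.0
    for _ in range(runs):
        for prompt in prompts:
            start = time.perf_counter()
            tokens = generate(prompt)
            total_seconds += time.perf_counter() - start
            total_tokens += len(tokens)
    return total_tokens / total_seconds if total_seconds > 0 else 0.0


def fake_generate(prompt: str) -> List[str]:
    """Hypothetical stand-in for a model call; replace with your Ollama or vLLM client."""
    time.sleep(0.01)            # pretend the model takes 10 ms
    return prompt.split() * 5   # pretend each prompt word yields 5 output tokens


rate = benchmark(fake_generate, ["hello local model", "benchmark this prompt please"])
print(f"{rate:.0f} tokens/sec")
```

Use the same prompts, the same maximum output length, and a warm-up run for every model you compare. Otherwise model load time and prompt length will dominate the result.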
W12 Cloud AI & Deployment

Deploy on one managed cloud using its free tier or trial credits (pick one, not all three). Call a managed model on Azure AI Foundry, AWS Bedrock, or Vertex AI, wire in managed search, and track cost and latency.

8–10 hrs
Cloud
W13 Evals, Tracing & Monitoring

Add tracing, regression prompts, RAG quality metrics, cost logging, and a failure examples library so your app gets measurably better.

9–11 hrs
LLMOps
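Two of the W13 items, cost logging and regression prompts, need very little code to start. The sketch below uses made-up model names, placeholder prices, and a stub app. A tracing tool would replace the in-memory list once you outgrow it.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical prices in USD per 1M (input, output) tokens. Use your provider's real rates.
PRICES = {"small-model": (0.10, 0.40), "large-model": (2.00, 8.00)}


@dataclass
class CostLog:
    rows: List[dict] = field(default_factory=list)

    def record(self, model: str, input_tokens: int, output_tokens: int) -> float:
        """Store one call and return its cost in USD."""
        price_in, price_out = PRICES[model]
        usd = (input_tokens * price_in + output_tokens * price_out) / 1_000_000
        self.rows.append({"model": model, "in": input_tokens, "out": output_tokens, "usd": usd})
        return usd

    def total(self) -> float:
        return sum(row["usd"] for row in self.rows)


def run_regressions(app: Callable[[str], str], cases: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return the (prompt, expected) pairs whose output lacks the expected substring."""
    return [(p, exp) for p, exp in cases if exp.lower() not in app(p).lower()]


def stub_app(prompt: str) -> str:
    """Stand-in for your real app so the sketch runs offline."""
    return "Refunds take 14 days." if "refund" in prompt.lower() else "I don't know."


log = CostLog()
log.record("small-model", input_tokens=1_000, output_tokens=500)
log.record("large-model", input_tokens=1_000, output_tokens=500)
failures = run_regressions(stub_app, [("How long do refunds take?", "14 days"),
                                      ("What is the shipping time?", "business days")])
print(f"total cost: ${log.total():.4f}, failing cases: {len(failures)}")
```

Every failing case you fix is worth keeping in the list: that list becomes the failure examples library, and rerunning it on each change is the regression suite.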
W15 Capstone: Scope & Build

Pick one product from the capstone list, scope it to the smallest valuable version, and build it end to end with real or realistic data, auth, and a working UI.

12–15 hrs
Capstone
W16 Capstone: Deploy, Eval & Demo

Take your capstone to production: deploy to one managed cloud, add tracing, an eval suite, and monitoring, then record a tight 3-minute demo and write a clear README and a short pitch deck.

12–15 hrs
Capstone
🚀

Targeting a Forward Deployed Engineer role?

This plan already builds the FDE technical core. Layer these client-facing skills on top: they're what separate an FDE from a backend AI developer, and the part most self-learners skip.

✅ The plan already covers
  • LLM APIs & model selection (W3–4)
  • RAG & advanced retrieval (W5–6)
  • Agents, tools & memory (W7–8)
  • Local inference & serving (W11)
  • Cloud AI & deployment (W12)
  • Evals, tracing & monitoring (W13)
  • Safety, guardrails & fine-tuning (W14)
  • Ship & deploy a real product (W15–16)
➕ Add these FDE-specific skills
  • Client communication under pressure
  • Scope before you code
  • Deploy on AWS / Azure / GCP / on-prem
  • Security & compliance (HIPAA)
  • Eval suites (LLM-as-judge)
  • System design (cost + latency)
  • Stakeholder demos
  • Reading messy codebases

The full guide includes a client-communication playbook and 10 detailed client scenarios (Claude Code, Codex, healthcare, finance, on-prem and cloud), each mapped to a project you can build.

FDE skillset, 10 scenarios & interview signals →
Curated Resources

Learn From the Best

Free courses, cookbooks, and documentation organized by what you need to build.

Tool Selection

Best Tools by Use Case

Don't learn every tool. Learn the right one for each job, then upgrade when you outgrow it.

💻

Coding

Claude Code · OpenAI Codex · Cursor · Kimi Code · Cline · Roo Code · Windsurf
🔗

Agent Graphs

LangGraph · CrewAI · AutoGen · OpenAI Agents SDK · Smolagents · OpenHands

Local Inference

Ollama · LM Studio · vLLM (prod) · llama.cpp · Text Gen WebUI
📚

RAG & Search

LlamaIndex · LangChain · Chroma · Qdrant · Weaviate · Milvus · Pinecone
🎨

UI Prototyping

Lovable · v0 · Bolt.new · Replit · Figma Make · Gamma
📊

Observability

LangSmith · Arize Phoenix · DeepEval · Ragas · W&B Weave · MLflow
☁️

Cloud AI Platforms

Azure AI Foundry · AWS Bedrock · Vertex AI · Azure AutoML · SageMaker · Together AI · Replicate · Modal · Fireworks AI
🎨

Image Generation

ComfyUI · AUTOMATIC1111 · Fooocus · InvokeAI · SD3.5 Medium · SDXL · Chroma 8.9B · SD 1.5
🎬

Video Generation

CogVideoX-2B · LTX-Video 2B · CogVideoX-5B · Wan2.1 1.3B · AnimateDiff · Open-Sora · ComfyUI
Build Your Portfolio

10 Capstone Products, One Per Domain

These aren't toy demos. Each one is a full-stack product in a different domain, built end to end with real data, a deploy, evals, and guardrails. Build two or three really well and your portfolio stands on its own.

🛠️ Browse the Hands-On Project Collection: all 53 projects →

A full build-it-yourself course: 53 real-world projects across ML, Deep Learning, NLP, GenAI, and AI Agents. Each one comes with a short description, a tech stack, documented Python, a shared chat UI, and tests. Click through to read the code and docs. Works for coders and non-coders alike.

📐 Read the full capstone briefs: problem, plan, tech stack & 5+ features each →

Ten end-to-end products across healthcare, finance, education, HR, real estate, search, no-code agents, Azure cloud AI, insurance, and investment. Open any one for the full build spec.
1

Clinical Decision Support & Triage Assistant

Guideline-grounded triage and Q&A with hard safety rails, a clinician review queue, audit log, and visit-note summarization, deployed in a private cloud.

Healthcare · RAG + Agents · Guardrails
2

Banking Support & Fraud-Aware Assistant

Policy Q&A plus a SQL agent over transactions, dispute intake, fraud flagging, PII masking, and a no-advice compliance guardrail with human handoff.

Finance · SQL Agent · Guardrails
3

Adaptive Tutor & Course Builder

A Socratic tutor that hints without leaking answers, auto-builds quizzes and lesson plans from a syllabus, grades with rubrics, and tracks each student.

Education · Tutoring Agent · Evals
4

Recruitment & People-Ops Copilot

Bias-aware resume-to-role matching, interview-kit generation, an employee policy bot, skills-gap analysis, PII redaction, and a recruiter dashboard.

HR · Semantic Match · Fairness
5

Property Discovery & Deal Analyzer

Natural-language and geo search over listings, vision tagging of photos, contract extraction, a deal and affordability agent, and a map dashboard with alerts.

Real Estate · Vision + RAG · Maps
6

Enterprise Knowledge Search Platform

Hybrid keyword + vector search across Drive, Confluence, and Slack with permission-aware retrieval, GraphRAG, reranking, cited answers, and a feedback loop.

Search · Hybrid + GraphRAG · ACL
7

No-Code AI Agent Builder

A visual agent builder on Copilot Studio and Azure AI Foundry: no-code knowledge upload, a tool registry, a test sandbox, channel publishing, and governance.

No-Code · Copilot Studio · Azure AI Foundry
8

Azure-Native Document & Call Intelligence

Built only on managed Azure AI services: Document Intelligence, Speech, OpenAI, and AI Search, with Content Safety guardrails and a cost and latency dashboard.

Azure · Document + Speech · Content Safety
9

Insurance Policy & Claims Assistant

Policy Q&A with clause-level citations, OCR claims intake, a coverage-eligibility rules engine, fraud flagging, and an adjuster dashboard with audit and SLAs.

Insurance · OCR + Rules · Guardrails
10

Investment Research & Portfolio Assistant

RAG over filings plus a SQL agent over holdings, risk and exposure analysis, scenario simulation, a strict no-advice guardrail, alerts, and full tracing.

Investment · RAG + SQL · Guardrails
16-Week Overview

The 16-Week Path at a Glance

Six focused phases. Each ends with a concrete deliverable you can show to employers.

1
Weeks 1–2

Foundations: Python, Git, APIs & Math

Build Python and Git fluency, learn API calls, and refresh the vectors, probability, and stats you will use all year.

✦ CLI app + math refresher
2
Weeks 3–4

LLM APIs & Prompt Engineering

Use multiple model APIs, compare cost and quality, and build reusable prompt templates with schema and critique passes.

✦ Chat app with tool calling
3
Weeks 5–6

RAG & Advanced Retrieval

Build a RAG baseline, then add reranking, hybrid search, and GraphRAG, and measure retrieval quality with a repeatable eval set.

✦ Document Q&A with GraphRAG + evals
4
Weeks 7–9

Agents & Coding Agents

Build stateful LangGraph agents, compare frameworks, then ship one real feature with a coding agent and review every diff.

✦ Research agent + a shipped feature
5
Weeks 10–12

UI, Local Inference & Cloud

Prototype a UI, run and serve quantized models locally, then deploy to one managed cloud on its free tier and compare cost and latency.

✦ Local server + a cloud-deployed app
6
Weeks 13–16

LLMOps, Safety & Capstone

Add tracing, evals, and guardrails, fine-tune a small model if needed, then scope, build, deploy, and demo your capstone product.

✦ Capstone: deployed, evaluated, with a demo
Definition of Done

Ready to Apply?

You're ready for an AI engineering role when you can genuinely say yes to all of these.

Build a Python or JavaScript AI app from scratch without a tutorial holding your hand.

Explain embeddings, attention, tokens, context windows, tool calling, and RAG to a non-technical person.

Use at least one coding agent (such as Claude Code or Codex) responsibly and review every diff it produces.

Build a LangGraph or equivalent agent with state management, tool calls, and error handling.

Run a local model with Ollama and understand when vLLM is worth the complexity.

Evaluate a RAG or agent app with repeatable test cases and measurable quality metrics.

Deploy a small AI app and monitor latency, cost, and output quality over time.

Communicate tradeoffs clearly: model choice, safety, privacy, UX, and budget constraints.

Your 16-Week Plan Starts Today

Track your progress, check off every week, and build a portfolio that proves you can ship AI.

Open Interactive Study Plan →