Go from Python basics to shipping real AI products in 16 focused weeks: agents, RAG pipelines, local inference, and a portfolio you can show. Everything here is free or open-source.
Nine practical tracks, from the basics to production. Work through them and you'll be a confident AI engineer.
Python, JavaScript, math (vectors, probability, stats), Git, and REST APIs. The basics every AI engineer leans on every day.
CLI data app + FastAPI endpoint
Transformers, embeddings, prompting, multimodal models, tool calling, and reasoning patterns across major model families.
Chat app with tool calling
Chunking, retrieval, vector databases, reranking, GraphRAG, and repeatable evaluation. This is how your app answers from real sources.
Document Q&A with citations
LangGraph graphs, CrewAI teams, AutoGen conversations, OpenAI Agents SDK, memory, retries, and approval gates.
Research agent with memory
Claude Code, OpenAI Codex, Kimi Code, Cursor, Windsurf, Cline. Ship real features with AI help, and read every diff before you keep it.
Feature shipped with tests
Run 4-bit and 8-bit quantized models on 16 GB of memory (an NVIDIA T4 GPU, or an Apple M-series machine with 16 GB of unified memory). Text, image, and video generation with Ollama, vLLM, ComfyUI, and AUTOMATIC1111.
Text + image + video pipeline on local GPU
Streamlit, Chainlit, Next.js, Lovable, v0, Bolt. Prototype fast with vibe-coding tools, then refactor into clean code.
Polished AI demo + landing page
Tracing, evals, safety, monitoring, cost controls. LangSmith, Phoenix, DeepEval, Ragas. Know when your app breaks.
Evaluated app with dashboards
Managed AI on Azure, AWS, and GCP: Azure AI Foundry, Bedrock, Vertex AI, managed RAG and AutoML, plus deploy, monitoring, and cost control.
App deployed on a managed cloud
The roadmap gets you job-ready. These are the models and ideas shaping AI engineering right now. Skim them, then dig into whatever your projects need.
Yann LeCun argues that text-only LLMs are hitting a ceiling, and that the next step is models that learn a world model from video in an abstract space (JEPA). Worth knowing where the field may head next.
Yann LeCun's $1B bet against LLMs: the case for JEPA (Welch Labs)
Yann LeCun's $1B bet against LLMs, Part 2 (Welch Labs)
I-JEPA: joint-embedding predictive architecture (paper)
V-JEPA 2: Meta's open video world model
The hottest, least-crowded AI job of 2026: an engineer who works inside a client's org to ship production AI (RAG, agents, evals), not slides. It's exactly what this roadmap trains you for.
📋 Our FDE skillset & study path: full guide
What is an FDE? The role OpenAI, Anthropic & Google are hiring for
Palantir AI FDE: the company that invented the model
Forward Deployed AI Engineer: a live job spec
Models that "think" before they answer. They're strongest on math, science, and hard code. Learn when the extra time and cost are worth it over a fast standard model.
Top 10 open-source reasoning models (2026)
DeepSeek-R1: open-weight frontier reasoning
Qwen QwQ-32B: a 32B that competes with giants
One model that accepts and produces every modality: text, image, audio, video. It's becoming the go-to for voice assistants and multimodal apps, including real-time speech-to-speech.
Qwen2.5-Omni 7B: open omni model you can run locally
Gemini Omni: Google's native any-to-any model
GPT-4o Realtime API: speech-to-speech
Small models now beat 2024-era GPT-4 on many tasks while running on your own device: cheaper, private, and fast. Try one before you reach for a frontier API.
Qwen3-4B: best all-round small model
Gemma 3 4B: 128K context, multimodal
Phi-4: best small-model reasoning & math
Best open SLMs in 2026: comparison guide
Weights stored as {-1, 0, 1} (1.58-bit) or true 1-bit. That makes them 10–16× smaller than 16-bit weights, small enough to run on a phone with no GPU. This is the cutting edge of on-device AI.
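The "1.58-bit" figure and the 10–16× range both fall out of simple arithmetic. A minimal sketch, assuming the comparison baseline is 16-bit weights (an assumption) and ignoring any packing overhead a real format would add:

```python
import math

# Three possible values {-1, 0, 1} carry log2(3) bits of information per weight.
bits_ternary = math.log2(3)

# Assumed baseline: 16-bit (FP16/BF16) weights.
BASELINE_BITS = 16

def shrink_factor(bits_per_weight: float) -> float:
    """How many times smaller the weights are than the 16-bit baseline."""
    return BASELINE_BITS / bits_per_weight

print(f"ternary: {bits_ternary:.2f} bits/weight, {shrink_factor(bits_ternary):.1f}x smaller")
print(f"1-bit:   1.00 bits/weight, {shrink_factor(1.0):.1f}x smaller")
```

This prints roughly 10.1x for ternary weights and 16.0x for true 1-bit weights, which is where the two ends of the range come from.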
BitNet b1.58 2B4T: Microsoft's official 1.58-bit model
The Era of 1-bit LLMs: the foundational paper
Bonsai 8B: first commercially viable true 1-bit LLM
Compress models with little quality loss so they fit your GPU. TurboQuant (ICLR 2026) squeezes the KV cache 5–6×, which is what makes long-context, low-memory serving possible.
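A quick way to judge whether a quantized model fits your GPU is to multiply parameter count by bits per weight. A rough sketch, assuming an 8-billion-parameter model (an illustrative round number) and counting weights only; the KV cache, activations, and runtime overhead come on top of this:

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9

PARAMS = 8e9  # assumed: an 8B-parameter model

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: {weight_memory_gb(PARAMS, bits):.1f} GB")
```

Under these assumptions the 16-bit weights alone fill a 16 GB card, while the 4-bit version leaves room for the KV cache and longer contexts.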
TurboQuant: near-optimal KV-cache quantization
llama.cpp / GGUF: run 4-bit & 8-bit models anywhere
AWQ: activation-aware weight quantization
A small "draft" model suggests tokens and the big model checks them in parallel. You get 2–6× faster inference with no drop in quality. It's built into vLLM, so you can turn it on today.
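The draft-then-verify loop can be sketched with two toy functions standing in for the models. Everything here is hypothetical and for illustration: `target_model` and `draft_model` are made-up deterministic functions, not vLLM's API, and the sketch covers only greedy decoding (real systems use a probabilistic acceptance rule when sampling, and verify all drafted positions in one batched forward pass).

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]

def target_model(ctx: List[int]) -> int:
    """Toy 'big' model: the next token is a fixed function of the context."""
    return (sum(ctx) * 7 + len(ctx)) % 10

def draft_model(ctx: List[int]) -> int:
    """Toy 'draft' model: agrees with the target except when len(ctx) % 4 == 0."""
    guess = target_model(ctx)
    return guess if len(ctx) % 4 else (guess + 1) % 10

def greedy_decode(model: NextToken, prompt: List[int], n: int) -> List[int]:
    """Plain decoding: one model call per generated token."""
    out = list(prompt)
    for _ in range(n):
        out.append(model(out))
    return out

def speculative_decode(prompt: List[int], n: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns (tokens, number of verify rounds)."""
    out = list(prompt)
    target_len = len(prompt) + n
    rounds = 0
    while len(out) < target_len:
        rounds += 1
        # 1. The draft model proposes k tokens cheaply.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft_model(out + proposal))
        # 2. The target checks every proposed position. A real server does
        #    this in one batched pass; the loop here only mimics the result.
        for i, tok in enumerate(proposal):
            expected = target_model(out + proposal[:i])
            if tok != expected:
                # Keep the agreed prefix, then take the target's own token.
                out.extend(proposal[:i])
                out.append(expected)
                break
        else:
            out.extend(proposal)  # every drafted token was accepted
    return out[:target_len], rounds

prompt = [1, 2, 3]
fast, rounds = speculative_decode(prompt, n=12)
assert fast == greedy_decode(target_model, prompt, 12)  # identical output
print(f"{rounds} verify rounds instead of 12 sequential target steps")
```

Because every kept token is one the target model would have produced anyway, the output matches plain decoding exactly; the speedup comes from needing fewer sequential passes through the big model.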
Speculative decoding in vLLM: up to 2.8×
EAGLE-3: current state-of-the-art draft method
Medusa: parallel decoding heads (paper)
Vision-language models now read messy documents better than the older OCR engines do. They sit behind almost every real-world RAG and data pipeline.
olmOCR: open, high-throughput PDF → text
Mistral OCR: strong on handwriting & tables
Docling: IBM, documents → clean markdown
Qwen2.5-VL: open VLM, 90+ languages
Generate and edit images and short video. Run open models locally on a 16 GB GPU, or call hosted APIs. Great for product, marketing, and creative features.
FLUX.1: top open image model
Stable Diffusion 3.5: open, high quality
ComfyUI: node workflow for image & video
Text-to-video went from research demo to a real working tool in about a year, and by 2026 most top models add synced audio on their own. Use a hosted API for the best quality, or run an open model on your own GPU.
Reach for these when output quality matters most. Heads-up: OpenAI is retiring the Sora API in Sept 2026, so build on Veo, Kling, Runway, or Seedance instead.
Google Veo 3.1: best all-rounder, native audio + 4K
Kling 3.0: multi-shot storyboard, ~$0.10/sec
Runway Gen-4.5: pro control over camera moves and motion brush
Seedance 2.0: ByteDance, tops the video arena
Luma Dream Machine: fast and affordable
Pika: quick, social-ready clips & effects
Self-host for zero per-second cost, full privacy, and LoRA fine-tuning. Wan 2.2 leads on all-round quality; LTX-Video runs on as little as 12 GB VRAM.
Wan 2.2 (MoE): best open model, commercial-safe
HunyuanVideo 13B: cinematic, great for human subjects
LTX-Video: fastest, runs on 12 GB VRAM
CogVideoX-5B: ~10 GB, solid 6-sec clips
Mochi 1 (10B): high fidelity & prompt adherence
Open-Sora: fully open text-to-video pipeline
Drive open models with a node workflow, rent a GPU per second, and check the live leaderboard before you commit to a model.
ComfyUI: node workflow for every video model
Replicate: run any video model via API, pay per second
Video Arena: live text-to-video quality leaderboard
Best open-source video models: 2026 comparison
Explore these tools before writing a single line of code. Get a feel for the AI landscape.
Coding, analysis, image gen, agents
Writing, coding, long-context reasoning
Google ecosystem, multimodal
Long-context chat and code
Reasoning and coding models
Alibaba models, vision, long-context
xAI assistant and API ecosystem
Ultra-fast hosted inference
Research with cited sources
Many model providers, one place
Production-grade AI services you can call from an API or build on without managing infrastructure.
Microsoft's unified AI platform: models, agents, evaluations, fine-tuning
Auto-trains and deploys ML models without writing training code
Deploy and manage agents with tools, memory, and code interpreter
GPT, reasoning, and embedding models on Azure with enterprise security
Vector and hybrid search, the retrieval layer for RAG on Azure
Global database with built-in vector search for storing embeddings
Extract text, tables, and fields from forms, scans, and PDFs
Moderation and guardrails for text and image input and output
Speech-to-text and text-to-speech for voice agents
Managed API for Claude, Titan, Llama, Mistral: no infra to manage
End-to-end ML platform: AutoPilot handles training, tuning, deployment
Gemini API, AutoML, model garden, agent builder: full Google AI stack
Deploy any Hugging Face model as a dedicated API in one click
Fast inference for open models: Llama, Qwen, Mistral, FLUX
Run any open-source model (text, image, video) via API: pay per second
Serverless GPU functions: deploy inference endpoints in pure Python
Production-grade fast inference for open models with custom fine-tuning
Rent GPU pods by the hour: ideal for fine-tuning and heavy inference
16 weeks at ~10 hrs/week = ~160 hours total. Here's who can actually finish it.
Dedicated full-time learner with no major commitments. 4 hrs/day gets you done in about 8 weeks.
2 focused hours on weekdays. Consistent pace, no burnout. The plan was built for this profile.
Realistic budget is 1–2 hrs/day max. At 5–8 hrs/week, plan for 20–32 weeks instead.
8 hrs/weekend = ~20 weeks. Works if weekends are genuinely free and you protect the time.
Budget 4–6 hrs/week realistically. Aim for 8 months, not 4. Quality over speed.
Add 4–6 weeks for Python and Git fundamentals before week 1. Plan for 20–22 weeks total.
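The estimates above all come from dividing the ~160-hour total by your weekly hours. A small sketch of that arithmetic, assuming the full-time profile studies five days a week (an assumption; the text gives only 4 hrs/day):

```python
import math

TOTAL_HOURS = 16 * 10  # 16 weeks at ~10 hrs/week

def weeks_to_finish(hours_per_week: float) -> int:
    """Whole weeks needed to cover the plan at a steady weekly pace."""
    return math.ceil(TOTAL_HOURS / hours_per_week)

profiles = [
    ("4 hrs/day, 5 days/week (assumed)", 20),
    ("2 hrs on weekdays", 10),
    ("8 hrs each weekend", 8),
]
for label, hours in profiles:
    print(f"{label}: {weeks_to_finish(hours)} weeks")
```

Plug in your own honest weekly number before committing to a finish date.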
Consistency beats intensity for self-learning.
The person who does 2 focused hours every day usually outpaces the person doing 4 hours some days and 0 hours on others. Protect your daily minimum: even 90 minutes counts.
Check off weeks as you complete them. Progress saves in your browser. Filter by track to focus. Two tips: pick one cloud and stick with it, and treat the linked courses as references to pull from, not full watch-throughs.
Build a CLI notebook helper, learn API calls, pandas basics, and keep every experiment in Git from day one.
Refresh vectors, probability, statistics, cosine similarity, gradients, and evaluation vocabulary you'll need all year.
Use OpenAI, Claude, Gemini, Qwen, and Kimi APIs. Compare model cost, speed, context window, and output quality side-by-side.
Create reusable prompt templates with task, context, constraints, schema, examples, critique pass, and stop rules.
Parse documents, chunk content, embed text, retrieve relevant sources, and answer questions with citations and retrieval metrics.
Go past basic RAG: add reranking and hybrid search, build a GraphRAG index for multi-hop questions, and measure retrieval quality with a repeatable eval set so you can prove improvements.
Build a planner-search-writer-critic graph with shared state, automatic retries, and a human-in-the-loop approval checkpoint.
Compare CrewAI, AutoGen, OpenHands, and OpenAI Agents SDK. Build a multi-agent team that handles a real research workflow.
Ship one real feature with Claude Code, Codex, or Kimi Code. Review every diff yourself. Write tests. Understand security risks.
Prototype in Lovable, v0, or Bolt in under an hour. Then refactor into a clean app with proper states, accessibility, and maintainable code.
Run 4-bit and 8-bit quantized models on 16 GB of memory (an NVIDIA T4 GPU, or an Apple M-series machine with 16 GB of unified memory). Try Qwen3-4B, Gemma 3 4B, or Llama 3.1 8B for text. Generate images with SD3.5 Medium or SDXL via ComfyUI. Run CogVideoX-2B for video. Serve with vLLM and benchmark tokens/sec.
Deploy on one managed cloud using its free tier or trial credits (pick one, not all three). Call a managed model on Azure AI Foundry, AWS Bedrock, or Vertex AI, wire in managed search, and track cost and latency.
Add tracing, regression prompts, RAG quality metrics, cost logging, and a failure examples library so your app gets measurably better.
Add input and output guardrails (PII, jailbreak, banned content) and red-team your app, then fine-tune a small model with LoRA when prompting alone is not enough. Compare fine-tuned against few-shot.
Pick one product from the capstone list, scope it to the smallest valuable version, and build it end to end with real or realistic data, auth, and a working UI.
Take your capstone to production: deploy to one managed cloud, add tracing, an eval suite, and monitoring, then record a tight 3-minute demo and write a clear README and a short pitch deck.
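The retrieval weeks in this plan (embed, retrieve, then measure with a repeatable eval set) can be sketched in a few lines. This is a toy under stated assumptions: `embed` is a word-count stand-in for a real embedding model, and `CHUNKS` and `EVAL_SET` are made-up data for illustration.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Stand-in 'embedding': word counts. A real app would call an embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical document chunks, keyed by chunk id.
CHUNKS = {
    "c1": "git stores every experiment as a commit",
    "c2": "cosine similarity compares two embedding vectors",
    "c3": "a reranker reorders retrieved chunks by relevance",
}

def retrieve(query: str, k: int = 2) -> list:
    """Return the ids of the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(CHUNKS, key=lambda cid: cosine(q, embed(CHUNKS[cid])), reverse=True)
    return ranked[:k]

# Hypothetical eval set: (question, id of the chunk that should be retrieved).
EVAL_SET = [
    ("how does cosine similarity work", "c2"),
    ("what does a reranker do", "c3"),
    ("where is every experiment stored", "c1"),
]

def recall_at_k(k: int = 2) -> float:
    """Fraction of eval questions whose expected chunk appears in the top k."""
    hits = sum(expected in retrieve(question, k) for question, expected in EVAL_SET)
    return hits / len(EVAL_SET)

print(f"recall@2 = {recall_at_k(2):.2f}")
```

Keeping the eval set fixed is the point: rerun `recall_at_k` after each change (a new chunker, hybrid search, a reranker) and you can show whether retrieval actually improved.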
This plan already builds the FDE technical core. Layer these client-facing skills on top: they're what separate an FDE from a backend AI developer, and the part most self-learners skip.
The full guide includes a client-communication playbook and 10 detailed client scenarios (Claude Code, Codex, healthcare, finance, on-prem and cloud), each mapped to a project you can build.
Free courses, cookbooks, and documentation organized by what you need to build.
Don't learn every tool. Learn the right one for each job, then upgrade when you outgrow it.
These aren't toy demos. Each one is a full-stack product in a different domain, built end to end with real data, a deploy, evals, and guardrails. Build two or three really well and your portfolio stands on its own.
🛠️ Browse the Hands-On Project Collection: all 53 projects →
A full build-it-yourself course: 53 real-world projects across ML, Deep Learning, NLP, GenAI, and AI Agents. Each one comes with a short description, a tech stack, documented Python, a shared chat UI, and tests. Click through to read the code and docs. Works for coders and non-coders alike.
📐 Read the full capstone briefs: problem, plan, tech stack & 5+ features each →
Ten end-to-end products across healthcare, finance, education, HR, real estate, search, no-code agents, Azure cloud AI, insurance, and investment. Open any one for the full build spec.
Guideline-grounded triage and Q&A with hard safety rails, a clinician review queue, audit log, and visit-note summarization, deployed in a private cloud.
Policy Q&A plus a SQL agent over transactions, dispute intake, fraud flagging, PII masking, and a no-advice compliance guardrail with human handoff.
A Socratic tutor that hints without leaking answers, auto-builds quizzes and lesson plans from a syllabus, grades with rubrics, and tracks each student.
Bias-aware resume-to-role matching, interview-kit generation, an employee policy bot, skills-gap analysis, PII redaction, and a recruiter dashboard.
Natural-language and geo search over listings, vision tagging of photos, contract extraction, a deal and affordability agent, and a map dashboard with alerts.
Hybrid keyword + vector search across Drive, Confluence, and Slack with permission-aware retrieval, GraphRAG, reranking, cited answers, and a feedback loop.
A visual agent builder on Copilot Studio and Azure AI Foundry: no-code knowledge upload, a tool registry, a test sandbox, channel publishing, and governance.
Built only on managed Azure AI services: Document Intelligence, Speech, OpenAI, and AI Search, with Content Safety guardrails and a cost and latency dashboard.
Policy Q&A with clause-level citations, OCR claims intake, a coverage-eligibility rules engine, fraud flagging, and an adjuster dashboard with audit and SLAs.
RAG over filings plus a SQL agent over holdings, risk and exposure analysis, scenario simulation, a strict no-advice guardrail, alerts, and full tracing.
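Several of these capstones call for PII masking before text reaches a model or a log. A minimal sketch of the idea using two regular expressions; the patterns are illustrative assumptions covering only simple emails and US-style phone numbers, and a production guardrail would need far broader coverage:

```python
import re

# Illustrative patterns only; real PII detection needs much more than this.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")

def mask_pii(text: str) -> str:
    """Replace matched emails and phone numbers with placeholder tags."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)

print(mask_pii("Reach Dana at dana.lee@example.com or 555-010-4477."))
# prints: Reach Dana at [EMAIL] or [PHONE].
```

Running the mask on both inputs and outputs keeps raw identifiers out of prompts, traces, and stored transcripts.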
Six focused phases. Each ends with a concrete deliverable you can show to employers.
Build Python and Git fluency, learn API calls, and refresh the vectors, probability, and stats you will use all year.
✦ CLI app + math refresher
Use multiple model APIs, compare cost and quality, and build reusable prompt templates with schema and critique passes.
✦ Chat app with tool calling
Build a RAG baseline, then add reranking, hybrid search, and GraphRAG, and measure retrieval quality with a repeatable eval set.
✦ Document Q&A with GraphRAG + evals
Build stateful LangGraph agents, compare frameworks, then ship one real feature with a coding agent and review every diff.
✦ Research agent + a shipped feature
Prototype a UI, run and serve quantized models locally, then deploy to one managed cloud on its free tier and compare cost and latency.
✦ Local server + a cloud-deployed app
Add tracing, evals, and guardrails, fine-tune a small model if needed, then scope, build, deploy, and demo your capstone product.
✦ Capstone: deployed, evaluated, with a demo
You're ready for an AI engineering role when you can genuinely say yes to all of these.
Build a Python or JavaScript AI app from scratch without a tutorial holding your hand.
Explain embeddings, attention, tokens, context windows, tool calling, and RAG to a non-technical person.
Use at least one coding agent (Claude Code, Codex) responsibly and review every diff it produces.
Build a LangGraph or equivalent agent with state management, tool calls, and error handling.
Run a local model with Ollama, and understand when vLLM is worth the complexity.
Evaluate a RAG or agent app with repeatable test cases and measurable quality metrics.
Deploy a small AI app and monitor latency, cost, and output quality over time.
Communicate tradeoffs clearly: model choice, safety, privacy, UX, and budget constraints.