EXP-0004 — Road to Machine Learning: solid skeleton, advertised completeness not yet there
#forge #machine-learning #curriculum #python #open-source #education #partial
David Olsson
There are dozens of free, open online curricula that claim to take a complete beginner from "knows no programming" to "ready for a data-scientist job." Some are genuinely complete and excellent; others are well-intentioned scaffolds that haven't yet been filled in. Without actually opening the curriculum and trying the homework, a learner has no easy way to tell which is which.
Forge is our experiment harness. When someone marks a project with a 🧪 reaction in our #development channel, forge tries to clone it, build it in a clean sandbox, run something representative inside it, and write up what's actually there. The purpose is to check the claim before recommending the project to anyone.
This article is about a curriculum repository called Road to Machine Learning by a developer named NabidAlam. The README is impressive: 26 modules, 23 hands-on projects, an estimated 15–22 months of full-time study, MIT-licensed and free. The structure is real: when forge cloned it, all 26 module directories and all 23 project directories were exactly where the README says they are. The 271 markdown lessons are organized and information-dense.
The gap is in the homework. Of the 23 advertised "hands-on projects," only 5 actually contain runnable Python code — the rest are placeholder README files describing what the project would do, with no implementation yet. The README also invites learners to "launch Jupyter Notebook for interactive learning" but the repository contains zero Jupyter notebooks. The five projects that do have code are good — forge ran one of them (Iris flower classification) end-to-end and it worked cleanly, producing five visualizations and three correct predictions in 36 seconds.
This isn't a takedown. The repository is being actively edited, the writing is solid, and the structure is sound. But a prospective learner deserves to know that the advertised completeness isn't quite there yet at this commit. The detailed report below shows exactly what forge found, with the project's own consistency tools (which it ships, and which fail) doing most of the talking.
Status: experimented, result partial. Working code is good where it exists; the gap between what the README advertises and what is actually in the repository is large enough to be the headline finding.
This is a forge writeup of NabidAlam/road-to-machine-learning at commit 3b4319b. It is a self-paced ML learning curriculum laid out as 26 numbered module directories of markdown lessons, plus 23 nominal "projects." MIT licensed, Python 3.10–3.12.
TL;DR
| | claim (README) | reality (commit 3b4319b) |
|---|---|---|
| module directories | 26 | 26 ✓ |
| hands-on projects | 23 | 23 directories, 5 with executable code (22%) |
| Jupyter notebooks | "launch Jupyter Notebook for interactive learning" | 0 .ipynb files in repo |
| markdown corpus | "comprehensive, structured" | 271 markdown files, organized, voluminous |
| upstream link integrity | implicit (project ships tools/check_md_links.py) | link-checker exits 1 with ~25 broken anchors |
| one project end-to-end | should run after pip install -r requirements.txt | iris_classification.py runs cleanly, 36s, exit 0, predictions correct ✓ |
The curriculum skeleton is real and well-organized. The executable curriculum the README promises is mostly aspirational at this commit. Both findings come straight from running the project's own tools and one of its own scripts inside a clean python:3.12 sandbox.
What it is
The repository advertises itself as "From Zero to Hero" — a complete path from Python prerequisites through generative AI, with 26 modules, 23 hands-on projects, and an estimated 15–22 months of full-time learning. The author's framing is ambitious; the README is detailed; the structure is coherent. This is the kind of repo that benefits enormously from a forge bench, because the value claim is completeness and that's exactly the kind of thing a careful inventory can check.
The 26 module directories run from 00-prerequisites (Python, linear algebra) through 25-generative-ai-llms. They are sequenced sensibly: prereqs → classical ML → neural networks → deep learning frameworks → CV → NLP → deployment → MLOps → time series → projects (beginner, intermediate, advanced) → SQL → handling imbalanced data → explainability → reinforcement learning → GNNs → audio → generative AI. The numbering tells you the intended reading order and which modules build on which.
Each module contains a README.md, a primary deep-dive markdown file, a "quick reference," supplementary topic guides, and (sometimes) a tutorial walkthrough. 271 markdown files total. The voice in the prose is consistent. The information density is high. As a reading curriculum it is real and substantial.
What forge verified
Three things, in order.
1. Structural inventory
The README's claim of "26 modules, 23 hands-on projects" matches the directory count exactly: 26 numbered module directories and 23 project-NN-* subdirectories under 16-projects-beginner/, 17-projects-intermediate/, and 18-projects-advanced/. The numbering is the structure; the structure is what was promised.
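A count like this is easy to reproduce on a fresh clone. The following is a minimal sketch, not forge's actual tooling: the function name `count_inventory` and the `NN-name` regex for module directories are assumptions made for illustration, while the `project-` prefix and the three tier directory names come from the paragraph above.

```python
import re
from pathlib import Path

# Tier directories named in this writeup.
PROJECT_TIERS = (
    "16-projects-beginner",
    "17-projects-intermediate",
    "18-projects-advanced",
)

def count_inventory(repo_root):
    """Return (module_count, project_count) for a cloned checkout."""
    root = Path(repo_root)
    # Assumption: a "module" is any top-level directory named NN-something.
    modules = [
        p for p in root.iterdir()
        if p.is_dir() and re.fullmatch(r"\d{2}-.+", p.name)
    ]
    # Projects are the project-* subdirectories of the three tier directories.
    projects = [
        p
        for tier in PROJECT_TIERS
        for p in (root / tier).glob("project-*")
        if p.is_dir()
    ]
    return len(modules), len(projects)
```

Calling `count_inventory(".")` from the repository root should, if the layout is as described, return `(26, 23)` at the pinned commit.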
2. The project's own link checker
The repo ships a script, tools/check_md_links.py, that scans every markdown file for broken relative links and missing GitHub-style heading anchors, and exits with code 1 if any issue is found. Forge ran it verbatim inside python:3.12:
$ python tools/check_md_links.py
[~25 broken anchors enumerated across:
00-prerequisites/01-python-basics.md
00-prerequisites/02-linear-algebra.md
01-python-for-data-science/08-working-with-dates-times.md
10-deep-learning-frameworks/deep-learning-frameworks.md
13-model-deployment/deployment-advanced-topics.md
14-mlops-basics/mlops-project-tutorial.md (and 2 more)
resources/data_science_cheatsheet.md
resources/full_stack_track/... (4 broken cross-file anchors)
resources/mlops_cheatsheet.md
resources/model_deployment_cheatsheet.md]
exit code 1
This isn't forge being persnickety — it's the upstream project's own consistency criterion, and at this commit it does not pass. None of the broken anchors are catastrophic; they're things like #cicd vs #ci-cd and #tensorflowkeras vs #tensorflow-keras — slug drift between when an in-page link was written and when the heading was last edited. Each is a one-line fix.
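To see why a small heading edit breaks an anchor, it helps to look at how a GitHub-style slug is derived. The sketch below is an approximation of that rule (lowercase, drop punctuation, turn spaces into hyphens); it is not the code in tools/check_md_links.py, and the heading strings are hypothetical examples chosen to reproduce the slug pairs mentioned above.

```python
import re

def github_style_slug(heading):
    """Approximate a GitHub-style anchor for a markdown heading."""
    slug = heading.strip().lower()
    # Drop punctuation; keep word characters, spaces and hyphens.
    slug = re.sub(r"[^\w\- ]", "", slug)
    # Each space becomes a hyphen.
    return slug.replace(" ", "-")

# Hypothetical headings: a slash is dropped, a hyphen is kept.
print(github_style_slug("CI/CD"))             # cicd
print(github_style_slug("CI-CD"))             # ci-cd
print(github_style_slug("TensorFlow/Keras"))  # tensorflowkeras
print(github_style_slug("TensorFlow-Keras"))  # tensorflow-keras
```

Under this rule, changing one punctuation character in a heading changes its anchor, and every in-page link written against the old form goes stale.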
3. One of the working scripts, end-to-end
16-projects-beginner/project-02-iris-classification/iris_classification.py is one of the five projects that actually contains executable Python. Forge installed its dependencies (numpy, pandas, matplotlib, seaborn, scikit-learn), set MPLBACKEND=Agg for headless plotting, and ran the script:
STEP 1: Loading and Exploring Data … (printed dataframe head)
STEP 2: Visualizing Data … (saved pair_plot.png, box_plots.png, correlation_heatmap.png)
STEP 3: Train/Test Split … (120 / 30)
STEP 4: Training Three Models … LogisticRegression, DecisionTree, RandomForest
STEP 5: Model Comparison … (saved model_comparison.png)
STEP 6: Best Model Detail … (saved confusion_matrix.png)
STEP 7: Predictions on New Data
Sample 1: setosa
Sample 2: virginica
Sample 3: versicolor
PROJECT COMPLETE!
Exit 0, 36 seconds, five PNGs written, three predictions all correct. This is good pedagogical code — the steps are commented, the model comparison is explicit, the headless-friendly use of seaborn/matplotlib is correct, and the prediction sanity-check at the end gives the learner immediate feedback that the model works. If every project in the repo were at this level, the README's promises would be true.
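For readers who want to repeat the run, the headless setup amounts to setting one environment variable before starting the interpreter. This is a minimal sketch under stated assumptions: `run_headless` is a hypothetical helper, not part of forge or the repository, and it measures only exit code and wall-clock time.

```python
import os
import subprocess
import sys
import time

def run_headless(script_path, cwd=None):
    """Run a Python script with a non-interactive matplotlib backend.

    Returns (exit_code, elapsed_seconds).
    """
    # Agg renders figures to files, so no display server is needed.
    env = dict(os.environ, MPLBACKEND="Agg")
    start = time.perf_counter()
    result = subprocess.run([sys.executable, script_path], cwd=cwd, env=env)
    return result.returncode, time.perf_counter() - start
```

Pass an absolute `script_path` when also setting `cwd`, since a relative path is resolved by the child process after the directory change. Timings will vary with the machine and with whether dependencies are already cached.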
The gap that defines the experiment
The per-project content audit is unambiguous:
beginner 6 dirs 5 with .py code 83%
intermediate 8 dirs 0 with .py code 0%
advanced 9 dirs 0 with .py code 0%
total 23 dirs 5 with .py code 22%
The 18 projects without .py files have a project README.md describing what the project would do — datasets, model choices, evaluation criteria — but no implementation. This is the difference between "a curriculum" and "a curriculum with the homework done." The author may very well intend to fill these in; the repo has been actively edited as recently as a day before this experiment ran. But as of commit 3b4319b, the "23 hands-on projects" claim is not yet supported by the artifacts.
Similarly, the README invites the reader to "launch Jupyter Notebook for interactive learning." There are zero .ipynb files in the repository.
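Both counts can be checked by anyone with a clone. The sketch below is one way to do it, not forge's audit code: `audit_projects` and `percent` are hypothetical names, and "contains runnable Python code" is approximated here as "has at least one .py file anywhere under the project directory", which is an assumption.

```python
from pathlib import Path

# Tier directories named in this writeup.
TIERS = {
    "beginner": "16-projects-beginner",
    "intermediate": "17-projects-intermediate",
    "advanced": "18-projects-advanced",
}

def audit_projects(repo_root):
    """Return ({tier: (project_dirs, dirs_with_py)}, notebook_count)."""
    root = Path(repo_root)
    report = {}
    for tier, dirname in TIERS.items():
        dirs = [p for p in (root / dirname).glob("project-*") if p.is_dir()]
        # Assumption: any .py file under the directory counts as "has code".
        with_code = [d for d in dirs if any(d.rglob("*.py"))]
        report[tier] = (len(dirs), len(with_code))
    # Notebooks are counted across the whole repository.
    notebooks = sum(1 for _ in root.rglob("*.ipynb"))
    return report, notebooks

def percent(part, whole):
    """Whole-number percentage, 0 when the denominator is 0."""
    return round(100 * part / whole) if whole else 0
```

With the figures from the table, `percent(5, 6)` gives 83 and `percent(5, 23)` gives 22.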
This is a tractable problem. Adding 18 more .py files at the quality bar of iris_classification.py would close the gap entirely. Converting some or all of them to companion .ipynb notebooks would deliver on the interactive promise. Neither is forge's job — but flagging it cleanly so a potential learner can calibrate their expectations is forge's job.
How a learner should read this
Three takeaways:
- Treat it as a reading curriculum first. The 271 markdown files are organized, sequential, and information-dense. If you read straight through 00-prerequisites/ to 25-generative-ai-llms/ you will have done real work, and you will have learned things.
- Use the five working projects: house-price-prediction, iris-classification, titanic-survival, spam-detection, wine-quality. They are good. Don't expect the intermediate or advanced projects to ship runnable code yet; treat their READMEs as briefs you implement yourself.
- Open an issue or PR if you want to help. The pattern is established by the working five; copy that style into the empty 18. The maintainer is active.
Comparables
| Project | Posture |
|---|---|
| microsoft/ML-For-Beginners | 12-week curriculum, fully implemented notebooks, ~70k stars. The "completeness baseline." |
| microsoft/AI-For-Beginners | Sister curriculum, deeper into AI topics. Same level of polish. |
| aladdinpersson/Machine-Learning-Collection | Individual, hundreds of small implementations across PyTorch / TF. Less linear, more depth-per-topic. |
| DataTalksClub/machine-learning-zoomcamp | Course-with-cohort posture, mature, has a real grading rubric. |
Road to Machine Learning sits squarely in the same niche but isn't yet at parity with the Microsoft repos on completeness.
Reproducibility
| | |
|---|---|
| upstream repo | https://github.com/NabidAlam/road-to-machine-learning |
| commit pinned | 3b4319beaf516f74da1925946525dfba64cced13 |
| license | MIT |
| base image | python:3.12 |
| image digest | sha256:ea7b35cdb10b8a1381848aeb90a434997da25649c86d842d19fe6154c535cd11 |
| upstream link checker | exit 1 (~25 broken anchors) |
| iris script | exit 0, 36 s, 5 plots, 3 correct predictions |
Companion gist has the full experiment.yaml, env.json, the link-checker log, the iris run log, and the RUN.md reproduction recipe.
See also
- EXP-0001 — AutoWiki by Factory.ai — a pattern note on docs-as-build-artifact, which is exactly the discipline this curriculum needs.
- EXP-0002 — cc-gateway and EXP-0003 — cc-gateway-dashboard — the paired ship from earlier this run.
Built and verified by forge — an experiment harness that walks open-source projects through a fixed lifecycle (research → build → experiment → package → report → publish) inside a no-secrets Docker sandbox. When the experiment surfaces a gap between advertised and actual, the partial-with-finding is the most useful thing we can publish.