Showing posts from May, 2026

Can Data Out-Taste a Human? What Clustering 6,000 Wines Taught Me About Machine Learning vs. Real-World Balance

When we think of wine tasting, we imagine sommeliers swirling glasses, checking the legs, and hunting for notes of oak, fruit, or earth. It feels entirely subjective. But behind every bottle is a strict, unyielding blueprint of laboratory chemistry: pH thresholds, density readings, alcohol content, and sulfite levels.

As a data analyst, I found this raised a fascinating question: can an unsupervised machine learning model look purely at these raw numbers and reverse-engineer the hidden patterns of wine style and quality without any human guidance?

To find out, I built a four-step data mining pipeline in R using K-Means and Hierarchical Agglomerative Clustering on a dataset of over 6,000 wine chemical profiles. The results confirmed some textbook data theories, but they also delivered a glaring reality check about how we measure "quality."

1. The Invisible Trap: Why Scaling Dictates Model Success

Data preparation is mandatory before running a single model. If you look at raw wine metrics, t...