The science of brew.fm

The AI that listens.

You’ve met the AI that writes. brew.fm runs on a different kind - one that never says a word. Eight ideas explain the whole machine, and you can press play on every single one.


01 Bigger than chatbots

LLMs are one animal.
AI is the zoo.

A language model does one thing: it reads an enormous amount of text, then learns to predict the next word. That single trick, scaled up, is the chatbot revolution. But the deep idea underneath is bigger than words - turn messy human things into long lists of numbers, so that meaning becomes geometry.

Those lists are called embeddings. Language models embed words. Vision models embed images. brew.fm uses a model that embeds sound itself - no lyrics, no genre tags, no play counts. The only LLM in this whole product writes your alter ego a name. Everything else is done by ear.

02 Why not just use genres?

Words fail at music.

Here are two songs Spotify files under the same label: “indie pop.” Play them both. Your ears disagree.

“Indie pop” contains multitudes; “pop” means even less; half of what you love sits in genres you’d never click. Any system built on words inherits the lies words tell. So we don’t describe songs. We measure them.

03 The measurement

The machine takes a song’s fingerprint.

A neural network trained on millions of clips listens to thirty seconds of audio - the texture, the tempo, the space between the notes - and writes down 512 numbers.

Those 512 numbers are the song’s fingerprint. Two songs that feel alike get numbers that are close. Two songs that feel like different planets get numbers that are far apart. Nobody tells the model what “alike” means - it learned by listening.

04 Embeddings, visualized

Every song ever made is a point on a map.

Read the 512 numbers as coordinates and every song becomes a point in a 512-dimensional space of pure sound. Distance means similarity. That one move - sound becomes geometry - powers everything below.

Map labels: “these three sound alike” and “this one doesn’t.”

Tap the dots. Near means similar; far means different.

Your screen has two dimensions; the real map has 512. The math doesn’t care - the distance between two points works the same in 512 dimensions as it does in two.
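For the curious, here is a minimal sketch of that claim. It assumes cosine distance, the metric named in the technical note at the end of this page; the function name and the toy vectors are made up for illustration and are not brew.fm's actual code.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity: near 0 for the same direction, 1 for perpendicular."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)

# Two dimensions: perpendicular directions sit at distance 1.0.
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))

# 512 dimensions: the exact same function, just longer lists.
song_a = [1.0] * 512                   # hypothetical fingerprint
song_b = [1.0] * 256 + [0.0] * 256     # shares half of song_a's "sound"
print(cosine_distance(song_a, song_a)) # effectively zero: identical songs
print(cosine_distance(song_a, song_b)) # larger: they only partly overlap
```

Nothing in the function mentions the number of dimensions. It loops over whatever it is given, which is why two dots on a screen and two 512-number fingerprints can be compared the same way.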

05 The centroid

A playlist has a heart.

Take every song in a playlist - every point - and average their coordinates. You get one new point: the centroid. It isn’t a song. It’s the sound the whole playlist orbits. We call it the heart.
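The averaging step is small enough to write out. This is only a sketch: the three-number fingerprints below are invented stand-ins for the real 512, and `centroid` is an illustrative name, not the product's API.

```python
def centroid(points):
    """Average a list of equal-length vectors, coordinate by coordinate."""
    if not points:
        raise ValueError("a playlist needs at least one song")
    dims = len(points[0])
    if any(len(p) != dims for p in points):
        raise ValueError("all fingerprints must have the same length")
    return [sum(p[i] for p in points) / len(points) for i in range(dims)]

# Hypothetical 3-number fingerprints standing in for the real 512.
playlist = [
    [0.0, 2.0, 4.0],
    [2.0, 2.0, 0.0],
    [4.0, 2.0, 2.0],
]
heart = centroid(playlist)
print(heart)  # [2.0, 2.0, 2.0]
```

Note that `[2.0, 2.0, 2.0]` is not one of the three songs. The heart is a new point that sits between them.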

Map labels: “the heart,” “fits,” and “too far.”

Tap any dot to hear it. The four warm ones are the playlist. Green fits the heart; red doesn’t.

The heart is how a machine can hold a vibe without understanding a single word about it. Anything close to the heart belongs; anything far doesn’t - and you can hear that boundary with your own ears, above.

06 Clustering

Your library is several people.

Plot everything you’ve ever saved and it doesn’t make one cloud. It pools into a few dense regions - because nobody is one person all the time. A clustering algorithm finds those regions with no labels and no hints, just density.

Each dense region is one of your alter egos. The sound is discovered by geometry; only then does a language model step in for its single job - giving the cluster a name worthy of it.
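To make "just density" concrete, here is a toy sketch using DBSCAN, one well-known density-based method. That choice is an assumption: this page only says the clustering is density-based. The `eps` and `min_pts` values, the 2-D points, and the use of straight-line distance are also illustrative, picked so the example is easy to follow.

```python
import math

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    def neighbors(i):
        # Every point within eps of point i, including i itself.
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    labels = [None] * len(points)      # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1             # noise for now; may become a border point
            continue
        cluster += 1                   # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster    # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:   # another core point: keep growing
                queue.extend(nbrs)
    return labels

# Hypothetical 2-D "library": two dense pools and one stray song.
library = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
    (2.5, 9.0),
]
labels = dbscan(library, eps=0.3, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The code never says how many clusters to look for. It finds two because the points pool in two places, and it marks the stray song as noise (`-1`) rather than forcing it into a group.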

07 The midpoint

A collab is a coordinate.

You are a cluster with a heart. The artist you’d kill to work with is another. Draw the line between the two hearts and walk to the exact middle. The songs already living there - close to you, close to them - are the playlist you’d make together.

Not “people who like X also like Y.” No taste graph, no popularity contest. Geometry: the literal midpoint between you and your favorite artist, in the space of sound itself.
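As a rough sketch of that walk to the middle: average the two hearts, then rank tracks by how close they sit to the result. The two-number hearts, the track names, and the `nearest` helper are all invented for illustration, and straight-line distance is used here only to keep the arithmetic readable.

```python
import math

def midpoint(a, b):
    """Return the point exactly halfway between two equal-length vectors."""
    return [(x + y) / 2 for x, y in zip(a, b)]

def nearest(target, catalog, k):
    """Return the names of the k tracks closest to target."""
    ranked = sorted(catalog, key=lambda name: math.dist(target, catalog[name]))
    return ranked[:k]

your_heart = [0.0, 0.0]      # hypothetical 2-number hearts
artist_heart = [4.0, 2.0]
middle = midpoint(your_heart, artist_heart)
print(middle)  # [2.0, 1.0]

catalog = {
    "track_a": [2.1, 0.9],   # near the middle
    "track_b": [0.1, 0.0],   # close to you only
    "track_c": [3.9, 2.0],   # close to the artist only
    "track_d": [1.8, 1.2],   # near the middle
}
print(nearest(middle, catalog, k=2))  # ['track_a', 'track_d']
```

The tracks that hug only one of the two hearts lose out to the ones sitting between them, which is the whole point of a collab.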

08 The living playlist

Then it stays alive.

Saving the collab isn’t the end - it’s a birth. Every week, new releases get fingerprinted and walked up to your playlist’s heart. Close enough gets in. Too far gets turned away. Your playlist grows a track at a time and never drifts off-vibe.
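The gate can be pictured as a single comparison against a cutoff. In this sketch the cutoff `max_distance=0.1`, the release names, and the three-number fingerprints are assumptions for illustration, and the heart is held fixed rather than updated as tracks join; the page does not say how brew.fm handles either detail.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity of two non-zero, equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def admit(new_tracks, heart, max_distance):
    """Split new releases into those within max_distance of the heart and the rest."""
    accepted, rejected = [], []
    for name, fingerprint in new_tracks.items():
        if cosine_distance(fingerprint, heart) <= max_distance:
            accepted.append(name)
        else:
            rejected.append(name)
    return accepted, rejected

heart = [1.0, 1.0, 0.0]              # hypothetical playlist heart
this_week = {
    "release_1": [0.9, 1.1, 0.0],    # points almost the same way as the heart
    "release_2": [0.0, 0.1, 1.0],    # points somewhere else entirely
    "release_3": [1.0, 0.8, 0.1],    # close enough
}
accepted, rejected = admit(this_week, heart, max_distance=0.1)
print(accepted)  # ['release_1', 'release_3']
print(rejected)  # ['release_2']
```

A tighter cutoff keeps the playlist purer but lets fewer new tracks in; a looser one does the opposite.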

That’s the whole machine. No prompts, no feeds, no skips-per-minute engagement tricks. A map of sound, a heart, and a gate - working for your playlist while you sleep.

Now hear yours.

Meet your alter egos, pick your star, and we’ll find the exact midpoint between you.

For the technically curious: 512-dimensional audio embeddings (CLAP-family), cosine distance, centroids and density-based clustering over your listening history, served from a pgvector index over a growing catalog of fingerprinted tracks. The previews you played on this page are the same data the engine hears.

Bart Proost, who builds brew.fm

Built by one person who needed it to exist.

Hey, it’s Bart. I need fresh music, constantly - and I have no time to sift through releases by hand. Spotify’s recommendations never quite fit, and great drops from my favorite artists kept slipping past me. So I built the thing that listens for me. By day I’m an engineer at Anyscale, the company behind Ray - the open source framework for scaling AI, downloaded millions of times a week and running under OpenAI, Perplexity, Shopify, and Spotify itself. Embedding spaces, centroids, clusters: that’s the day job, pointed at the thing I actually love.
