Why Verses Is Moving Toward Multimodal Generative AI - Blurring the Line Between Listening and Creating

1. It All Started with a Question

We began with a set of big, bold questions:

These questions weren’t just about making music with tech. They were about how music can create emotional connections and meaningful experiences.

That’s why we built the MetaMusic System—a tool that combines visuals, audio, and narrative to create a new kind of sensory experience. A way to go beyond passive listening and step into music as a visual and interactive medium. It’s more than just generative AI. It’s an experiment in how music expands when it meets sight and interaction.

Along this journey, we were honored to receive a CES Innovation Award and collaborate with world-class artists. But we knew something was still missing.

True transformation begins only when everyone is invited to take part.

2. Creation Should Be Everyday, Not Exclusive

Today, most people are used to listening to music. But actually creating music still feels out of reach for them.

Isn’t that unfair? Everyone has emotions and stories they want to express, yet only a few have the tools to turn them into music.

So we asked: What if technology could break down these walls?

That’s where multimodal generative AI comes in.

3. Music Is No Longer Just Sound

Right now, platforms like YouTube, TikTok, and Instagram are overflowing with music-based content.