Around this time, two years ago, I was learning how to prompt in OpenAI’s Playground using GPT-3. Amazingly, I could produce an output that could synthesize the plots from three films and I thought that was impressive. That was in 2022 when generative AI was limited to large language models. Now, anyone can prompt a music generator to produce a song that sounds like the real thing. While this technology may seem like a fun novelty, it raises profound questions about the future of human creativity, artistry, and our very relationship with music itself.
I fed the above paragraph into Suno with a mix of musical styles, and it produced the following track in seconds. It is a single-shot generation, so it is far from perfect, but I think the implication is clear: we’re in very new territory regarding the capabilities of these evolving systems.
We've witnessed text generators' impact on writing and image generators' implications for artists, and now musicians will have to contend with audio generators. The question educators have asked about generative AI remains: How do we navigate a technology that can potentially help so many people unleash new personal discoveries without stifling the very human creativity we've invested so much into cultivating?
It’s hard for me to wrap my head around multimodal AI and judge its implications for learning, but I think I agree with JPMorgan’s CEO Jamie Dimon when he recently said that AI could be one of the most transformational technologies of our lifetime. Dimon is eager to explore AI’s profit potential, while I’m increasingly convinced this technology will be transformative only for his economic class and deeply disruptive for the rest of us.
Generating Music With Udio and Suno
Let’s spend some time looking at how far generative music has come, and maybe you can tell me I’m overreacting here and that AI hasn’t imperiled another human skill. Udio is a new platform that uses multimodal AI to generate full songs. It joins Suno.ai as another impressive text-to-music AI generator. Many of the outputs people are creating with Udio are hilarious, and I’ve included several examples below, but the implications for the music industry, and maybe for learning, are anything but. People are using Udio and Suno to create entire songs that sound as good as anything on the radio. Listen to just a few of the examples below that are trending on those platforms.
The Sound and Fury of The Lawsuits to Come
Like Suno, Udio likely used copyrighted material within its training data. I’m sure we’re going to see a flurry of lawsuits, and these will drag on in the courts as one side argues theft and the other side argues fair use because it is transformative. The discourse about suing AI companies for training material misses the main point—AI companies aren’t going to stop scraping content. They may have to pay licensing fees to publishers, but artists will not have a say in how their content is used. Millions of people will generate songs with this technology and have their music played across TikTok on endless reels of aggregated content.
Multimodal AI threatens to disrupt our relationship with music, but not in the way text generation disrupted the intimacy of the written word. After all, modern music has been inundated with audio-processing software since Cher popularized Auto-Tune in 1998 with her album Believe. Can you recall a single artist who hasn’t used the software to improve their vocals since?
The Value and Struggle of Learning Are at Stake
I take my daughter to music lessons each week, where she learns to play piano and acoustic guitar, and the way her eyes widened when she saw how easy it was to create not just a song with Udio, but her song, was truly something to behold.
It was also devastating.
We’re on the cusp of one of the greatest technological shifts in human history, and so many people are gleefully embracing generative AI without pausing to consider what doing so means for how we learn and why we invest so much time, energy, and resources into human beings. The arrival of music generators closes the door on the possibilities of yet another skill that takes humans years of tedious practice and struggle to learn. As I looked at her gleefully dictating lyrics and picking musical styles, I wondered how long it would be until she asked what the point was for the music lessons when she could create any song she liked using AI.
I’ll have something rehearsed ready to tell her about how creating art yourself is one of the most gratifying experiences we have and that learning to play music matters even if you never do it in a band or become a rock star. I’ll tell her tradition matters, how proud I am of her, and that each person has dignity and deserves the opportunity to learn something, even if it ultimately means you fail at it. I’ll say that, even though it is becoming increasingly hard for me to believe it.
What Will We Value in this AI Era?
Of course, generative music could help people who do not have the skills or resources to create music, but that’s not the argument. People will still create music, learn to play instruments, and pay ridiculous amounts of money to see the next Taylor Swift or Morgan Wallen live. AI won’t destroy that completely. But real music will become a boutique skill, one reserved for a certain class of people who can afford to value it. What district is going to fund a struggling music program in a poorly resourced school when kids can spend 20 minutes on their iPads generating music? Do we believe a child exposed to a music generator will be inspired to go out and learn how to create music on their own?
We hear the narrative pushed by many of these companies that AI will democratize access to skills and that such systems will lead to human flourishing, but we shouldn’t be fooled by rhetorical framing that markets this technology as a source of freedom, choice, or plurality. The very factors that shape our modern social and economic society ultimately devalue the messy complexity and imperfections of learned skills and push the adoption of time-saving technologies like AI because the latter demands the least friction, cost, or knowledge to use. Many will argue that AI is a cheap substitute for real music, but in the end it is a downright bargain when compared to the resources it takes for a human being to learn how to play music.
The great fear I have isn’t robot overlords; it’s the uncritical adoption of generative AI to the point where we allow the technology to further dehumanize and devalue the imperfect skills we invest our dreams in. As mesmerizing as generative music is, I can’t help but think this will destroy so many of those possibilities. We risk creating a two-tiered world: one where art created through human labor, creativity, and struggle is transformed into a luxury item, accessible only to those with the resources, time, and money to learn, while the other tier, the one open to the mainstream, is flooded with the equivalent of artistic fast food that is plentiful, convenient, and lacking in the true substance or skills that make up the human experience.
And this will absolutely impact the already dwindling number of majors and career paths institutions of higher learning offer.
So much to chew on here, as usual. I think you zero in on the ultimate crux of this technology, which is that it challenges us to figure out what we value, and it's possible that some of us will not like those answers, or that only certain groups will have the privilege of doing work that reflects their values.
Those songs, like the recently released Suno videos, are simultaneously amazing and also soulless and shitty. Of course, there are lots of popular songs that are soulless and shitty, so it's not like soulless and shitty are barriers to market success.
As with AI writing, we've also been prepped for accepting soulless and shitty by the steady templatization of music, both in terms of production (using Pro Tools, which tunes every voice and keeps tempo rock steady) and form (there are hardly any bridges in popular music today). Maybe we've been conditioned to accept something that is just not good and we've lost touch with what's meaningful.
There's something very satisfying about learning an instrument in terms of advancing one's individual mastery and appreciating that progress, but it's possible none of that will "matter" in the broad scheme of producing music for public consumption.
Seems like this is all true for writing as well.
Generative AI is demonstrating that consumer AI apps have really high turnover. For every Stability or Midjourney, a Leonardo or Ideogram magically shows up to do more or less what they can do, only cheaper or better. Hard to take AI tools or apps seriously in such an environment of musical chairs tbh.
As impressive as Suno or Udio might be, I think, as with ChatGPT or Sora, it's only an invitation for more apps like them to come and go.