Google’s enterprise cloud gets a music-generating AI model

Nikesh Vaishnav

On Wednesday, Google rolled out updates to several of its first-party media-generating AI models available through its Vertex AI cloud platform.

Lyria, Google’s text-to-music model, is now available in preview for select customers, and the company’s Veo 2 video creation model has been enhanced with new editing and visual effects customization options. The company has also launched a voice-cloning feature powered by Chirp 3, Google’s audio understanding model, for “allow-listed” users. And the Imagen 3 image generator now delivers what the company describes as “significantly” better performance.

The updates, timed for Cloud Next, are Google’s latest push to corner the enterprise market for generative AI. The company competes perhaps most directly with Amazon, which offers a comparable cloud AI platform called Bedrock with its own set of proprietary generative AI models.

Google is pitching Lyria as an alternative to royalty-free music libraries. Using the model, customers can create songs in a range of styles and genres, from jazzy piano solos to lo-fi tracks, the company said.

Chirp 3, meanwhile, can synthesize speech in around 35 languages. First previewed earlier this year, Chirp 3 drives Instant Custom Voice, which can supposedly clone a voice from 10 seconds of audio. It’s now generally available. Chirp 3 also underpins a new tool launching in preview, called Transcription with Diarization, which separates and identifies speakers in recordings with multiple participants.

To prevent abuse, Instant Custom Voice is subject to a “diligence” process to verify “proper voice usage permissions,” says Google.

As for Veo 2, the model can now remove background images, logos, and objects from existing videos, and extend the frame of video footage (to convert landscape video into portrait, for example). It can also now adjust the camera angles and pacing in AI-generated scenes to create timelapses, drone-style clips, and more, and it can interpolate between specified beginning and end frames.

These Veo features are available in preview for now.

As for the aforementioned Imagen 3 upgrades, Google said they improve the model’s ability to remove objects and reconstruct missing or damaged portions of images.

All media generated by Imagen, Veo, and Lyria (but not Chirp) are watermarked using Google’s SynthID technology. The company said all its generative AI models have “built-in safeguards” to protect against the creation of harmful content.

Google hasn’t historically indicated which specific data it uses to train its models, and the tech giant stuck with that precedent today. Training data tends to be a controversial subject for IP-related reasons. Some firms train their models on copyrighted works without first obtaining permission from rights holders. While these companies claim that U.S. fair use doctrine shields the practice, some creators understandably disagree. Many are battling vendors in court.

Google has previously told TechCrunch that it offers opt-out mechanisms for model training as well as an indemnity policy to shield Google Cloud and Vertex AI customers from AI-related copyright disputes.
