
DEV Community

Sunil

I Built 50 Royalty-Free Soundtracks for My Side Project in a Weekend Using AI Music Generation

Disclosure: I'm building MusicWave, one of the tools mentioned below. This post is based on my actual workflow — I've named alternatives throughout, and the code here works with any of them.

The Problem Every Indie Developer Hits

Picture this: you've shipped the MVP, the UI is clean, the feature works. You drop it into a demo video or add ambient audio to your landing page and... crickets. Silence kills the vibe. So you do what every dev does:

  • Open YouTube Audio Library → everything sounds like a 2014 vlog
  • Try Epidemic Sound or Artlist → $15-25/month just to try it
  • Find "royalty-free" music on Google → land on a 10-page licensing agreement you don't have time to parse
  • Give up and ship without music

I've been through this cycle on three side projects. For my latest one — a browser-based rhythm game built with Phaser — I needed 12 distinct tracks for different levels. Licensing that many tracks legitimately would have cost me $200+/month on Artlist or similar. So I tried a different route: generate everything with AI.

By the end of the weekend, I had 50+ royalty-free tracks, each tailored to a specific scene, level, or mood. This post is about what I learned, the code I wrote to automate it, and the gotchas.

Why AI Music Generation is Actually Useful for Developers Now
There's been a Cambrian explosion of AI music tools in the last 18 months:

  • Suno — best-in-class vocals, consumer-friendly
  • Udio — excellent genre variety
  • Stable Audio — Stability AI's offering, strong for instrumental
  • Meta's MusicGen — open-source, great for tinkering
  • Riffusion — novel spectrogram-based approach
  • Mubert — API-first for developers
  • MusicWave — the one I ended up using most (more on why below)

The outputs aren't SoundCloud-hit quality yet, but for:

  • Game/app background music
  • YouTube video soundtracks
  • Podcast intros
  • Demo reels
  • Landing page ambient audio

...they're more than good enough. And the killer feature for developers: most let you own commercial rights to what you generate on paid plans, so you don't need to relicense anything.

My Workflow
Here's what my weekend looked like, broken down into reproducible steps.

Step 1: Define the Scenes

I built a simple JSON spec for every track I needed. This turned into the single source of truth that drove the whole pipeline. (If you've read Kent C. Dodds on schema-driven development, this will feel familiar.)

{
  "tracks": [
    {
      "id": "menu_ambient",
      "prompt": "Soft ambient electronic menu music, minimalistic synths, calm and inviting, loopable",
      "duration": 60,
      "instrumental": true,
      "genre": "ambient"
    },
    {
      "id": "level_1_tutorial",
      "prompt": "Upbeat chiptune with 8-bit synths, cheerful and encouraging, 120 BPM",
      "duration": 90,
      "instrumental": true,
      "genre": "chiptune"
    },
    {
      "id": "boss_fight",
      "prompt": "Intense orchestral hybrid with driving drums, epic strings, and synth bass, 140 BPM, dramatic",
      "duration": 120,
      "instrumental": true,
      "genre": "epic"
    }
  ]
}
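Since this JSON drives the whole pipeline, I validate it before kicking off any (slow, rate-limited) generation jobs. Here's a minimal sketch of a loader, assuming the field names from the JSON above; the validation rules are just my own convention, not a formal schema:

```typescript
// Minimal loader/validator for the track spec above.
// Field names mirror the JSON; the checks are a lightweight convention.
import { readFileSync } from "node:fs";

interface TrackEntry {
  id: string;
  prompt: string;
  duration: number; // seconds
  instrumental: boolean;
  genre: string;
}

function validateTrack(t: any): t is TrackEntry {
  return (
    typeof t.id === "string" &&
    typeof t.prompt === "string" &&
    typeof t.duration === "number" && t.duration > 0 &&
    typeof t.instrumental === "boolean" &&
    typeof t.genre === "string"
  );
}

function loadSpec(path: string): TrackEntry[] {
  const parsed = JSON.parse(readFileSync(path, "utf8"));
  const tracks: any[] = parsed.tracks ?? [];
  const bad = tracks.filter((t) => !validateTrack(t));
  if (bad.length > 0) {
    throw new Error(
      `Invalid track entries: ${bad.map((t) => t.id ?? "<no id>").join(", ")}`
    );
  }
  return tracks;
}
```

Failing fast here is much cheaper than discovering a missing `duration` two minutes into a generation job.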

Lesson learned: specificity in prompts matters more than cleverness.

"Upbeat music" generates garbage. "Upbeat chiptune with 8-bit synths, cheerful and encouraging, 120 BPM" generates what you actually want.

This mirrors standard prompt-engineering guidance, including OpenAI's own docs: specificity wins.

Step 2: Write a Prompt-Engineering Helper

I noticed most tools generate better results when the prompt includes:

  1. Genre (specific, not generic)
  2. Instrumentation (what instruments to feature)
  3. Mood/Energy
  4. Tempo in BPM (typical ranges if you're unsure: ambient 60-90, house 120-128, drum & bass 170-180)
  5. Structural hints (intro, loop, drop, outro)

Here's a small TypeScript helper I wrote that enforces this structure:

interface TrackSpec {
  genre: string;
  instruments: string[];
  mood: string;
  bpm: number;
  structure?: 'loopable' | 'intro-buildup' | 'drop-focused' | 'cinematic';
}

function buildPrompt(spec: TrackSpec): string {
  const parts = [
    spec.genre,
    `featuring ${spec.instruments.join(', ')}`,
    `${spec.mood} energy`,
    `${spec.bpm} BPM`,
  ];

  if (spec.structure === 'loopable') {
    parts.push('seamless loop, no fade-in or fade-out');
  } else if (spec.structure === 'intro-buildup') {
    parts.push('starts sparse, builds intensity toward the end');
  } else if (spec.structure === 'drop-focused') {
    parts.push('builds tension then releases into a powerful drop at 30 seconds');
  } else if (spec.structure === 'cinematic') {
    parts.push('three-act structure with emotional climax');
  }

  return parts.join(', ');
}

// Usage
const bossTrack = buildPrompt({
  genre: 'epic orchestral hybrid',
  instruments: ['taiko drums', 'strings', 'synth bass', 'brass'],
  mood: 'intense',
  bpm: 140,
  structure: 'intro-buildup',
});

console.log(bossTrack);
// "epic orchestral hybrid, featuring taiko drums, strings, synth bass, brass, 
//  intense energy, 140 BPM, starts sparse, builds intensity toward the end"

This one abstraction saved me ~30% of my time because I stopped retrying generations with ambiguous prompts.

Step 3: Generate in Batches

I ended up going with MusicWave because it gave me two generations per request (good for picking the better take) and supported instrumental mode natively.

Suno would have worked too — honestly, if you need vocal-heavy tracks, Suno is still my first recommendation. For instrumental game music with specific BPM targets, MusicWave nailed it more consistently for me.

The API was straightforward — a POST with the prompt, poll for status, download when ready. This follows a standard async job pattern common in media generation APIs.

Rough pseudocode for what the loop looked like:

async function generateTracks(specs: Array<TrackSpec & { id: string; duration: number }>) {
  const results = [];

  for (const spec of specs) {
    const prompt = buildPrompt(spec);
    console.log(`[${spec.id}] Generating: ${prompt}`);

    const job = await startGeneration({
      prompt,
      instrumental: true,
      duration: spec.duration,
    });

    const result = await pollUntilComplete(job.id, {
      timeoutMs: 180_000,
      intervalMs: 3000,
    });

    results.push({ id: spec.id, audioUrl: result.audioUrl });
    console.log(`[${spec.id}] Done → ${result.audioUrl}`);
  }

  return results;
}
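The `pollUntilComplete` helper in the loop above is just a generic poll-with-timeout pattern. Here's a provider-agnostic sketch; the `{ status, audioUrl }` job shape is an assumption, so adapt it to whatever your API actually returns:

```typescript
// Generic poll-until-complete helper. The job status shape
// ({ status, audioUrl }) is an assumption; adapt to your provider.
interface JobStatus {
  status: "pending" | "processing" | "complete" | "failed";
  audioUrl?: string;
}

async function pollUntilComplete(
  checkStatus: () => Promise<JobStatus>,
  opts: { timeoutMs: number; intervalMs: number }
): Promise<JobStatus> {
  const deadline = Date.now() + opts.timeoutMs;
  while (Date.now() < deadline) {
    const job = await checkStatus();
    if (job.status === "complete") return job;
    if (job.status === "failed") throw new Error("Generation failed");
    // Wait before polling again to avoid hammering the API
    await new Promise((r) => setTimeout(r, opts.intervalMs));
  }
  throw new Error(`Timed out after ${opts.timeoutMs}ms`);
}
```

Passing the status check in as a callback keeps the timeout logic reusable across whichever generator you end up using.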

Gotcha: generations take 1-3 minutes each, and most APIs rate-limit you to 1-2 concurrent jobs on free/starter plans.

Run overnight if you're generating 50+ tracks. Or batch-submit with retries — p-queue is my go-to library for this kind of concurrency control.
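If you'd rather not pull in a dependency, the core of what p-queue does for this case fits in a few lines. A sketch of running jobs with a fixed concurrency cap (retries left out for brevity):

```typescript
// Run async jobs with at most `limit` in flight at once
// (a tiny stand-in for p-queue's `concurrency` option).
async function runWithConcurrency<T>(
  jobs: Array<() => Promise<T>>,
  limit: number
): Promise<T[]> {
  const results: T[] = new Array(jobs.length);
  let next = 0;

  // Each worker pulls the next unclaimed job off a shared index.
  async function worker() {
    while (next < jobs.length) {
      const i = next++;
      results[i] = await jobs[i]();
    }
  }

  await Promise.all(
    Array.from({ length: Math.min(limit, jobs.length) }, worker)
  );
  return results;
}
```

With 50 tracks at 1-3 minutes each and a cap of 2, this keeps the queue saturated without tripping the rate limiter.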

Step 4: Post-Processing for Loop Points

AI-generated music doesn't always loop cleanly. I wrote a quick FFmpeg wrapper that uses silence detection as a cheap proxy for a clean trim point (true click-free looping needs sample-level zero-crossing cuts, which I cover below). The `silencedetect` and `afade` filters are documented in FFmpeg's filters reference:

#!/bin/bash
# loop-trim.sh: Finds a clean loop point and trims the track

INPUT=$1
OUTPUT=${INPUT%.mp3}_looped.mp3

# Detect silences (often where a natural loop point exists)
SILENCE=$(ffmpeg -i "$INPUT" -af "silencedetect=noise=-30dB:d=0.5" -f null - 2>&1 \
  | grep "silence_start" | head -1 | awk '{print $5}')

if [ -z "$SILENCE" ]; then
  # No silence found; apply a short fade-out instead
  # (st=55 assumes a ~60s track; adjust to your duration)
  ffmpeg -i "$INPUT" -af "afade=t=out:st=55:d=2" -y "$OUTPUT"
else
  # Trim just before the first detected silence
  ffmpeg -i "$INPUT" -t "$SILENCE" -y "$OUTPUT"
fi

echo "Looped version saved to $OUTPUT"
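The shell script uses silence detection as a shortcut; for genuinely click-free loops you want the cut to land on a zero-crossing. A sketch of finding the zero-crossing nearest a target sample, assuming you've decoded the audio to a mono `Float32Array` of PCM samples (stereo needs per-channel handling):

```typescript
// Find the sample index nearest `target` where the waveform crosses
// zero going upward (sign change between adjacent samples).
// Cutting at such a point avoids audible clicks at the loop seam.
// Assumes mono Float32Array PCM.
function nearestZeroCrossing(samples: Float32Array, target: number): number {
  for (let offset = 0; offset < samples.length; offset++) {
    // Search outward from the target in both directions
    for (const i of [target - offset, target + offset]) {
      if (i > 0 && i < samples.length && samples[i - 1] <= 0 && samples[i] > 0) {
        return i;
      }
    }
  }
  return target; // no crossing found; fall back to the requested point
}
```

Trim the buffer at the returned index (e.g. `samples.subarray(0, idx)`) before re-encoding, and the loop point lands on silence-level amplitude.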

For game music, I also separated stems (vocals/drums/bass/other) so I could dynamically mix layers based on game state — e.g., drums only during exploration, full mix during combat.

Most AI music platforms offer stem-splitter endpoints, and there are great open-source options like Spleeter (Deezer) and Demucs (Meta) if you want to run separation locally.
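The per-state mixing reduces to a small lookup: each game state maps to a target gain per stem, and the audio side ramps each stem's GainNode toward those targets. A sketch of the pattern; the state names and gain values here are from my game, i.e. arbitrary:

```typescript
// Target gain per stem for each game state. On the Web Audio side,
// ramp each stem's GainNode toward these (e.g. linearRampToValueAtTime).
type Stem = "drums" | "bass" | "melody" | "pads";
type GameState = "menu" | "exploration" | "combat";

const STEM_MIX: Record<GameState, Record<Stem, number>> = {
  menu:        { drums: 0.0, bass: 0.0, melody: 0.3, pads: 1.0 },
  exploration: { drums: 0.6, bass: 0.4, melody: 0.0, pads: 0.8 },
  combat:      { drums: 1.0, bass: 1.0, melody: 1.0, pads: 0.5 },
};

// Stand-in for a thin wrapper around a GainNode.
interface RampTarget {
  setTarget(value: number): void;
}

function applyMix(state: GameState, nodes: Record<Stem, RampTarget>) {
  for (const [stem, gain] of Object.entries(STEM_MIX[state])) {
    nodes[stem as Stem].setTarget(gain);
  }
}
```

Keeping the mix table as data means tweaking the feel of a state is a one-line change, with no audio code touched.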

Step 5: Integrate with Web Audio API

Here's the minimal code I ended up with for crossfading between tracks based on game state. If you haven't used it before, MDN's Web Audio API guide is an excellent starting point:

class SoundtrackManager {
  // Note: browsers keep an AudioContext suspended until a user gesture;
  // call this.ctx.resume() on first interaction before playing anything.
  private ctx = new AudioContext();
  private tracks = new Map<string, AudioBuffer>();
  private currentSource: AudioBufferSourceNode | null = null;
  private currentGain: GainNode | null = null;

  async load(id: string, url: string) {
    const res = await fetch(url);
    const arrayBuffer = await res.arrayBuffer();
    const audioBuffer = await this.ctx.decodeAudioData(arrayBuffer);
    this.tracks.set(id, audioBuffer);
  }

  async crossfadeTo(id: string, durationMs = 2000) {
    const buffer = this.tracks.get(id);
    if (!buffer) throw new Error(`Track ${id} not loaded`);

    const source = this.ctx.createBufferSource();
    source.buffer = buffer;
    source.loop = true;

    const gain = this.ctx.createGain();
    gain.gain.value = 0;
    source.connect(gain).connect(this.ctx.destination);
    source.start();

    // Fade in the new track. Anchor the ramp at the current time first:
    // linearRampToValueAtTime ramps from the previous scheduled event,
    // so without setValueAtTime the fade can start from the wrong value.
    const now = this.ctx.currentTime;
    gain.gain.setValueAtTime(0, now);
    gain.gain.linearRampToValueAtTime(1, now + durationMs / 1000);

    // Fade out the old track
    if (this.currentGain) {
      this.currentGain.gain.setValueAtTime(this.currentGain.gain.value, now);
      this.currentGain.gain.linearRampToValueAtTime(0, now + durationMs / 1000);
      const oldSource = this.currentSource;
      setTimeout(() => oldSource?.stop(), durationMs);
    }

    this.currentSource = source;
    this.currentGain = gain;
  }
}

// Usage in my game
const soundtrack = new SoundtrackManager();
await soundtrack.load('menu', '/audio/menu_ambient.mp3');
await soundtrack.load('level1', '/audio/level_1_tutorial.mp3');

// On game state change:
soundtrack.crossfadeTo('level1', 3000);

If you want a higher-level abstraction, Howler.js wraps the Web Audio API beautifully and handles mobile autoplay quirks. Tone.js is another fantastic option if you're doing anything more musical (sequencing, instruments, effects).

What Actually Worked & What Didn't

Worked

  • Instrumental-only mode for game/app background. Vocals are distracting unless you want them front-and-center.
  • Generating 2 variations per prompt and picking the better one. Even the same prompt can produce wildly different takes (this is basically temperature-based sampling in disguise).
  • Iterating on prompts, not on the output. Don't try to "fix" a generation — regenerate with a tighter prompt.
  • Being specific about BPM and key. Big improvement in coherence.

Didn't Work

  • Trying to generate lyrics for specific narrative points. AI lyrics are still generic; write your own and use "Add Vocals" features. ChatGPT or Claude will draft better lyrics than most music-specific models.
  • Relying on one tool. Different models excel at different things — Suno for vocals, MusicGen for ambient, Minimax for structure-heavy tracks. I rotated between them.
  • Skipping post-processing. Even good AI music needs mastering, loop-point trimming, and volume normalization before it's production-ready. If you want to go deep on audio mastering, iZotope's blog is a goldmine.

The Licensing Question (Important)
If you're going to ship this stuff commercially, read the license. Most AI music tools require a paid plan for commercial rights:

  1. Free tier usually = personal use only
  2. Paid tier = commercial license (usually non-exclusive, royalty-free, worldwide)
  3. Some platforms issue downloadable license certificates per song for legal peace of mind

I'm on MusicWave's Pro plan which includes a downloadable license PDF per track — useful when a platform like YouTube flags the audio and you need to submit proof via YouTube's copyright dispute form.

Double-check the specific tool's terms before you publish anything on a monetized channel. The EFF has a solid overview of creator-side IP issues if you want to go deeper.

TL;DR — What I'd Tell Past-Me

  1. Stop paying for stock music libraries if you're a solo dev. Generate once, own forever.
  2. Build a prompt helper function. It's the single highest-leverage abstraction.
  3. Generate 2-3x what you need and curate. Takes 5 minutes to pick, saves headaches later.
  4. Automate the boring parts: loop-trimming, volume normalization, file naming.
  5. Keep a license folder alongside every track you publish commercially.

Resources & Tools I Used
AI Music Generators

  • MusicWave.ai — my main generator (multiple models, stem splitter, mastering built-in)
  • Suno — best for vocal-heavy tracks
  • Udio — great genre variety
  • Stable Audio — strong instrumental output
  • MusicGen (Meta) — open source, run locally

Audio Processing

  • FFmpeg — the swiss army knife of media processing
  • Audacity — free desktop audio editor
  • Spleeter — open-source stem separation
  • Demucs — Meta's stem separation model

Web Audio

  • Web Audio API (MDN) — core docs and guides
  • Howler.js — high-level wrapper that handles mobile autoplay quirks
  • Tone.js — sequencing, instruments, and effects

Game Dev Adjacent

  • Phaser — HTML5 game framework (what I used)
  • Godot — open-source game engine with great audio support
  • FMOD — pro-level adaptive audio middleware (free for indies under a revenue threshold)

Further Reading

  • Wwise's guide to adaptive audio
  • Designing Sound by Andy Farnell — the bible of procedural audio
  • Game Audio Implementation by Richard Stevens

Drop your workflow in the comments. Happy to share more of my pipeline if folks are interested — I might do a Part 2 on dynamic layered soundtracks (music that reacts to game state).
