Make Real
Mahbub Rahman
Available for new projects

AI Audio Transcription & Podcast Tool Developer

Turn messy audio into structured intelligence.

View My Work

EXECUTIVE SUMMARY

Mahbub Rahman builds advanced AI audio transcription tools using Whisper, Deepgram, and OpenAI to convert raw audio, podcasts, and meeting recordings into structured, searchable data.

The Technical Reality

Transcribing audio is a solved problem. The real engineering challenge is speaker diarization, chunking hours of transcripts into context-aware LLM prompts, and processing large media files asynchronously. I architect media processing pipelines using S3 presigned URLs, webhooks, and background queues (BullMQ/Trigger.dev) so your server never crashes while a 2-hour podcast is being processed.

WHY FOUNDERS COME TO ME

Files are too large. You already know this.
THE TIMEOUT

Serverless functions are dying.

You can't hold a standard HTTP request open for 5 minutes while Whisper processes a file. You need a robust asynchronous architecture with storage buckets and webhooks.

Async processing queues
THE FORMAT

A wall of text is useless.

Users don't want to read a 10,000-word block of text. They need speaker diarization (Speaker A, Speaker B), timestamps, and LLM-generated summaries and action items.

Structured Diarization
THE COST

OpenAI Whisper is getting expensive.

Sending hundreds of hours of audio to OpenAI adds up fast. You need an architecture that supports faster, cheaper alternatives like Deepgram or local Whisper models when appropriate.
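One way to keep that bill down is a small routing function in front of the providers. The provider names below are real services, but the thresholds and trade-offs are illustrative assumptions to tune against your own invoices:

```typescript
// Cost-routing sketch. Thresholds are placeholders, not benchmarks.

type Provider = "deepgram" | "openai-whisper" | "local-whisper";

interface RoutingInput {
  durationMinutes: number;
  needsDiarization: boolean;
  latencySensitive: boolean; // a user is waiting vs. an overnight backfill
}

function pickProvider(input: RoutingInput): Provider {
  if (input.needsDiarization) return "deepgram"; // strongest diarization support
  if (!input.latencySensitive && input.durationMinutes > 60) {
    return "local-whisper"; // cheapest per hour for bulk jobs
  }
  return "openai-whisper"; // simple default for short, interactive clips
}
```

Because the queue worker is the only place that calls a provider, swapping this function changes the economics of the whole pipeline without touching the upload or webhook code.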

Cost-optimized routing

WHAT I BUILD WITH

Built for heavy lifting. No hand-offs required.

From database to deployment. I own the whole thing.

AUDIO APIs
Deepgram
OpenAI Whisper
AssemblyAI
STORAGE
AWS S3 / Cloudflare R2
Presigned URLs
QUEUES
Trigger.dev
BullMQ
Redis
BACKEND
Next.js
Node.js Webhooks
PostgreSQL

HOW IT WORKS

From upload to insight.

We build a pipeline that guarantees delivery without locking up your UI.

01

Direct-to-Cloud Uploads

Bypass the server

We implement presigned URLs so users upload massive audio files directly from their browser to S3/R2, bypassing your Next.js server entirely to prevent memory crashes.
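A minimal sketch of the browser side of that flow. The `/api/uploads/presign` endpoint is hypothetical (server-side it would call S3's `getSignedUrl`), and the key format is just one reasonable choice:

```typescript
// Browser-side sketch of direct-to-bucket uploads. PRESIGN_ENDPOINT is a
// hypothetical API route that returns a presigned PUT URL from S3/R2.

const PRESIGN_ENDPOINT = "/api/uploads/presign";

// Namespacing keys per user keeps the bucket tidy and access rules simple.
function objectKeyFor(filename: string, userId: string): string {
  return `uploads/${userId}/${Date.now()}-${filename}`;
}

async function uploadDirect(
  file: { name: string; type: string; data: Blob },
  userId: string,
): Promise<string> {
  const key = objectKeyFor(file.name, userId);

  // 1. Tiny JSON round-trip to our server: just a signed URL, no audio bytes.
  const { url } = await fetch(PRESIGN_ENDPOINT, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ key, contentType: file.type }),
  }).then((r) => r.json());

  // 2. The heavy upload goes browser -> bucket, never through Next.js.
  await fetch(url, {
    method: "PUT",
    headers: { "Content-Type": file.type },
    body: file.data,
  });
  return key; // store the key so the background worker can find the file
}
```

The only payload your server ever sees is a small JSON request; the gigabyte of audio goes straight to the bucket.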

02

Async Processing

The queue

Once uploaded, a background job is triggered. It sends the file to Deepgram or Whisper and then either polls for completion or receives a completion webhook, all without blocking the user.
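The worker side of that step can be sketched with Deepgram's REST API, which accepts a `callback` URL for exactly this async pattern. The `DEEPGRAM_KEY` variable and webhook URL are assumptions about your environment:

```typescript
// Worker-side sketch: kick off an async Deepgram job and return immediately.

const DEEPGRAM_URL = "https://api.deepgram.com/v1/listen";

// Pure helper so the request options are easy to test and audit.
function buildListenParams(webhookUrl: string): string {
  return new URLSearchParams({
    diarize: "true",      // label who is speaking
    utterances: "true",   // timestamped segments instead of one text blob
    callback: webhookUrl, // Deepgram POSTs the result here when done
  }).toString();
}

async function startTranscription(
  audioUrl: string,
  webhookUrl: string,
): Promise<string> {
  const res = await fetch(`${DEEPGRAM_URL}?${buildListenParams(webhookUrl)}`, {
    method: "POST",
    headers: {
      Authorization: `Token ${process.env.DEEPGRAM_KEY}`,
      "Content-Type": "application/json",
    },
    // Pass a URL (e.g. an S3 presigned GET) so Deepgram fetches the file itself.
    body: JSON.stringify({ url: audioUrl }),
  });
  const { request_id } = await res.json();
  return request_id; // persist this to match the webhook to the right job
}
```

Persisting the request id is the glue: when the webhook arrives, you look it up to know which user's file just finished.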

03

LLM Post-Processing

The intelligence

We pass the raw transcript through an LLM with strict prompts to generate show notes, identify speakers, extract quotes, and save the structured JSON to the database.
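A sketch of that contract: the JSON schema and prompt wording below are illustrative (the actual chat-completion call is omitted), but the key habit is real in any stack: treat LLM output as untrusted input and validate it before it touches the database.

```typescript
// Post-processing sketch: strict prompt in, validated JSON out.

interface ShowNotes {
  summary: string;
  speakers: string[];
  quotes: string[];
  actionItems: string[];
}

function buildShowNotesPrompt(transcript: string): string {
  return [
    "You are a podcast editor. Return ONLY valid JSON with keys:",
    '"summary" (string), "speakers" (string[]), "quotes" (string[]), "actionItems" (string[]).',
    "Do not invent content that is not in the transcript.",
    "",
    "TRANSCRIPT:",
    transcript,
  ].join("\n");
}

// Defensive parse: reject anything that is not the shape we asked for.
function parseShowNotes(raw: string): ShowNotes | null {
  try {
    const obj = JSON.parse(raw);
    const listsOk = [obj.speakers, obj.quotes, obj.actionItems].every(Array.isArray);
    return typeof obj.summary === "string" && listsOk ? (obj as ShowNotes) : null;
  } catch {
    return null;
  }
}
```

If `parseShowNotes` returns `null`, the job retries with the model's raw output logged, instead of writing malformed data.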

COMMON QUESTIONS

Questions founders always ask me.

Handling large files requires a different architecture.

Next.js serverless functions (like on Vercel) have strict timeout limits (often 10-60 seconds). Transcribing a podcast takes minutes. I solve this by moving the work to an asynchronous queue (like Trigger.dev or a separate background worker) that isn't bound by HTTP request limits.

Can you tell who is speaking?

Yes. This is called 'Speaker Diarization'. We use specialized APIs (like Deepgram's Nova-2 model) that analyze voice signatures to label 'Speaker 1', 'Speaker 2', etc., and attach timestamps to each utterance.
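To make that concrete, here is a small formatter that turns diarized utterances into a readable transcript. The input shape is a simplified version of what such APIs return:

```typescript
// Simplified utterance shape: speaker index, start time in seconds, text.
interface Utterance {
  speaker: number;
  start: number;
  transcript: string;
}

// 65 seconds -> "01:05"
function formatTimestamp(seconds: number): string {
  const m = Math.floor(seconds / 60);
  const s = Math.floor(seconds % 60);
  return `${String(m).padStart(2, "0")}:${String(s).padStart(2, "0")}`;
}

// One labeled, timestamped line per utterance instead of a wall of text.
function renderTranscript(utterances: Utterance[]): string {
  return utterances
    .map((u) => `[${formatTimestamp(u.start)}] Speaker ${u.speaker + 1}: ${u.transcript}`)
    .join("\n");
}
```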

How do you summarize a 3-hour episode?

A 3-hour podcast can easily exceed the context window of cheaper models. We implement chunking logic—breaking the transcript into overlapping segments, summarizing each segment, and then asking the LLM to write a final summary based on the intermediate summaries.
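The chunking step can be sketched as a plain function. The word counts are illustrative and should be tuned to the target model's context window; the overlap ensures a sentence straddling a boundary appears whole in at least one chunk:

```typescript
// Split a transcript into overlapping word-based chunks for map-reduce
// summarization. chunkWords/overlapWords are placeholders, not tuned values.
function chunkTranscript(
  text: string,
  chunkWords = 3000,
  overlapWords = 200,
): string[] {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  const step = chunkWords - overlapWords; // advance less than a full chunk
  for (let i = 0; i < words.length; i += step) {
    chunks.push(words.slice(i, i + chunkWords).join(" "));
    if (i + chunkWords >= words.length) break; // last chunk reached the end
  }
  return chunks;
}
```

Each chunk is summarized independently, then the intermediate summaries are concatenated and summarized once more to produce the final show notes.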

READY?

Let's build something real.

30 minutes. No pitch. No pressure. Just an honest conversation about your project and whether I can actually help.

✓ Free 30-min call ✓ No commitment ✓ You'll know after 1 chat