feat(audio): Add waveform visualization for PTT voice messages #2345

ffigueroa · 2026-01-01T19:29:18Z

Summary

This PR adds proper waveform visualization for PTT (Push-to-Talk) voice messages sent via the API. Currently, audio messages sent through Evolution API display without the visual waveform in WhatsApp, making them look less authentic compared to messages sent directly from the app.

Changes

Waveform generation: Uses the audio-decode library to analyze the audio buffer and generate a 64-value waveform array representing the audio amplitude
Duration extraction: Automatically extracts audio duration from the buffer
Bitrate adjustment: Changed audio bitrate from 128k to 48k as per WhatsApp PTT requirements
Baileys patch: Prevents Baileys from overwriting manually-generated waveforms (via patch-package)

Technical Details

getAudioDuration(): Extracts duration in seconds from audio buffer
getAudioWaveform(): Generates normalized waveform (0-100 range) with 64 sample points
Waveform values are properly typed as Uint8Array for Baileys compatibility
Includes fallback handling if waveform generation fails
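
The normalization step described above could look roughly like the sketch below. It is an illustration under assumptions, not the PR's code: `normalizeWaveform` is a hypothetical name, the input is assumed to be one average absolute amplitude per bucket, and scaling relative to the loudest bucket is one plausible way to reach the 0-100 range.

```typescript
// Hypothetical helper (not from the PR): scales per-bucket average amplitudes
// into the 0-100 range and returns them as a Uint8Array.
function normalizeWaveform(rawValues: number[]): Uint8Array {
  // Loudest bucket; used as the reference for scaling.
  const peak = rawValues.reduce((max, value) => Math.max(max, value), 0);
  const waveform = new Uint8Array(rawValues.length); // zero-filled by default

  // Silent input: keep the flat, all-zero waveform instead of dividing by zero.
  if (peak === 0) {
    return waveform;
  }

  rawValues.forEach((value, index) => {
    waveform[index] = Math.round((value / peak) * 100);
  });
  return waveform;
}
```

Scaling against the peak means quiet recordings still fill the available bar height; an absolute scale would be an equally valid choice if relative loudness between messages matters.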

Related Issues

Fixes #1086 (Audio waveform missing when sending voice notes via Evolution API)
Related to Baileys issue WhiskeySockets/Baileys#1587 (Voice-note waves missing in 6.7.18 – waveform is dropped)

Testing

Tested with various audio formats (mp3, ogg, wav)
Verified waveform displays correctly in WhatsApp iOS and Android
Confirmed backwards compatibility with existing audio sending functionality

Screenshots

Voice messages now display with proper waveform visualization instead of a flat line.

Summary by Sourcery

Add waveform-enabled PTT audio sending for WhatsApp and wire up Baileys patching in builds.

New Features:

Generate and attach audio duration and 64-sample waveform metadata for PTT voice messages sent via WhatsApp.

Bug Fixes:

Ensure PTT voice messages sent via the API display a proper waveform in WhatsApp instead of a flat line.

Enhancements:

Adjust WhatsApp PTT audio encoding to 48k bitrate to better match WhatsApp voice note requirements.
Add logging around audio duration and waveform generation to aid debugging and observability of PTT messages.

Build:

Include patch files in the build context and run patch-package during Docker build and npm postinstall to patch Baileys behavior.
Increase Node.js memory limit during the build step to improve build reliability.

sourcery-ai · 2026-01-01T19:29:28Z

Reviewer's Guide

Implements waveform-enabled PTT audio sending for WhatsApp by decoding audio buffers to derive duration and a 64-point Uint8Array waveform, wiring these into Baileys message content, adjusting audio bitrate to meet WhatsApp PTT expectations, and introducing a patch-package based Baileys patch plus Docker/postinstall wiring to apply it in all environments.

Sequence diagram for sending PTT audio with generated waveform

sequenceDiagram
  actor Client
  participant EvolutionAPI
  participant BaileysStartupService
  participant FFmpeg_processAudio
  participant audioDecode_duration
  participant audioDecode_waveform
  participant Baileys_sendMessageWithTyping
  participant WhatsApp

  Client->>EvolutionAPI: POST /audioWhatsapp (SendAudioDto)
  EvolutionAPI->>BaileysStartupService: audioWhatsapp(data, file, isIntegration)

  BaileysStartupService->>FFmpeg_processAudio: processAudio(mediaData.audio)
  FFmpeg_processAudio-->>BaileysStartupService: Buffer convert (48k bitrate)

  alt Converted_audio_is_Buffer
    BaileysStartupService->>audioDecode_duration: getAudioDuration(convert)
    audioDecode_duration-->>BaileysStartupService: seconds

    BaileysStartupService->>audioDecode_waveform: getAudioWaveform(convert)
    audioDecode_waveform-->>BaileysStartupService: Uint8Array waveform (64 values)

    BaileysStartupService->>Baileys_sendMessageWithTyping: sendMessageWithTyping(number, messageContent_with_waveform)
  else Raw_or_URL_audio
    BaileysStartupService->>BaileysStartupService: Derive audioBuffer (URL or base64 Buffer)
    alt audioBuffer_is_Buffer
      BaileysStartupService->>audioDecode_duration: getAudioDuration(audioBuffer)
      audioDecode_duration-->>BaileysStartupService: seconds

      BaileysStartupService->>audioDecode_waveform: getAudioWaveform(audioBuffer)
      audioDecode_waveform-->>BaileysStartupService: Uint8Array waveform (64 values)

      BaileysStartupService->>Baileys_sendMessageWithTyping: sendMessageWithTyping(number, audioBuffer_with_waveform)
    else audioBuffer_is_URL
      BaileysStartupService->>Baileys_sendMessageWithTyping: sendMessageWithTyping(number, URL_audio_without_waveform)
    end
  end

  Baileys_sendMessageWithTyping-->>WhatsApp: PTT message with seconds and waveform
  WhatsApp-->>Client: Voice message UI with waveform visualization
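
The branching in the diagram above can be sketched as follows. This is a minimal illustration rather than the PR's code: `AudioMeta`, `PttContent`, and `buildPttContent` are hypothetical names, `PttContent` is a simplified local stand-in for the Baileys message content type, and `Uint8Array` stands in for Node's `Buffer`.

```typescript
// Hypothetical metadata computed from a decoded audio buffer.
interface AudioMeta {
  seconds: number;
  waveform: Uint8Array;
}

// Simplified local stand-in for the audio message content (assumption).
interface PttContent {
  audio: Uint8Array | { url: string };
  ptt: true;
  seconds?: number;
  waveform?: Uint8Array;
}

// Attaches seconds and waveform only when raw bytes are available.
// URL inputs are passed through without metadata, as in the diagram.
function buildPttContent(audio: Uint8Array | string, meta?: AudioMeta): PttContent {
  if (typeof audio === "string") {
    return { audio: { url: audio }, ptt: true };
  }
  // Conditional spread: adds nothing when meta is undefined.
  return { audio, ptt: true, ...(meta ?? {}) };
}
```

Keeping the metadata optional means the URL path and any failed analysis fall back to the previous behaviour instead of blocking the send.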

Updated class diagram for BaileysStartupService audio waveform support

classDiagram
  class BaileysStartupService {
    +audioWhatsapp(data SendAudioDto, file any, isIntegration boolean) Promise~any~
    -getAudioDuration(audioBuffer Buffer) Promise~number~
    -getAudioWaveform(audioBuffer Buffer) Promise~Uint8Array~
  }

  class SendAudioDto {
    +number string
    +audio any
    +delay number
  }

  class AudioDecoder {
    +decode(audioBuffer Buffer) AudioData
  }

  class AudioData {
    +duration number
    +getChannelData(channelIndex number) Float32Array
  }

  BaileysStartupService --> SendAudioDto : uses
  BaileysStartupService --> AudioDecoder : uses audioDecode
  AudioDecoder --> AudioData : returns
  BaileysStartupService ..> Uint8Array : generates waveform
  BaileysStartupService ..> Buffer : processes audio buffers

File-Level Changes

Change: Add audio duration and waveform extraction utilities and integrate them into WhatsApp PTT sending flow for both pre-processed buffers and base64/URL inputs.
Details:
Introduce private getAudioDuration that decodes the audio buffer via audioDecode, returns a ceil’d duration in seconds, and falls back to 1 second on failure with logging.
Introduce private getAudioWaveform that decodes channel data, downsamples to 64 buckets, normalizes to a 0–100 Uint8Array with minimum non-zero bar height and flat fallback, and logs debug information.
Update audioWhatsapp to, when processAudio returns a Buffer, compute seconds and waveform and include them in the AnyMessageContent payload, logging waveform metadata for debugging.
Extend the non-processAudio branch to generate seconds and waveform only when the audio is a Buffer (not URL), and conditionally spread them into the message content.
Files: `src/api/integrations/channel/whatsapp/whatsapp.baileys.service.ts`

Change: Adjust audio encoding to match WhatsApp PTT expectations and improve build stability in Docker.
Details:
Change ffmpeg audioBitrate from 128k to 48k in the PTT processing pipeline.
Copy the patches directory into the Docker build context and run patch-package during image build so the Baileys patch is applied inside the container.
Run the build in Docker with NODE_OPTIONS --max-old-space-size=2048 to reduce memory-related build failures.
Files: `src/api/integrations/channel/whatsapp/whatsapp.baileys.service.ts` `Dockerfile`

Change: Wire in patch-package for Baileys patch application in local and CI installs.
Details:
Add a postinstall script that runs patch-package so patches are applied on every npm install.
Add patch-package as a devDependency to ensure availability in all environments.
Introduce a patches/baileys+7.0.0-rc.6.patch file to prevent Baileys from overwriting manually generated waveform metadata.
Files: `package.json` `package-lock.json` `patches/baileys+7.0.0-rc.6.patch`

Assessment against linked issues

Issue	Objective	Addressed	Explanation
#1086	Restore the WhatsApp-style audio waveform visualization for voice/PTT messages sent via the Evolution API so that recipients see a waveform instead of a plain/flat audio UI.	✅

sourcery-ai

Hey - I've found 2 issues, and left some high-level feedback:

Both getAudioDuration and getAudioWaveform call audioDecode on the same buffer; consider decoding once and passing the decoded data through to avoid redundant CPU-heavy work for each message.
In getAudioWaveform, samplesPerWaveform can become 0 for very short audio, which leads to a division by zero when computing avg; adding a guard to ensure a minimum of 1 sample per bucket would make this more robust.
The new waveform-related logging (info with first 10 values, type, etc.) runs on every audio send and may be quite noisy in production; you might want to downgrade some of these to a debug/verbose level or gate them behind a flag.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- Both `getAudioDuration` and `getAudioWaveform` call `audioDecode` on the same buffer; consider decoding once and passing the decoded data through to avoid redundant CPU-heavy work for each message.
- In `getAudioWaveform`, `samplesPerWaveform` can become 0 for very short audio, which leads to a division by zero when computing `avg`; adding a guard to ensure a minimum of 1 sample per bucket would make this more robust.
- The new waveform-related logging (`info` with first 10 values, type, etc.) runs on every audio send and may be quite noisy in production; you might want to downgrade some of these to a debug/verbose level or gate them behind a flag.

## Individual Comments

### Comment 1
<location> `src/api/integrations/channel/whatsapp/whatsapp.baileys.service.ts:3061-3070` </location>
<code_context>
+      const audioData = await audioDecode(audioBuffer);
+      const samples = audioData.getChannelData(0); // Get first channel
+      const waveformLength = 64;
+      const samplesPerWaveform = Math.floor(samples.length / waveformLength);
+
+      // First pass: calculate raw averages
+      const rawValues: number[] = [];
+      for (let i = 0; i < waveformLength; i++) {
+        const start = i * samplesPerWaveform;
+        const end = start + samplesPerWaveform;
+        let sum = 0;
+        for (let j = start; j < end && j < samples.length; j++) {
+          sum += Math.abs(samples[j]);
+        }
+        const avg = sum / samplesPerWaveform;
+        rawValues.push(avg);
+      }
</code_context>

<issue_to_address>
**issue (bug_risk):** Guard against very short audio causing division by zero and incorrect indexing when computing the waveform.

When `samples.length < waveformLength`, `samplesPerWaveform` becomes 0, causing a division by zero (`avg = sum / samplesPerWaveform` → `NaN`) and leaving `start`/`end` stuck at 0. Handle this case explicitly (e.g., enforce `samplesPerWaveform >= 1`, reduce `waveformLength` for very short clips, or early‑return a simplified waveform) so short audio doesn’t break the loop or produce invalid values.
</issue_to_address>
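
One way to apply the guard described in this comment is sketched below. It is an illustration under assumptions, not the fix that was merged: `bucketAverages` is a hypothetical name, and dividing by the number of samples actually read (rather than the nominal bucket size) is a choice made here so that partial or empty buckets never produce `NaN`.

```typescript
// Hypothetical helper (not from the PR): average absolute amplitude per bucket,
// safe for clips that contain fewer samples than buckets.
function bucketAverages(samples: Float32Array, waveformLength = 64): number[] {
  // Enforce at least one sample per bucket so the bucket size is never zero.
  const samplesPerBucket = Math.max(1, Math.floor(samples.length / waveformLength));
  const rawValues: number[] = [];

  for (let i = 0; i < waveformLength; i++) {
    const start = i * samplesPerBucket;
    const end = Math.min(start + samplesPerBucket, samples.length);

    let sum = 0;
    for (let j = start; j < end; j++) {
      sum += Math.abs(samples[j] ?? 0);
    }

    // Buckets past the end of a short clip read no samples and stay at 0.
    const count = end - start;
    rawValues.push(count > 0 ? sum / count : 0);
  }
  return rawValues;
}
```

For a three-sample clip this yields three populated buckets followed by zeros, so the output length stays at `waveformLength` and downstream normalization does not need a special case.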

### Comment 2
<location> `src/api/integrations/channel/whatsapp/whatsapp.baileys.service.ts:3045` </location>
<code_context>
+  private async getAudioDuration(audioBuffer: Buffer): Promise<number> {
+    try {
+      this.logger.info('Getting audio duration...');
+      const audioData = await audioDecode(audioBuffer);
+      const duration = Math.ceil(audioData.duration);
+      this.logger.info(`Audio duration: ${duration} seconds`);
</code_context>

<issue_to_address>
**suggestion (performance):** Avoid decoding the same audio buffer twice for duration and waveform to reduce overhead.

`getAudioDuration` and `getAudioWaveform` each call `audioDecode(audioBuffer)`, and `audioWhatsapp` runs them sequentially on the same buffer. Consider decoding once and reusing the decoded data (or adding a helper that does both) to avoid duplicate work and lower CPU usage under load.

Suggested implementation:

```typescript
  private async decodeAudio(audioBuffer: Buffer): Promise<AudioData | null> {
    try {
      this.logger.info('Decoding audio buffer...');
      const audioData = await audioDecode(audioBuffer);
      this.logger.info('Audio buffer decoded successfully');
      return audioData;
    } catch (error) {
      this.logger.warn(`Failed to decode audio buffer: ${error.message}`);
      return null;
    }
  }

  private async getAudioDuration(audioData: AudioData | null): Promise<number> {
    if (!audioData) {
      this.logger.warn('Audio data missing, using default 1 second duration');
      return 1;
    }

    try {
      const duration = Math.ceil(audioData.duration);
      this.logger.info(`Audio duration: ${duration} seconds`);
      return duration;
    } catch (error) {
      this.logger.warn(`Failed to get audio duration: ${error.message}, using default 1 second`);
      return 1;
    }
  }

  private async getAudioWaveform(audioData: AudioData | null): Promise<Uint8Array> {
    try {
      this.logger.info('Generating audio waveform...');

```

1. At the top of `src/api/integrations/channel/whatsapp/whatsapp.baileys.service.ts`, import the `AudioData` type from the same module that provides `audioDecode`, e.g.:
   - `import audioDecode, { AudioData } from 'audio-decode';`
   Adjust this to match how `audioDecode` is currently imported in your file.
2. Update `getAudioWaveform`’s implementation further down in the file to **stop calling `audioDecode`**. Instead, operate directly on the passed `audioData` parameter. Remove any `audioDecode(audioBuffer)` calls inside this method.
3. Anywhere `getAudioDuration` and `getAudioWaveform` are called (likely in your `audioWhatsapp` flow), change the usage to:
   - Decode once: `const audioData = await this.decodeAudio(audioBuffer);`
   - Then reuse: `const duration = await this.getAudioDuration(audioData);`
   - And: `const waveform = await this.getAudioWaveform(audioData);`
4. Remove any remaining direct calls to `audioDecode(audioBuffer)` in this class that are only used to derive duration or waveform, to ensure the buffer is decoded only once per processing flow.
</issue_to_address>
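
The decode-once flow from the steps above can be sketched independently of the service class. Everything named here is an assumption for illustration: `DecodedAudio` mirrors only the two members the review refers to (`duration` and `getChannelData`), `Decoder` is an injected stand-in for `audioDecode`, `analyzeAudio` is a hypothetical helper, and `Uint8Array` stands in for Node's `Buffer`.

```typescript
// Minimal shape of the decoded audio that this sketch relies on (assumption).
interface DecodedAudio {
  duration: number;
  getChannelData(channel: number): Float32Array;
}

// Injected decoder so the sketch does not depend on a specific library.
type Decoder = (bytes: Uint8Array) => Promise<DecodedAudio>;

interface AudioAnalysis {
  seconds: number;
  samples: Float32Array | null; // null when decoding failed
}

// Decodes the buffer exactly once and returns what both the duration
// and the waveform computations need.
async function analyzeAudio(bytes: Uint8Array, decode: Decoder): Promise<AudioAnalysis> {
  try {
    const decoded = await decode(bytes);
    return {
      seconds: Math.ceil(decoded.duration),
      samples: decoded.getChannelData(0), // first channel only
    };
  } catch {
    // Same fallback the PR describes: default to a 1 second duration.
    return { seconds: 1, samples: null };
  }
}
```

A caller would await `analyzeAudio` once per message and hand `samples` to the waveform routine, which keeps the CPU-heavy decode out of both helper methods.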

sourcery-ai · 2026-01-01T19:30:36Z

src/api/integrations/channel/whatsapp/whatsapp.baileys.service.ts

+      const samplesPerWaveform = Math.floor(samples.length / waveformLength);
+
+      // First pass: calculate raw averages
+      const rawValues: number[] = [];
+      for (let i = 0; i < waveformLength; i++) {
+        const start = i * samplesPerWaveform;
+        const end = start + samplesPerWaveform;
+        let sum = 0;
+        for (let j = start; j < end && j < samples.length; j++) {
+          sum += Math.abs(samples[j]);


issue (bug_risk): Guard against very short audio causing division by zero and incorrect indexing when computing the waveform.

When samples.length < waveformLength, samplesPerWaveform becomes 0, causing a division by zero (avg = sum / samplesPerWaveform → NaN) and leaving start/end stuck at 0. Handle this case explicitly (e.g., enforce samplesPerWaveform >= 1, reduce waveformLength for very short clips, or early‑return a simplified waveform) so short audio doesn’t break the loop or produce invalid values.

- Add audio-decode library for audio buffer analysis
- Implement getAudioDuration() to extract duration from audio
- Implement getAudioWaveform() to generate 64-value waveform array
- Normalize waveform values to 0-100 range for WhatsApp compatibility
- Change audio bitrate from 128k to 48k per WhatsApp PTT requirements
- Add Baileys patch to prevent waveform overwrite
- Increase Node.js heap size for build to prevent OOM

Fixes EvolutionAPI#1086

ffigueroa · 2026-01-01T19:33:55Z

Closing to reopen with clean commit history

sourcery-ai bot reviewed Jan 1, 2026

ffigueroa force-pushed the feature/add-audio-waveforms branch from 4cd0a3b to fac3cff on January 1, 2026 19:32

ffigueroa closed this Jan 1, 2026

ffigueroa deleted the feature/add-audio-waveforms branch January 1, 2026 19:35
