feat(audio): Add waveform visualization for PTT voice messages #2345
Reviewer's Guide
Implements waveform-enabled PTT audio sending for WhatsApp by decoding audio buffers to derive the duration and a 64-point Uint8Array waveform, wiring these into the Baileys message content, adjusting the audio bitrate to meet WhatsApp PTT expectations, and introducing a patch-package based Baileys patch plus Docker/postinstall wiring to apply it in all environments.
Sequence diagram for sending PTT audio with generated waveform
sequenceDiagram
actor Client
participant EvolutionAPI
participant BaileysStartupService
participant FFmpeg_processAudio
participant audioDecode_duration
participant audioDecode_waveform
participant Baileys_sendMessageWithTyping
participant WhatsApp
Client->>EvolutionAPI: POST /audioWhatsapp (SendAudioDto)
EvolutionAPI->>BaileysStartupService: audioWhatsapp(data, file, isIntegration)
BaileysStartupService->>FFmpeg_processAudio: processAudio(mediaData.audio)
FFmpeg_processAudio-->>BaileysStartupService: Buffer convert (48k bitrate)
alt Converted_audio_is_Buffer
BaileysStartupService->>audioDecode_duration: getAudioDuration(convert)
audioDecode_duration-->>BaileysStartupService: seconds
BaileysStartupService->>audioDecode_waveform: getAudioWaveform(convert)
audioDecode_waveform-->>BaileysStartupService: Uint8Array waveform (64 values)
BaileysStartupService->>Baileys_sendMessageWithTyping: sendMessageWithTyping(number, messageContent_with_waveform)
else Raw_or_URL_audio
BaileysStartupService->>BaileysStartupService: Derive audioBuffer (URL or base64 Buffer)
alt audioBuffer_is_Buffer
BaileysStartupService->>audioDecode_duration: getAudioDuration(audioBuffer)
audioDecode_duration-->>BaileysStartupService: seconds
BaileysStartupService->>audioDecode_waveform: getAudioWaveform(audioBuffer)
audioDecode_waveform-->>BaileysStartupService: Uint8Array waveform (64 values)
BaileysStartupService->>Baileys_sendMessageWithTyping: sendMessageWithTyping(number, audioBuffer_with_waveform)
else audioBuffer_is_URL
BaileysStartupService->>Baileys_sendMessageWithTyping: sendMessageWithTyping(number, URL_audio_without_waveform)
end
end
Baileys_sendMessageWithTyping-->>WhatsApp: PTT message with seconds and waveform
WhatsApp-->>Client: Voice message UI with waveform visualization
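The branching in the diagram above can be sketched roughly as follows. This is a minimal illustration and not the PR's code: `PttContent`, `AudioAnalysis`, and `buildPttContent` are hypothetical names, the shapes are simplified stand-ins rather than Baileys types, and the analysis step is synchronous here although the real `getAudioDuration` and `getAudioWaveform` are async.

```typescript
// Hypothetical, simplified shapes for illustration only (not the Baileys types).
interface AudioAnalysis {
  seconds: number;
  waveform: Uint8Array;
}

interface PttContent {
  audio: Uint8Array | { url: string };
  ptt: true;
  seconds?: number;
  waveform?: Uint8Array;
}

// Builds the message content: raw bytes get duration and waveform attached,
// while a URL is passed through without a waveform, as in the diagram.
function buildPttContent(
  audio: Uint8Array | string,
  analyze: (bytes: Uint8Array) => AudioAnalysis,
): PttContent {
  const base = { ptt: true as const };
  if (typeof audio === 'string') {
    // URL branch: nothing to decode locally, so no seconds or waveform.
    return { ...base, audio: { url: audio } };
  }
  // Buffer branch: analyze the bytes and attach the results.
  const { seconds, waveform } = analyze(audio);
  return { ...base, audio, seconds, waveform };
}
```

Injecting `analyze` as a parameter is only a convenience for the sketch; it keeps the branch logic testable without a real decoder.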
Updated class diagram for BaileysStartupService audio waveform support
classDiagram
class BaileysStartupService {
+audioWhatsapp(data SendAudioDto, file any, isIntegration boolean) Promise~any~
-getAudioDuration(audioBuffer Buffer) Promise~number~
-getAudioWaveform(audioBuffer Buffer) Promise~Uint8Array~
}
class SendAudioDto {
+number string
+audio any
+delay number
}
class AudioDecoder {
+decode(audioBuffer Buffer) AudioData
}
class AudioData {
+duration number
+getChannelData(channelIndex number) Float32Array
}
BaileysStartupService --> SendAudioDto : uses
BaileysStartupService --> AudioDecoder : uses audioDecode
AudioDecoder --> AudioData : returns
BaileysStartupService ..> Uint8Array : generates waveform
BaileysStartupService ..> Buffer : processes audio buffers
Hey - I've found 2 issues, and left some high level feedback:
- Both `getAudioDuration` and `getAudioWaveform` call `audioDecode` on the same buffer; consider decoding once and passing the decoded data through to avoid redundant CPU-heavy work for each message.
- In `getAudioWaveform`, `samplesPerWaveform` can become 0 for very short audio, which leads to a division by zero when computing `avg`; adding a guard to ensure a minimum of 1 sample per bucket would make this more robust.
- The new waveform-related logging (`info` with first 10 values, type, etc.) runs on every audio send and may be quite noisy in production; you might want to downgrade some of these to a debug/verbose level or gate them behind a flag.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Both `getAudioDuration` and `getAudioWaveform` call `audioDecode` on the same buffer; consider decoding once and passing the decoded data through to avoid redundant CPU-heavy work for each message.
- In `getAudioWaveform`, `samplesPerWaveform` can become 0 for very short audio, which leads to a division by zero when computing `avg`; adding a guard to ensure a minimum of 1 sample per bucket would make this more robust.
- The new waveform-related logging (`info` with first 10 values, type, etc.) runs on every audio send and may be quite noisy in production; you might want to downgrade some of these to a debug/verbose level or gate them behind a flag.
## Individual Comments
### Comment 1
<location> `src/api/integrations/channel/whatsapp/whatsapp.baileys.service.ts:3061-3070` </location>
<code_context>
+ const audioData = await audioDecode(audioBuffer);
+ const samples = audioData.getChannelData(0); // Get first channel
+ const waveformLength = 64;
+ const samplesPerWaveform = Math.floor(samples.length / waveformLength);
+
+ // First pass: calculate raw averages
+ const rawValues: number[] = [];
+ for (let i = 0; i < waveformLength; i++) {
+ const start = i * samplesPerWaveform;
+ const end = start + samplesPerWaveform;
+ let sum = 0;
+ for (let j = start; j < end && j < samples.length; j++) {
+ sum += Math.abs(samples[j]);
+ }
+ const avg = sum / samplesPerWaveform;
+ rawValues.push(avg);
+ }
</code_context>
<issue_to_address>
**issue (bug_risk):** Guard against very short audio causing division by zero and incorrect indexing when computing the waveform.
When `samples.length < waveformLength`, `samplesPerWaveform` becomes 0, causing a division by zero (`avg = sum / samplesPerWaveform` → `NaN`) and leaving `start`/`end` stuck at 0. Handle this case explicitly (e.g., enforce `samplesPerWaveform >= 1`, reduce `waveformLength` for very short clips, or early‑return a simplified waveform) so short audio doesn’t break the loop or produce invalid values.
</issue_to_address>
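One way to apply the guard described in this comment is sketched below. It is an illustration under assumptions, not the PR's implementation: `computeWaveform` is a hypothetical standalone helper, and the second pass (scaling so the loudest bucket maps to 100) is one plausible reading of the "0-100 range" normalization mentioned in the PR description.

```typescript
// Hypothetical standalone helper; in the PR the logic lives in getAudioWaveform.
function computeWaveform(samples: Float32Array, waveformLength = 64): Uint8Array {
  const waveform = new Uint8Array(waveformLength); // zero-filled by default
  if (samples.length === 0) return waveform;

  // Guard: a bucket never shrinks to 0 samples, even for very short clips.
  const samplesPerBucket = Math.max(1, Math.floor(samples.length / waveformLength));

  // First pass: mean absolute amplitude per bucket.
  const raw: number[] = [];
  for (let i = 0; i < waveformLength; i++) {
    const start = i * samplesPerBucket;
    const end = Math.min(start + samplesPerBucket, samples.length);
    let sum = 0;
    for (let j = start; j < end; j++) sum += Math.abs(samples[j]);
    // Divide by the number of samples actually read; empty buckets stay 0.
    raw.push(end > start ? sum / (end - start) : 0);
  }

  // Second pass (assumed normalization): the loudest bucket maps to 100.
  const max = Math.max(...raw);
  if (max === 0) return waveform; // pure silence: keep all zeros
  for (let i = 0; i < waveformLength; i++) {
    waveform[i] = Math.round((raw[i] / max) * 100);
  }
  return waveform;
}
```

With fewer than 64 samples, the leading buckets each hold one sample and the remaining buckets are 0, so no `NaN` reaches the `Uint8Array`.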
### Comment 2
<location> `src/api/integrations/channel/whatsapp/whatsapp.baileys.service.ts:3045` </location>
<code_context>
+ private async getAudioDuration(audioBuffer: Buffer): Promise<number> {
+ try {
+ this.logger.info('Getting audio duration...');
+ const audioData = await audioDecode(audioBuffer);
+ const duration = Math.ceil(audioData.duration);
+ this.logger.info(`Audio duration: ${duration} seconds`);
</code_context>
<issue_to_address>
**suggestion (performance):** Avoid decoding the same audio buffer twice for duration and waveform to reduce overhead.
`getAudioDuration` and `getAudioWaveform` each call `audioDecode(audioBuffer)`, and `audioWhatsapp` runs them sequentially on the same buffer. Consider decoding once and reusing the decoded data (or adding a helper that does both) to avoid duplicate work and lower CPU usage under load.
Suggested implementation:
```typescript
  private async decodeAudio(audioBuffer: Buffer): Promise<AudioData | null> {
    try {
      this.logger.info('Decoding audio buffer...');
      const audioData = await audioDecode(audioBuffer);
      this.logger.info('Audio buffer decoded successfully');
      return audioData;
    } catch (error) {
      this.logger.warn(`Failed to decode audio buffer: ${error.message}`);
      return null;
    }
  }

  private async getAudioDuration(audioData: AudioData | null): Promise<number> {
    if (!audioData) {
      this.logger.warn('Audio data missing, using default 1 second duration');
      return 1;
    }
    try {
      const duration = Math.ceil(audioData.duration);
      this.logger.info(`Audio duration: ${duration} seconds`);
      return duration;
    } catch (error) {
      this.logger.warn(`Failed to get audio duration: ${error.message}, using default 1 second`);
      return 1;
    }
  }

  private async getAudioWaveform(audioData: AudioData | null): Promise<Uint8Array> {
    try {
      this.logger.info('Generating audio waveform...');
```
1. At the top of `src/api/integrations/channel/whatsapp/whatsapp.baileys.service.ts`, import the `AudioData` type from the same module that provides `audioDecode`, e.g.:
- `import audioDecode, { AudioData } from 'audio-decode';`
Adjust this to match how `audioDecode` is currently imported in your file.
2. Update `getAudioWaveform`’s implementation further down in the file to **stop calling `audioDecode`**. Instead, operate directly on the passed `audioData` parameter. Remove any `audioDecode(audioBuffer)` calls inside this method.
3. Anywhere `getAudioDuration` and `getAudioWaveform` are called (likely in your `audioWhatsapp` flow), change the usage to:
- Decode once: `const audioData = await this.decodeAudio(audioBuffer);`
- Then reuse: `const duration = await this.getAudioDuration(audioData);`
- And: `const waveform = await this.getAudioWaveform(audioData);`
4. Remove any remaining direct calls to `audioDecode(audioBuffer)` in this class that are only used to derive duration or waveform, to ensure the buffer is decoded only once per processing flow.
</issue_to_address>
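The decode-once call site described in steps 3 and 4 might look roughly like this. It is a sketch under assumptions: `DecodedAudio`, `Decoder`, and `analyzeAudio` are hypothetical names, the decoded shape mirrors the class diagram rather than the actual `audio-decode` return type, and the decoder is injected so the example stays self-contained.

```typescript
// Hypothetical minimal shape of decoded audio, mirroring the class diagram.
interface DecodedAudio {
  duration: number;
  getChannelData(channel: number): Float32Array;
}

// Stand-in for audioDecode; injected so the sketch needs no external package.
type Decoder = (bytes: Uint8Array) => Promise<DecodedAudio>;

// Decodes the buffer exactly once and derives both values from the result.
async function analyzeAudio(
  bytes: Uint8Array,
  decode: Decoder,
): Promise<{ seconds: number; samples: Float32Array | null }> {
  let decoded: DecodedAudio | null = null;
  try {
    decoded = await decode(bytes);
  } catch {
    decoded = null; // decoding failed; fall through to the default below
  }
  if (!decoded) {
    // Same 1-second fallback as in the suggested getAudioDuration.
    return { seconds: 1, samples: null };
  }
  return {
    seconds: Math.ceil(decoded.duration),
    samples: decoded.getChannelData(0), // first channel, as in the PR
  };
}
```

The returned `samples` can then be handed to the waveform computation, so neither step needs its own `audioDecode` call.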
- Add audio-decode library for audio buffer analysis
- Implement getAudioDuration() to extract duration from audio
- Implement getAudioWaveform() to generate 64-value waveform array
- Normalize waveform values to 0-100 range for WhatsApp compatibility
- Change audio bitrate from 128k to 48k per WhatsApp PTT requirements
- Add Baileys patch to prevent waveform overwrite
- Increase Node.js heap size for build to prevent OOM

Fixes EvolutionAPI#1086
4cd0a3b to fac3cff
Closing to reopen with clean commit history
Summary
This PR adds proper waveform visualization for PTT (Push-to-Talk) voice messages sent via the API. Currently, audio messages sent through Evolution API display without the visual waveform in WhatsApp, making them look less authentic compared to messages sent directly from the app.
Changes
- Add the `audio-decode` library to analyze the audio buffer and generate a 64-value waveform array representing the audio amplitude
- Add a Baileys patch (applied via `patch-package`)
Technical Details
- `getAudioDuration()`: Extracts the duration in seconds from the audio buffer
- `getAudioWaveform()`: Generates a normalized waveform (0-100 range) with 64 sample points
- The waveform is returned as a `Uint8Array` for Baileys compatibility
Related Issues
Testing
Screenshots
Voice messages now display with proper waveform visualization instead of a flat line.
Summary by Sourcery
Add waveform-enabled PTT audio sending for WhatsApp and wire up Baileys patching in builds.