Unable to stream the raw audio to stdin using ffmpeg #157

Answered by KoljaB
realyukii asked this question in Q&A

feed_audio needs either raw PCM chunks (16,000 Hz, mono, 16-bit) or NumPy arrays with the sample rate passed as a parameter. Ideally the chunks are 1024 bytes each, but it’s flexible about sizes.
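To make the chunk size concrete, here is a minimal sketch that splits a 16-bit mono PCM buffer into 1024-byte pieces and works out how much audio each piece holds. The names `iter_chunks` and `chunk_duration_ms` are hypothetical helpers for illustration, not part of any library.

```python
from typing import Iterator

SAMPLE_RATE = 16_000   # Hz, the rate feed_audio expects for raw PCM
BYTES_PER_SAMPLE = 2   # 16-bit mono means 2 bytes per sample
CHUNK_BYTES = 1024     # the suggested chunk size


def iter_chunks(pcm: bytes, chunk_bytes: int = CHUNK_BYTES) -> Iterator[bytes]:
    """Yield consecutive slices of `pcm`; the last one may be shorter."""
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]


def chunk_duration_ms(chunk: bytes) -> float:
    """Duration of a 16-bit mono chunk at SAMPLE_RATE, in milliseconds."""
    return len(chunk) * 1000 / (BYTES_PER_SAMPLE * SAMPLE_RATE)
```

A 1024-byte chunk holds 512 samples, which is 32 ms of audio at 16 kHz, so each chunk should be handed over roughly every 32 ms to stay real-time.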

Since you’re grabbing audio at 48,000 Hz from PyAudio, you’ll need to downsample it to 16,000 Hz first, because Whisper and Silero are built for 16 kHz audio. Use something like scipy.signal.resample to handle the downsampling. Also, make sure the chunks stay in order and are passed to feed_audio in real time (which should already be the case here, since PyAudio records and delivers chunks in real time).
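The downsampling step could look something like the sketch below. It assumes the input is 16-bit mono PCM bytes as delivered by a PyAudio stream, and it swaps in `scipy.signal.resample_poly` rather than `scipy.signal.resample`, since polyphase filtering suits an exact 3:1 ratio on short chunks. `downsample_chunk` is a hypothetical helper name, not something from RealtimeSTT.

```python
import math

import numpy as np
from scipy.signal import resample_poly

SOURCE_RATE = 48_000  # Hz, rate of the captured audio (assumed)
TARGET_RATE = 16_000  # Hz, rate expected for raw PCM chunks


def downsample_chunk(chunk: bytes,
                     source_rate: int = SOURCE_RATE,
                     target_rate: int = TARGET_RATE) -> bytes:
    """Resample 16-bit mono PCM bytes from source_rate to target_rate."""
    samples = np.frombuffer(chunk, dtype=np.int16).astype(np.float32)
    # Reduce the ratio to lowest terms, e.g. 16000/48000 becomes 1/3.
    g = math.gcd(source_rate, target_rate)
    resampled = resample_poly(samples, target_rate // g, source_rate // g)
    # Round and clip back into the int16 range before converting to bytes.
    clipped = np.clip(np.rint(resampled), -32768, 32767)
    return clipped.astype(np.int16).tobytes()
```

Each block read from the 48 kHz stream would go through `downsample_chunk` and the returned bytes would then be passed to feed_audio. Reading 3072 samples per block gives exactly 1024 samples after conversion. Resampling each chunk independently can leave small artifacts at chunk boundaries; whether that matters for transcription quality is something to verify in practice.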

Answer selected by realyukii