I tried a Python script from this discussion with a few changes:

```python
if __name__ == "__main__":
    import threading

    import pyaudio
    from RealtimeSTT import AudioToTextRecorder

    # Audio stream configuration constants
    CHUNK = 4 * 1024          # Number of audio samples per buffer
    FORMAT = pyaudio.paInt16  # Sample format (16-bit integer)
    CHANNELS = 1              # Mono audio
    RATE = 48000              # Sampling rate in Hz (expected by the recorder)

    # Initialize the audio-to-text recorder without using the microphone directly.
    # Since we are feeding audio data manually, set use_microphone to False.
    recorder = AudioToTextRecorder(
        use_microphone=False,  # Disable built-in microphone usage
        spinner=False          # Disable spinner animation in the console
    )

    # Event to signal when to stop the threads
    stop_event = threading.Event()

    def feed_audio_thread():
        """Thread function to read audio data and feed it to the recorder."""
        p = pyaudio.PyAudio()
        # Open an input audio stream with the specified configuration
        stream = p.open(
            format=FORMAT,
            channels=CHANNELS,
            rate=RATE,
            input=True,
            frames_per_buffer=CHUNK
        )
        try:
            print("Speak now")
            while not stop_event.is_set():
                # Read audio data from the stream (in the expected format)
                data = stream.read(CHUNK, exception_on_overflow=False)
                # Feed the audio data to the recorder
                recorder.feed_audio(data)
        except Exception as e:
            print(f"feed_audio_thread encountered an error: {e}")
        finally:
            # Clean up the audio stream
            stream.stop_stream()
            stream.close()
            p.terminate()
            print("Audio stream closed.")

    def recorder_transcription_thread():
        """Thread function to handle transcription and process the text."""
        def process_text(full_sentence):
            """Callback function to process the transcribed text."""
            print("Transcribed text:", full_sentence)
            # Check for the stop command in the transcribed text
            if "stop recording" in full_sentence.lower():
                print("Stop command detected. Stopping threads...")
                stop_event.set()
                recorder.abort()
        try:
            while not stop_event.is_set():
                # Get transcribed text and process it using the callback
                recorder.text(process_text)
        except Exception as e:
            print(f"transcription_thread encountered an error: {e}")
        finally:
            print("Transcription thread exiting.")

    try:
        # Create and start the audio feeding thread
        audio_thread = threading.Thread(target=feed_audio_thread)
        audio_thread.daemon = False  # Ensure the thread doesn't exit prematurely
        audio_thread.start()

        # Create and start the transcription thread
        transcription_thread = threading.Thread(target=recorder_transcription_thread)
        transcription_thread.daemon = False  # Ensure the thread doesn't exit prematurely
        transcription_thread.start()

        # Wait for both threads to finish
        audio_thread.join()
        transcription_thread.join()
    except KeyboardInterrupt:
        print("Recording and transcription have stopped.")
        print("exiting...")
    finally:
        recorder.shutdown()
```

The changes I made are:
It previously worked without any problem; however, when I tried again, something seems to have changed and the program no longer processes the audio. Here's how I set things up before running the script:

```shell
# create fake/virtual/dummy sink
$ pactl load-module module-null-sink sink_name=steam

# ensure the sink is created
$ pactl list short sinks
798 steam PipeWire float32le 2ch 48000Hz RUNNING

# search audio index for chrome process
$ pactl list sink-inputs short
6573 798 6572 PipeWire float32le 2ch 48000Hz

# redirect the audio to the dummy sink
$ pactl move-sink-input 6573 steam

# test streaming ffmpeg from steam.monitor through a pipe to ffplay (and it works as expected)
$ ffmpeg -f pulse -i steam.monitor -f s16le -ar 16k -acodec pcm_s16le -ac 1 -loglevel quiet - | ffplay -f s16le -ar 48k -

$ export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/home/reyuki/software/open-source/RealtimeSTT/.venv/lib/python3.12/site-packages/nvidia/cudnn/lib/"

# setup virtual environment and activate it
$ ffmpeg -f pulse -i steam.monitor -f s16le -ar 48k -acodec pcm_s16le -ac 1 -loglevel quiet - | python ./main.py
ALSA lib pcm_dsnoop.c:567:(snd_pcm_dsnoop_open) unable to open slave
ALSA lib pcm_dmix.c:1000:(snd_pcm_dmix_open) unable to open slave
ALSA lib pcm.c:2722:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2722:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2722:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm_dsnoop.c:567:(snd_pcm_dsnoop_open) unable to open slave
ALSA lib pcm_dmix.c:1000:(snd_pcm_dmix_open) unable to open slave
ALSA lib pcm_dmix.c:1000:(snd_pcm_dmix_open) unable to open slave
Speak now
Recording and transcription have stopped.
exiting...
RealtimeSTT shutting down
RealTimeSTT: root - ERROR - Error receiving data from connection: handle is closed
^CException ignored in: <module 'threading' from '/usr/lib/python3.12/threading.py'>
Traceback (most recent call last):
  File "/usr/lib/python3.12/threading.py", line 1624, in _shutdown
    lock.acquire()
KeyboardInterrupt:
^CException ignored in atexit callback: <function _exit_function at 0x7b6bf7776020>
Traceback (most recent call last):
  File "/usr/lib/python3.12/multiprocessing/util.py", line 363, in _exit_function
    _run_finalizers()
  File "/usr/lib/python3.12/multiprocessing/util.py", line 303, in _run_finalizers
    finalizer()
  File "/usr/lib/python3.12/multiprocessing/util.py", line 227, in __call__
    res = self._callback(*self._args, **self._kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/multiprocessing/queues.py", line 219, in _finalize_join
    thread.join()
  File "/usr/lib/python3.12/threading.py", line 1149, in join
    self._wait_for_tstate_lock()
  File "/usr/lib/python3.12/threading.py", line 1169, in _wait_for_tstate_lock
    if lock.acquire(block, timeout):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyboardInterrupt:
```

I also tried the example browser demo to make sure it's not a hardware issue, and it works as expected and generates transcribed text. I'm new to audio stuff (I only learned the very basics a few days ago) and have zero knowledge of Python and AI, so I'm definitely missing something. Please point me in the right direction, thanks :)

Some references I used:
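One detail worth noting about the setup above: the final ffmpeg command pipes s16le PCM into the script's stdin, but the script opens its own PyAudio input device and never reads that pipe. If the intent is to consume the piped stream instead, a minimal sketch of a stdin pump could look like this (hypothetical helper, not from the thread; `pump_pcm` and its parameters are illustrative names):

```python
import sys

def pump_pcm(feed, stream=None, chunk_size=1024):
    """Read raw PCM bytes from a binary stream (stdin by default) and pass
    each chunk to `feed` until the stream is exhausted."""
    if stream is None:
        stream = sys.stdin.buffer  # binary stdin, e.g. the ffmpeg pipe
    while True:
        data = stream.read(chunk_size)
        if not data:  # EOF: the upstream ffmpeg process ended
            break
        feed(data)

# In the script, this could replace the PyAudio read loop, e.g.:
#   pump_pcm(recorder.feed_audio)
```

With this approach, the ffmpeg side would have to deliver exactly what the recorder expects (16-bit mono PCM at 16 kHz, i.e. `-ar 16k -ac 1`), and PyAudio would not be needed at all.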
Replies: 1 comment 2 replies
feed_audio needs raw PCM chunks at 16,000 Hz, mono, 16-bit, or NumPy arrays with the sample rate passed as a parameter. Ideally the chunks are 1024 bytes, but it's flexible about sizes.

Since you're grabbing audio at 48,000 Hz from PyAudio, you'll need to downsample it to 16,000 Hz first, because Whisper and Silero are built for 16 kHz audio. Use something like scipy.signal.resample to handle the downsampling. Also, make sure the chunks stay in sync and are passed to feed_audio in real time (which should be given here, since PyAudio records and delivers chunks in real time).
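Following the advice above, a minimal sketch of a per-chunk 48 kHz → 16 kHz downsampling helper (illustrative names; assumes NumPy and SciPy are installed):

```python
import numpy as np
from scipy.signal import resample

SRC_RATE = 48000  # rate the PyAudio stream delivers
DST_RATE = 16000  # rate the recorder (Whisper/Silero) expects

def downsample_chunk(data: bytes) -> bytes:
    """Convert one 16-bit mono PCM chunk from 48 kHz to 16 kHz."""
    samples = np.frombuffer(data, dtype=np.int16)
    # 48000 / 16000 = 3, so the output has a third as many samples
    target_len = len(samples) * DST_RATE // SRC_RATE
    resampled = resample(samples, target_len)  # FFT-based resampling
    return resampled.astype(np.int16).tobytes()

# In feed_audio_thread, the feed line would then become:
#   recorder.feed_audio(downsample_chunk(data))
```

For a fixed integer ratio like 3:1, `scipy.signal.resample_poly(samples, 1, 3)` is a common alternative that avoids FFT edge artifacts on short chunks.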