
ElevenLabsTTSService and PlayHTTTSService interruptions don't occur with long LLM completions #950

Open
danthegoodman1 opened this issue Jan 9, 2025 · 7 comments

Comments

@danthegoodman1
Contributor

danthegoodman1 commented Jan 9, 2025

Description

When the LLM is still generating sentences at the moment an interruption occurs (e.g. after "tell me a long story"), any currently queued TTS messages are aborted; however, the LLM continues to generate sentences, causing the bot to start speaking again.

This makes the experience SUPER painful to users, causing them to want to scream at the bot until it shuts up.
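
To illustrate the behavior I'd expect (a framework-free sketch, not pipecat code; every name here is made up): the task draining the LLM stream should stop as soon as the interruption lands, so no further sentences get queued for TTS.

```python
import asyncio

async def fake_llm_stream(sentences):
    """Stand-in for a streaming LLM completion."""
    for s in sentences:
        await asyncio.sleep(0.05)  # simulate per-sentence latency
        yield s

async def speak(sentences, interrupted: asyncio.Event, spoken: list):
    """Drain the stream, but stop queueing TTS once interrupted."""
    async for sentence in fake_llm_stream(sentences):
        if interrupted.is_set():
            break  # expected behavior: no more sentences reach TTS
        spoken.append(sentence)

async def main():
    interrupted = asyncio.Event()
    spoken: list = []
    task = asyncio.create_task(
        speak(["one", "two", "three", "four"], interrupted, spoken)
    )
    await asyncio.sleep(0.12)  # user interrupts mid-story
    interrupted.set()
    await task
    return spoken

print(asyncio.run(main()))  # only the sentences produced before the interruption
```

The bug, as far as I can tell, is that nothing plays the role of that `interrupted` check between the LLM stream and the TTS service for these providers.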

Environment

  • pipecat-ai version: 0.0.52

Repro steps

Ask the bot to say something long, then interrupt it while it's still generating messages.

Expected behavior

LLM stops generating when interrupted.

Actual behavior

The LLM keeps generating, pushing more TTS frames and causing weird audio skipping.

Logs

The following logs mix in our custom logging and processors, but they accompany the recording so you can see what I mean.

The various "Got frame StartInterruptionFrame" entries mark where we send proper interruptions. As you can see in the attached video and the logs below, audio frames arrive between "Generating TTS" logs, which means the LLM is still generating frames. Because the LLM is not interrupted, you can hear that previously generated frames are skipped, but the newly generated frames still play despite the interruption.
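
For reference, here's a framework-free model of the gating behavior I'd expect (the frame class names mirror pipecat's, but this is an illustrative sketch, not pipecat's implementation):

```python
from dataclasses import dataclass

# Marker frames modeled after pipecat's frame names (sketch only).
class StartInterruptionFrame: pass
class LLMFullResponseStartFrame: pass

@dataclass
class TextFrame:
    text: str

class TTSGate:
    """Drops sentences from an interrupted completion before they reach TTS."""

    def __init__(self):
        self._open = True
        self.passed = []

    def process_frame(self, frame):
        if isinstance(frame, StartInterruptionFrame):
            self._open = False   # retire anything from the old completion
        elif isinstance(frame, LLMFullResponseStartFrame):
            self._open = True    # fresh completion after the interruption
        elif isinstance(frame, TextFrame) and self._open:
            self.passed.append(frame.text)

gate = TTSGate()
for f in [TextFrame("Once upon a time,"), StartInterruptionFrame(),
          TextFrame("in a quaint little village"),  # stale sentence: dropped
          LLMFullResponseStartFrame(), TextFrame("Alright, I'll stop.")]:
    gate.process_frame(f)
print(gate.passed)  # ['Once upon a time,', "Alright, I'll stop."]
```

In the logs below, the stale sentences are clearly still making it through to ElevenLabs after StartInterruptionFrame#1.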

tangia_dantest.2025-01-08.20_19_02.mp4
2025-01-08 20:19:01.500 | DEBUG    | pipecat.audio.vad.silero:__init__:113 - Loading Silero VAD model...
2025-01-08 20:19:01.539 | DEBUG    | pipecat.audio.vad.silero:__init__:135 - Loaded Silero VAD
2025-01-08 20:19:01.557 | DEBUG    | __main__:main:333 - Recording to dir recordings
2025-01-08 20:19:01.557 | DEBUG    | pipecat.pipeline.parallel_pipeline:__init__:89 - Creating ParallelPipeline#0 pipelines
2025-01-08 20:19:01.557 | DEBUG    | pipecat.processors.frame_processor:link:150 - Linking PipelineSource#0 -> AudioRecordingProcessor#0
2025-01-08 20:19:01.557 | DEBUG    | pipecat.processors.frame_processor:link:150 - Linking AudioRecordingProcessor#0 -> PipelineSink#0
2025-01-08 20:19:01.557 | DEBUG    | pipecat.processors.frame_processor:link:150 - Linking Source#0 -> Pipeline#0
2025-01-08 20:19:01.557 | DEBUG    | pipecat.processors.frame_processor:link:150 - Linking Pipeline#0 -> Sink#0
2025-01-08 20:19:01.557 | DEBUG    | pipecat.pipeline.parallel_pipeline:__init__:106 - Finished creating ParallelPipeline#0 pipelines
2025-01-08 20:19:01.558 | DEBUG    | pipecat.pipeline.parallel_pipeline:__init__:89 - Creating ParallelPipeline#1 pipelines
2025-01-08 20:19:01.558 | DEBUG    | pipecat.processors.frame_processor:link:150 - Linking PipelineSource#1 -> DailyOutputTransport#0
2025-01-08 20:19:01.558 | DEBUG    | pipecat.processors.frame_processor:link:150 - Linking DailyOutputTransport#0 -> AudioRecordingProcessor#1
2025-01-08 20:19:01.558 | DEBUG    | pipecat.processors.frame_processor:link:150 - Linking AudioRecordingProcessor#1 -> PipelineSink#1
2025-01-08 20:19:01.558 | DEBUG    | pipecat.processors.frame_processor:link:150 - Linking Source#1 -> Pipeline#1
2025-01-08 20:19:01.558 | DEBUG    | pipecat.processors.frame_processor:link:150 - Linking Pipeline#1 -> Sink#1
2025-01-08 20:19:01.558 | DEBUG    | pipecat.pipeline.parallel_pipeline:__init__:106 - Finished creating ParallelPipeline#1 pipelines
2025-01-08 20:19:01.558 | DEBUG    | pipecat.processors.frame_processor:link:150 - Linking PipelineSource#2 -> EndCallProbe#0
2025-01-08 20:19:01.558 | DEBUG    | pipecat.processors.frame_processor:link:150 - Linking EndCallProbe#0 -> DailyInputTransport#0
2025-01-08 20:19:01.558 | DEBUG    | pipecat.processors.frame_processor:link:150 - Linking DailyInputTransport#0 -> ParallelPipeline#0
2025-01-08 20:19:01.558 | DEBUG    | pipecat.processors.frame_processor:link:150 - Linking ParallelPipeline#0 -> FunctionCallInterruptionBlocker#0
2025-01-08 20:19:01.558 | DEBUG    | pipecat.processors.frame_processor:link:150 - Linking FunctionCallInterruptionBlocker#0 -> DeepgramSTTService#0
2025-01-08 20:19:01.558 | DEBUG    | pipecat.processors.frame_processor:link:150 - Linking DeepgramSTTService#0 -> TranscriptionBufferFilterProcessor#0
2025-01-08 20:19:01.558 | DEBUG    | pipecat.processors.frame_processor:link:150 - Linking TranscriptionBufferFilterProcessor#0 -> SimpleFrameLogger#0
2025-01-08 20:19:01.558 | DEBUG    | pipecat.processors.frame_processor:link:150 - Linking SimpleFrameLogger#0 -> InterruptionInserter#0
2025-01-08 20:19:01.558 | DEBUG    | pipecat.processors.frame_processor:link:150 - Linking InterruptionInserter#0 -> UserIdleCountReset#0
2025-01-08 20:19:01.558 | DEBUG    | pipecat.processors.frame_processor:link:150 - Linking UserIdleCountReset#0 -> UserIdleProcessor#0
2025-01-08 20:19:01.558 | DEBUG    | pipecat.processors.frame_processor:link:150 - Linking UserIdleProcessor#0 -> OpenAIUserContextAggregator#0
2025-01-08 20:19:01.558 | DEBUG    | pipecat.processors.frame_processor:link:150 - Linking OpenAIUserContextAggregator#0 -> OpenAILLMService#0
2025-01-08 20:19:01.558 | DEBUG    | pipecat.processors.frame_processor:link:150 - Linking OpenAILLMService#0 -> FunctionCallInterruptionObserver#0
2025-01-08 20:19:01.558 | DEBUG    | pipecat.processors.frame_processor:link:150 - Linking FunctionCallInterruptionObserver#0 -> MessagesShipper#0
2025-01-08 20:19:01.558 | DEBUG    | pipecat.processors.frame_processor:link:150 - Linking MessagesShipper#0 -> ElevenLabsTTSService#0
2025-01-08 20:19:01.558 | DEBUG    | pipecat.processors.frame_processor:link:150 - Linking ElevenLabsTTSService#0 -> ParallelPipeline#1
2025-01-08 20:19:01.559 | DEBUG    | pipecat.processors.frame_processor:link:150 - Linking ParallelPipeline#1 -> EndCallProbe#1
2025-01-08 20:19:01.559 | DEBUG    | pipecat.processors.frame_processor:link:150 - Linking EndCallProbe#1 -> OpenAIAssistantContextAggregator#0
2025-01-08 20:19:01.559 | DEBUG    | pipecat.processors.frame_processor:link:150 - Linking OpenAIAssistantContextAggregator#0 -> MessagesShipper#1
2025-01-08 20:19:01.559 | DEBUG    | pipecat.processors.frame_processor:link:150 - Linking MessagesShipper#1 -> PipelineSink#2
2025-01-08 20:19:01.559 | DEBUG    | pipecat.processors.frame_processor:link:150 - Linking Source#2 -> Pipeline#2
2025-01-08 20:19:01.559 | DEBUG    | pipecat.processors.frame_processor:link:150 - Linking Pipeline#2 -> Sink#2
2025-01-08 20:19:01.559 | DEBUG    | pipecat.pipeline.runner:run:27 - Runner PipelineRunner#0 started running PipelineTask#0
2025-01-08 20:19:01.559 | DEBUG    | pipecat.services.deepgram:_connect:202 - Connecting to Deepgram
2025-01-08 20:19:01.823 | DEBUG    | pipecat.services.elevenlabs:_connect_websocket:318 - Connecting to ElevenLabs
2025-01-08 20:19:01.937 | INFO     | pipecat.transports.services.daily:join:322 - Joining <REDACTED ROOM>
2025-01-08 20:19:02.115 | DEBUG    | __main__:on_call_state_updated:475 - sending on_call_state_updated: joining
2025-01-08 20:19:02.291 | INFO     | pipecat.transports.services.daily:on_participant_joined:620 - Participant joined aea5aa2a-4744-49c8-8826-bf90cda64d2d
2025-01-08 20:19:02.292 | DEBUG    | __main__:on_first_participant_joined:436 - sending on_first_participant_joined
2025-01-08 20:19:02.293 | DEBUG    | __main__:on_participant_joined:420 - sending on_participant_joined
2025-01-08 20:19:02.349 | DEBUG    | __main__:on_call_state_updated:475 - sending on_call_state_updated: joined
2025-01-08 20:19:02.589 | INFO     | pipecat.transports.services.daily:join:340 - Joined <REDACTED ROOM>
2025-01-08 20:19:02.589 | DEBUG    | __main__:on_joined:452 - sending on_joined
2025-01-08 20:19:04.441 | DEBUG    | pipecat.transports.base_input:_handle_interruptions:124 - User started speaking
2025-01-08 20:19:04.443 | DEBUG    | transcription_buffer_filter_processor:process_frame:61 - closing gate [PII:] StartInterruptionFrame#0
2025-01-08 20:19:04.681 | DEBUG    | simple_frame_logger:process_frame:22 - 1 Got frame StartInterruptionFrame#0 FrameDirection.DOWNSTREAM
2025-01-08 20:19:04.685 | DEBUG    | simple_frame_logger:process_frame:22 - 1 Got frame UserStartedSpeakingFrame#0 FrameDirection.DOWNSTREAM
2025-01-08 20:19:04.686 | DEBUG    | __main__:process_frame:103 - resetting idle count
2025-01-08 20:19:04.686 | DEBUG    | transcription_buffer_filter_processor:process_frame:108 - opening closed gate and flushing buffer [PII:] InterimTranscriptionFrame#0(user: , text: [Tell me a], language: None, timestamp: 2025-01-09T04:19:04.680+00:00)
2025-01-08 20:19:05.942 | DEBUG    | pipecat.transports.base_input:_handle_interruptions:131 - User stopped speaking
2025-01-08 20:19:05.944 | DEBUG    | pipecat.services.openai:_stream_chat_completions:176 - Generating chat: [{"role": "system", "content": "you are a helpful assistant"}, {"role": "user", "content": "Tell me a long story."}]
2025-01-08 20:19:06.304 | DEBUG    | pipecat.processors.metrics.frame_processor_metrics:stop_ttfb_metrics:50 - OpenAILLMService#0 TTFB: 0.36070704460144043
2025-01-08 20:19:06.703 | DEBUG    | pipecat.services.elevenlabs:run_tts:420 - Generating TTS: [Once upon a time, in a quaint little village nestled in the hills of Tuscany, there lived a young woman named Isabella.]
2025-01-08 20:19:06.703 | DEBUG    | pipecat.processors.metrics.frame_processor_metrics:start_tts_usage_metrics:85 - ElevenLabsTTSService#0 usage characters: 119
2025-01-08 20:19:06.704 | DEBUG    | pipecat.processors.metrics.frame_processor_metrics:stop_processing_metrics:65 - ElevenLabsTTSService#0 processing time: 0.0005850791931152344
2025-01-08 20:19:06.864 | DEBUG    | pipecat.services.elevenlabs:run_tts:420 - Generating TTS: [ Isabella was known throughout the village for her kindness and adventurous spirit.]
2025-01-08 20:19:06.865 | DEBUG    | pipecat.processors.metrics.frame_processor_metrics:start_tts_usage_metrics:85 - ElevenLabsTTSService#0 usage characters: 83
2025-01-08 20:19:06.865 | DEBUG    | pipecat.processors.metrics.frame_processor_metrics:stop_processing_metrics:65 - ElevenLabsTTSService#0 processing time: 0.0007231235504150391
2025-01-08 20:19:07.152 | DEBUG    | pipecat.processors.metrics.frame_processor_metrics:stop_ttfb_metrics:50 - ElevenLabsTTSService#0 TTFB: 0.44930601119995117
2025-01-08 20:19:07.161 | DEBUG    | pipecat.transports.base_output:_bot_started_speaking:203 - Bot started speaking
2025-01-08 20:19:07.162 | WARNING  | transcription_buffer_filter_processor:process_frame:40 - BOT SPEAKING
2025-01-08 20:19:07.364 | DEBUG    | pipecat.services.elevenlabs:run_tts:420 - Generating TTS: [ She spent her days helping her mother tend to their garden and her evenings dreaming of far-off places she longed to explore.]
2025-01-08 20:19:07.364 | DEBUG    | pipecat.processors.metrics.frame_processor_metrics:start_tts_usage_metrics:85 - ElevenLabsTTSService#0 usage characters: 126
2025-01-08 20:19:07.365 | DEBUG    | pipecat.processors.metrics.frame_processor_metrics:stop_processing_metrics:65 - ElevenLabsTTSService#0 processing time: 0.0007388591766357422
2025-01-08 20:19:08.399 | DEBUG    | pipecat.services.elevenlabs:run_tts:420 - Generating TTS: [

One sunny afternoon, as Isabella was walking through the meadow behind her house, she stumbled upon an old, dusty book partially hidden beneath a rocky outcrop.]
2025-01-08 20:19:08.401 | DEBUG    | pipecat.processors.metrics.frame_processor_metrics:start_tts_usage_metrics:85 - ElevenLabsTTSService#0 usage characters: 162
2025-01-08 20:19:08.401 | DEBUG    | pipecat.processors.metrics.frame_processor_metrics:stop_processing_metrics:65 - ElevenLabsTTSService#0 processing time: 0.0019288063049316406
2025-01-08 20:19:08.405 | WARNING  | transcription_buffer_filter_processor:process_frame:71 - interim frame incrementing interim word count by 1 to 1
2025-01-08 20:19:08.501 | DEBUG    | pipecat.transports.base_input:_handle_interruptions:124 - User started speaking
2025-01-08 20:19:08.503 | DEBUG    | transcription_buffer_filter_processor:process_frame:61 - closing gate [PII:] StartInterruptionFrame#1
2025-01-08 20:19:08.796 | DEBUG    | pipecat.services.elevenlabs:run_tts:420 - Generating TTS: [ Curious, she picked it up and blew away the dust that had settled over its cover.]
2025-01-08 20:19:08.796 | DEBUG    | pipecat.processors.metrics.frame_processor_metrics:start_tts_usage_metrics:85 - ElevenLabsTTSService#0 usage characters: 82
2025-01-08 20:19:08.797 | DEBUG    | pipecat.processors.metrics.frame_processor_metrics:stop_processing_metrics:65 - ElevenLabsTTSService#0 processing time: 0.001013040542602539
2025-01-08 20:19:09.194 | DEBUG    | pipecat.services.elevenlabs:run_tts:420 - Generating TTS: [ The book was bound in worn, crimson leather, and the pages were yellowed with age.]
2025-01-08 20:19:09.195 | DEBUG    | pipecat.processors.metrics.frame_processor_metrics:start_tts_usage_metrics:85 - ElevenLabsTTSService#0 usage characters: 83
2025-01-08 20:19:09.195 | DEBUG    | pipecat.processors.metrics.frame_processor_metrics:stop_processing_metrics:65 - ElevenLabsTTSService#0 processing time: 0.0011281967163085938
2025-01-08 20:19:09.276 | DEBUG    | pipecat.services.elevenlabs:run_tts:420 - Generating TTS: [ On the front, embossed in gold were the words:]
2025-01-08 20:19:09.276 | DEBUG    | pipecat.processors.metrics.frame_processor_metrics:start_tts_usage_metrics:85 - ElevenLabsTTSService#0 usage characters: 47
2025-01-08 20:19:09.279 | DEBUG    | pipecat.processors.metrics.frame_processor_metrics:stop_processing_metrics:65 - ElevenLabsTTSService#0 processing time: 0.0034401416778564453
2025-01-08 20:19:09.385 | WARNING  | transcription_buffer_filter_processor:process_frame:71 - interim frame incrementing interim word count by 3 to 3
2025-01-08 20:19:09.385 | DEBUG    | simple_frame_logger:process_frame:22 - 1 Got frame StartInterruptionFrame#1 FrameDirection.DOWNSTREAM
2025-01-08 20:19:09.387 | DEBUG    | pipecat.processors.metrics.frame_processor_metrics:stop_processing_metrics:65 - OpenAILLMService#0 processing time: 3.4434049129486084
2025-01-08 20:19:09.391 | DEBUG    | pipecat.transports.base_output:_bot_stopped_speaking:210 - Bot stopped speaking
2025-01-08 20:19:09.392 | WARNING  | transcription_buffer_filter_processor:process_frame:47 - BOT NOT SPEAKING
2025-01-08 20:19:09.392 | DEBUG    | simple_frame_logger:process_frame:22 - 1 Got frame UserStartedSpeakingFrame#1 FrameDirection.DOWNSTREAM
2025-01-08 20:19:09.392 | DEBUG    | __main__:process_frame:103 - resetting idle count
2025-01-08 20:19:09.392 | DEBUG    | transcription_buffer_filter_processor:process_frame:108 - opening closed gate and flushing buffer [PII:] InterimTranscriptionFrame#2(user: , text: [Hey. Stop talking.], language: None, timestamp: 2025-01-09T04:19:09.384+00:00)
2025-01-08 20:19:09.591 | DEBUG    | pipecat.transports.base_output:_bot_started_speaking:203 - Bot started speaking
2025-01-08 20:19:09.592 | WARNING  | transcription_buffer_filter_processor:process_frame:40 - BOT SPEAKING
2025-01-08 20:19:09.880 | DEBUG    | pipecat.transports.base_input:_handle_interruptions:131 - User stopped speaking
2025-01-08 20:19:09.884 | DEBUG    | interruption_inserter:process_frame:40 - user speaking over bot, inserting interruption message as role system
2025-01-08 20:19:09.884 | DEBUG    | pipecat.services.openai:_stream_chat_completions:176 - Generating chat: [{"role": "system", "content": "you are a helpful assistant"}, {"role": "user", "content": "Tell me a long story."}, {"role": "assistant", "content": "Once upon a time, in a quaint"}, {"role": "user", "content": "Hey. Stop talking."}, {"role": "system", "content": "(interrupted by user)"}]
2025-01-08 20:19:10.216 | DEBUG    | pipecat.processors.metrics.frame_processor_metrics:stop_ttfb_metrics:50 - OpenAILLMService#0 TTFB: 0.33182811737060547
2025-01-08 20:19:10.383 | DEBUG    | pipecat.processors.metrics.frame_processor_metrics:start_llm_usage_metrics:73 - OpenAILLMService#0 prompt tokens: 474, completion tokens: 12
2025-01-08 20:19:10.385 | DEBUG    | pipecat.services.elevenlabs:run_tts:420 - Generating TTS: [Alright, let me know if you need anything else!]
2025-01-08 20:19:10.387 | DEBUG    | pipecat.processors.metrics.frame_processor_metrics:start_tts_usage_metrics:85 - ElevenLabsTTSService#0 usage characters: 47
2025-01-08 20:19:10.388 | DEBUG    | pipecat.processors.metrics.frame_processor_metrics:stop_processing_metrics:65 - OpenAILLMService#0 processing time: 0.5036892890930176
2025-01-08 20:19:10.388 | DEBUG    | pipecat.processors.metrics.frame_processor_metrics:stop_processing_metrics:65 - ElevenLabsTTSService#0 processing time: 0.002924203872680664
2025-01-08 20:19:10.684 | DEBUG    | pipecat.processors.metrics.frame_processor_metrics:stop_ttfb_metrics:50 - ElevenLabsTTSService#0 TTFB: 0.2980539798736572
2025-01-08 20:19:11.321 | DEBUG    | pipecat.transports.base_input:_handle_interruptions:124 - User started speaking
2025-01-08 20:19:11.324 | DEBUG    | transcription_buffer_filter_processor:process_frame:61 - closing gate [PII:] StartInterruptionFrame#2
2025-01-08 20:19:11.382 | WARNING  | transcription_buffer_filter_processor:process_frame:71 - interim frame incrementing interim word count by 1 to 1
2025-01-08 20:19:12.164 | WARNING  | transcription_buffer_filter_processor:process_frame:79 - transcription frame incrementing setting transcription word count to 3
2025-01-08 20:19:12.164 | DEBUG    | simple_frame_logger:process_frame:22 - 1 Got frame StartInterruptionFrame#2 FrameDirection.DOWNSTREAM
2025-01-08 20:19:12.168 | DEBUG    | pipecat.transports.base_output:_bot_stopped_speaking:210 - Bot stopped speaking
2025-01-08 20:19:12.169 | WARNING  | transcription_buffer_filter_processor:process_frame:47 - BOT NOT SPEAKING
2025-01-08 20:19:12.169 | DEBUG    | simple_frame_logger:process_frame:22 - 1 Got frame UserStartedSpeakingFrame#2 FrameDirection.DOWNSTREAM
2025-01-08 20:19:12.170 | DEBUG    | __main__:process_frame:103 - resetting idle count
2025-01-08 20:19:12.170 | DEBUG    | transcription_buffer_filter_processor:process_frame:108 - opening closed gate and flushing buffer [PII:] TranscriptionFrame#2(user: , text: [Hey. Stop talking.], language: None, timestamp: 2025-01-09T04:19:12.164+00:00)
2025-01-08 20:19:12.681 | DEBUG    | pipecat.transports.base_input:_handle_interruptions:131 - User stopped speaking
2025-01-08 20:19:12.682 | DEBUG    | pipecat.services.openai:_stream_chat_completions:176 - Generating chat: [{"role": "system", "content": "you are a helpful assistant"}, {"role": "user", "content": "Tell me a long story."}, {"role": "assistant", "content": "Once upon a time, in a quaint"}, {"role": "user", "content": "Hey. Stop talking."}, {"role": "system", "content": "(interrupted by user)"}, {"role": "user", "content": "Hey. Stop talking."}]
2025-01-08 20:19:13.101 | DEBUG    | pipecat.processors.metrics.frame_processor_metrics:stop_ttfb_metrics:50 - OpenAILLMService#0 TTFB: 0.4193551540374756
2025-01-08 20:19:13.192 | DEBUG    | pipecat.services.elevenlabs:run_tts:420 - Generating TTS: [Alright, I'll stop the story.]
2025-01-08 20:19:13.192 | DEBUG    | pipecat.processors.metrics.frame_processor_metrics:start_tts_usage_metrics:85 - ElevenLabsTTSService#0 usage characters: 29
2025-01-08 20:19:13.193 | DEBUG    | pipecat.processors.metrics.frame_processor_metrics:stop_processing_metrics:65 - ElevenLabsTTSService#0 processing time: 0.0015048980712890625
2025-01-08 20:19:13.356 | DEBUG    | pipecat.services.elevenlabs:run_tts:420 - Generating TTS: [ Let me know if there's anything else you'd like to do!]
2025-01-08 20:19:13.357 | DEBUG    | pipecat.processors.metrics.frame_processor_metrics:start_tts_usage_metrics:85 - ElevenLabsTTSService#0 usage characters: 55
2025-01-08 20:19:13.357 | DEBUG    | pipecat.processors.metrics.frame_processor_metrics:stop_processing_metrics:65 - ElevenLabsTTSService#0 processing time: 0.0009808540344238281
2025-01-08 20:19:13.360 | DEBUG    | pipecat.processors.metrics.frame_processor_metrics:start_llm_usage_metrics:73 - OpenAILLMService#0 prompt tokens: 483, completion tokens: 20
2025-01-08 20:19:13.362 | DEBUG    | pipecat.processors.metrics.frame_processor_metrics:stop_processing_metrics:65 - OpenAILLMService#0 processing time: 0.6794378757476807
2025-01-08 20:19:13.476 | DEBUG    | pipecat.processors.metrics.frame_processor_metrics:stop_ttfb_metrics:50 - ElevenLabsTTSService#0 TTFB: 0.2838602066040039
2025-01-08 20:19:13.480 | DEBUG    | pipecat.transports.base_output:_bot_started_speaking:203 - Bot started speaking
2025-01-08 20:19:13.481 | WARNING  | transcription_buffer_filter_processor:process_frame:40 - BOT SPEAKING
@danthegoodman1
Contributor Author

Sometimes I get a sort of "inverse" behavior: it stops generating new LLM frames, but continues to play the TTS that has already been generated.

@jonnyjohnson1

I'm observing similar behaviors.

What OS are you running this on? I have no issues on Mac, but see weirdness on Windows.

Sometimes when this happens, I see the interruption's transcript come through (it's printed in the terminal, like in your output above), so I know the STT service picked it up, but then that frame doesn't appear to register with the rest of the pipeline.

Finally, when it does register what I said, it only includes the last statement; all the STT frames that the service transcribed earlier aren't included in the message history sent to the LLM step. Those frames get lost somewhere in the process.

@danthegoodman1
Contributor Author

I'm on macOS. I also observe that it's inconsistent. I can change the behavior by slimming the pipeline down, but ultimately I can still trigger this even with slimmed-down versions (removing custom processors).

@jonnyjohnson1

In some more testing, it seems the rule may be that I'm not allowed to interrupt until the bot has finished speaking its first sentence; after that, interruptions work as expected.

@danthegoodman1
Contributor Author

No, I can trigger this after that first sentence, and if it's only generating one sentence I can interrupt without getting this behavior.

@jonnyjohnson1

:/ I keep hoping to uncover some general rule running beneath this... something to explain the apparently inconsistent performance.

I'm still exploring fixes.

@markbackman
Contributor

markbackman commented Jan 14, 2025

Hi all, @aconchillo and I have confirmed that this issue affects ElevenLabsTTSService and PlayHTTTSService. We are collaborating with both the ElevenLabs and PlayHT teams to determine solutions for enabling interruptions, regardless of the LLM response length.

In the meantime, interruptions work well with CartesiaTTSService, which uses a WebSocket-based connection with excellent TTFB (consistently ~170ms). Additionally, all HTTP-based TTS services support interruptions. You can check out a list of available services here: https://docs.pipecat.ai/server/services/supported-services#text-to-speech.

Solving this problem is a high priority for us, and we will provide updates in this issue as they become available.

Details:

  • PlayHT: It appears that there's a bug in the PlayHT WebSocket API. They have a parameter, request_id, which we provide for each TTS generation. The request_id isn't being respected, which means we don't have a way to correlate an interruption with a TTS generation.
  • ElevenLabs: They're working on adding context_ids so we can correlate an interruption with a TTS generation.
  • Cartesia: This service works because context_ids are available and working, so Pipecat can correlate an interruption with a TTS generation.
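
Conceptually, the context-id correlation works like this (a generic illustration, not the actual Cartesia/ElevenLabs wire protocol or Pipecat's implementation): each TTS generation is tagged with an id, an interruption retires the current id, and any audio chunk tagged with a retired id can be recognized and discarded instead of played.

```python
import itertools

class ContextCorrelator:
    """Tracks which TTS generation is live so stale audio can be dropped."""

    def __init__(self):
        self._ids = itertools.count()
        self.current = next(self._ids)  # id sent with each TTS request

    def interrupt(self):
        """User interrupted: retire the old context, start a new one."""
        self.current = next(self._ids)

    def accept(self, chunk_context_id: int) -> bool:
        """Only play audio tagged with the live context."""
        return chunk_context_id == self.current

corr = ContextCorrelator()
old = corr.current
corr.interrupt()                            # interruption arrives
print(corr.accept(old), corr.accept(corr.current))  # False True
```

Without a respected request_id/context_id on the provider side, there is nothing to compare incoming audio against, which is why the stale chunks keep playing.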

@markbackman markbackman changed the title Interruptions do not stop LLM generations, causing TTS skips and "failed" interruptions ElevenLabsTTSService and PlayHTTTSService interruptions don't occur with long LLM completions Jan 14, 2025