Syncing Voice to Text in OpenAI Realtime voice in ReactJS - Stack Overflow


I am using OpenAI's Realtime API (gpt-4o-realtime-preview-2024-12-17) in a React-based application for live transcription and response generation. However, the transcribed text and the generated speech output do not align properly: sometimes the text appears earlier than expected, or the audio plays with a delay.

Implementation details:

- The application uses WebSockets to stream real-time audio to OpenAI.
- I am using the RealtimeClient from OpenAI's API to send and receive live audio responses.
- WavRecorder and WavStreamPlayer handle audio streaming and playback, since the audio is in 16-bit PCM format.
- The text responses are updated dynamically as they arrive via the API.

This is the code for connecting to the API:

const connectConversation = useCallback(async () => {
  const client = clientRef.current;
  const wavRecorder = wavRecorderRef.current;
  const wavStreamPlayer = wavStreamPlayerRef.current;
  // Start microphone capture and open the audio output
  await wavRecorder.begin();
  await wavStreamPlayer.connect();
  try {
    const response = await client.connect();
    if (response) {
      setLoading(false);
      client.sendUserMessageContent([{ type: "input_text", text: "Hello!" }]);

      // With server-side voice activity detection, stream microphone audio continuously
      if (client.getTurnDetectionType() === "server_vad") {
        await wavRecorder.record((data) => client.appendInputAudio(data.mono));
      }
    }
  } catch (error) {
    console.error("Error connecting:", error);
  }
}, []);

This is the code for handling the response:

client.on("conversation.updated", async ({ item, delta }) => {
  if (item.role === "assistant" && delta?.audio) {
    // Queue the 16-bit PCM chunk for playback
    wavStreamPlayer.add16BitPCM(delta.audio, item.id);
    textRef.current = item.formatted.transcript; // Text updates immediately
  } else if (delta?.text) {
    textRef.current = item.formatted.transcript;
  }
  // Once the item is complete, decode its full audio into a WAV file
  if (item.status === "completed" && item.formatted.audio?.length) {
    const wavFile = await WavRecorder.decode(item.formatted.audio, 24000, 24000);
    setAudiosrc(wavFile.url);
  }
});

Problem observed: I couldn't scroll the text in sync with the audio. The scrolling logic is based on an estimated duration, assuming 150 words per minute:

const scrollText = () => {
  if (!scrollContainerRef.current) return;
  const container = scrollContainerRef.current;
  const currentTime = Date.now();
  const elapsed = currentTime - scrollStartTimeRef.current;
  const duration = getScrollDuration(text);
  if (elapsed >= duration) {
    container.scrollTop = container.scrollHeight - container.clientHeight;
    return;
  }
  const progress = elapsed / duration;
  const targetScrollTop = container.scrollHeight - container.clientHeight;
  // Smooth easing function for better scrolling
  const easeInOutQuad = (t) =>
    t < 0.5 ? 2 * t * t : 1 - Math.pow(-2 * t + 2, 2) / 2;
  container.scrollTop = targetScrollTop * easeInOutQuad(progress);
  animationFrameRef.current = requestAnimationFrame(scrollText);
};
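
One property of the scroll logic above is worth making concrete: easeInOutQuad is applied across the whole duration, so the scroll position lags a linear mapping in the first half and leads it in the second. If the speech is assumed to advance at a roughly constant rate, that alone would introduce some drift. The sketch below isolates the mapping as a pure function so the two can be compared. scrollTopAt is a hypothetical helper name, and the constant-rate assumption is mine, not something established here.

```javascript
// Same easing function as in scrollText.
const easeInOutQuad = (t) =>
  t < 0.5 ? 2 * t * t : 1 - Math.pow(-2 * t + 2, 2) / 2;

// Hypothetical helper: maps elapsed time to a scroll offset.
// `ease` defaults to linear, i.e. scroll position proportional to elapsed time.
function scrollTopAt(elapsedMs, durationMs, maxScrollTop, ease = (t) => t) {
  if (durationMs <= 0) return maxScrollTop;
  // Clamp progress to [0, 1] so the offset never overshoots.
  const progress = Math.min(Math.max(elapsedMs / durationMs, 0), 1);
  return maxScrollTop * ease(progress);
}

// A quarter of the way through a 1000 ms clip with 800 px of scrollable height:
const linearOffset = scrollTopAt(250, 1000, 800);               // 200
const easedOffset = scrollTopAt(250, 1000, 800, easeInOutQuad); // 100
```

At 25% of the duration the eased position is half the linear one, so text meant to track steady speech would trail the audio early on and catch up later.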

Approaches taken:

Converting the 16-bit PCM into an audio source:

const wavFile = await WavRecorder.decode(item.formatted.audio, 24000, 24000);
setAudiosrc(wavFile.url);

However, the conversion takes time that depends on the length of the response, causing desynchronization.

Scrolling based on word count (the 150 WPM rule):

const wordsPerMinute = 150;
const words = text.split(" ").length;
return (words / wordsPerMinute) * 60 * 1000;
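
As a worked example of the 150 WPM rule, the sketch below wraps the same arithmetic in a standalone getScrollDuration function. It also replaces split(" ") with a whitespace-tolerant word count, because split(" ") counts empty strings produced by consecutive spaces. countWords is a hypothetical helper added for illustration.

```javascript
const WORDS_PER_MINUTE = 150;

// Counts words separated by any run of whitespace; empty input yields 0.
function countWords(text) {
  return text.trim().split(/\s+/).filter(Boolean).length;
}

// Estimated speaking time in milliseconds at the given words-per-minute rate.
function getScrollDuration(text, wordsPerMinute = WORDS_PER_MINUTE) {
  return (countWords(text) / wordsPerMinute) * 60 * 1000;
}

// 300 words at 150 WPM: (300 / 150) * 60 * 1000 = 120000 ms (two minutes).
const twoMinutes = getScrollDuration("word ".repeat(300));

// Why split(" ") overcounts: a double space produces an empty "word".
const naiveCount = "a  b".split(" ").length; // 3
const robustCount = countWords("a  b");      // 2
```

The estimate is only as good as the assumed rate: any difference between 150 WPM and the actual speaking rate grows in proportion to the length of the response.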

The word-count estimate works for short responses but fails for longer ones because the speech speed varies.

Questions:

- How can I accurately sync the text scroll with the real-time audio playback?
- Are there any existing libraries or best practices for handling text-audio synchronization in real-time applications?

Any insights or suggestions would be greatly appreciated!
