I am using OpenAI's Realtime API (gpt-4o-realtime-preview-2024-12-17) in a React-based application for live transcription and response generation. However, the transcribed text and the generated speech output do not align properly: sometimes the text appears earlier than expected, and sometimes the audio plays with a delay.

Implementation details:
- The application uses WebSockets to stream real-time audio to OpenAI.
- I am using the RealtimeClient from OpenAI's API to send and receive live audio responses.
- WavRecorder and WavStreamPlayer handle audio streaming and playback, since the audio is in 16-bit PCM format.
- The text responses are updated dynamically as they arrive via the API.

This is the code for connecting to the API:
const connectConversation = useCallback(async () => {
  const client = clientRef.current;
  const wavRecorder = wavRecorderRef.current;
  const wavStreamPlayer = wavStreamPlayerRef.current;

  // Open the microphone and the audio output before connecting.
  await wavRecorder.begin();
  await wavStreamPlayer.connect();

  try {
    const response = await client.connect();
    if (response) {
      setLoading(false);
      client.sendUserMessageContent([{ type: "input_text", text: "Hello!" }]);
      // With server-side voice activity detection, stream microphone audio continuously.
      if (client.getTurnDetectionType() === "server_vad") {
        await wavRecorder.record((data) => client.appendInputAudio(data.mono));
      }
    }
  } catch (error) {
    console.error("Error connecting:", error);
  }
}, []);
This is the code for handling the response:
client.on("conversation.updated", async ({ item, delta }) => {
  if (item.role === "assistant" && delta?.audio) {
    wavStreamPlayer.add16BitPCM(delta.audio, item.id);
    textRef.current = item.formatted.transcript; // Text updates immediately
  } else if (delta?.text) {
    textRef.current = item.formatted.transcript;
  }
  if (item.status === "completed" && item.formatted.audio?.length) {
    const wavFile = await WavRecorder.decode(item.formatted.audio, 24000, 24000);
    setAudiosrc(wavFile.url);
  }
});
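One way to relate the two streams in a handler like the one above is to record, for each transcript update, how much audio had been received at that moment. The following is only a sketch under assumptions: `createSyncTimeline`, `addAudio`, `addText`, and `charsAt` are hypothetical names that are not part of any library, and the audio is assumed to be 16-bit mono PCM at 24 kHz (the rate passed to `WavRecorder.decode`). Because transcript updates may arrive ahead of the matching audio, the mapping is approximate.

```javascript
// Hypothetical helper: maps an audio playback time to a transcript position.
// Assumes 16-bit mono PCM, so one sample occupies 2 bytes.
const BYTES_PER_SAMPLE = 2;

function createSyncTimeline(sampleRate = 24000) {
  let receivedSamples = 0; // total audio samples received so far
  const marks = [];        // [{ ms, chars }] in arrival order

  return {
    // Call with each audio chunk (anything exposing byteLength, e.g. Int16Array).
    addAudio(chunk) {
      receivedSamples += chunk.byteLength / BYTES_PER_SAMPLE;
    },
    // Call whenever the transcript grows; stores its length against
    // the amount of audio received up to this point.
    addText(transcript) {
      marks.push({
        ms: (receivedSamples / sampleRate) * 1000,
        chars: transcript.length,
      });
    },
    // Number of transcript characters to reveal once `playedMs`
    // milliseconds of audio have been played.
    charsAt(playedMs) {
      let chars = 0;
      for (const mark of marks) {
        if (mark.ms > playedMs) break;
        chars = mark.chars;
      }
      return chars;
    },
  };
}
```

In the handler, `addAudio(delta.audio)` and `addText(item.formatted.transcript)` would be called as updates arrive, and the UI would show `transcript.slice(0, timeline.charsAt(playedMs))`, where `playedMs` comes from whatever playback position the audio player reports.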
Problem observed: I could not keep the text scrolling in sync with the audio. The scrolling logic is based on an estimated duration, assuming 150 words per minute:
const scrollText = () => {
  if (!scrollContainerRef.current) return;
  const container = scrollContainerRef.current;
  const currentTime = Date.now();
  const elapsed = currentTime - scrollStartTimeRef.current;
  const duration = getScrollDuration(text);
  if (elapsed >= duration) {
    container.scrollTop = container.scrollHeight - container.clientHeight;
    return;
  }
  const progress = elapsed / duration;
  const targetScrollTop = container.scrollHeight - container.clientHeight;
  // Smooth easing function for better scrolling
  const easeInOutQuad = (t) =>
    t < 0.5 ? 2 * t * t : 1 - Math.pow(-2 * t + 2, 2) / 2;
  container.scrollTop = targetScrollTop * easeInOutQuad(progress);
  animationFrameRef.current = requestAnimationFrame(scrollText);
};
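Instead of comparing wall-clock time against an estimated duration, the scroll offset could be derived from how much audio has actually been played. The sketch below is an illustration under assumptions, not a drop-in fix: `scrollTopForPlayback` is a hypothetical helper, and `playedMs` and `totalMs` are assumed to come from the audio player (for an `<audio>` element, `currentTime` and `duration` multiplied by 1000). It uses linear progress rather than easing, since easing would move the text away from the audio position.

```javascript
// Hypothetical helper: scroll offset for a given audio playback position.
function scrollTopForPlayback(playedMs, totalMs, scrollHeight, clientHeight) {
  // Largest valid scrollTop; zero when the content fits in the container.
  const maxScrollTop = Math.max(0, scrollHeight - clientHeight);
  if (totalMs <= 0) return 0; // nothing to play yet
  // Clamp progress to [0, 1] so early or late readings stay in range.
  const progress = Math.min(1, Math.max(0, playedMs / totalMs));
  return maxScrollTop * progress;
}
```

Inside a `requestAnimationFrame` loop, the result would be assigned to `container.scrollTop` on every frame, so a stalled or delayed audio stream also stalls the scroll.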
Approaches taken:

1. Converting the 16-bit PCM into an audio source:

const wavFile = await WavRecorder.decode(item.formatted.audio, 24000, 24000);
setAudiosrc(wavFile.url);

However, the conversion takes time that depends on the length of the response, which causes desynchronization.

2. Scrolling based on word count (150 WPM rule):
const wordsPerMinute = 150;
const words = text.split(" ").length;
return (words / wordsPerMinute) * 60 * 1000;
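For comparison, the duration of raw PCM audio does not need to be estimated from word count: it follows from the number of samples and the sample rate. The sketch below assumes mono audio at 24 kHz held in an `Int16Array` (one element per sample); `getPcmDurationMs` is a hypothetical name used for illustration.

```javascript
// Hypothetical alternative to a words-per-minute estimate.
// For mono PCM, duration in seconds = sampleCount / sampleRate.
function getPcmDurationMs(sampleCount, sampleRate = 24000) {
  return (sampleCount / sampleRate) * 1000;
}

// Example: 72,000 samples at 24 kHz.
const samples = new Int16Array(72000);
const durationMs = getPcmDurationMs(samples.length); // 3000
```

While a response is still streaming, the sample count keeps growing, so the total is only final once the response has completed.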
The word-count estimate works for short responses but fails for longer ones because the speech rate varies.

Questions:
- How can I accurately sync the text scroll with the real-time audio playback?
- Are there any existing libraries or best practices for handling text-audio synchronization in real-time applications?

Any insights or suggestions would be greatly appreciated!