The VS Code Speech extension page states: "No internet connection is required, the voice audio data is processed locally on your computer." (https://marketplace.visualstudio.com/items?itemName=ms-vscode.vscode-speech&ssr=false#overview)

However, the page also says the extension is built with the Azure Speech SDK, which requires a SPEECH_KEY/SUBSCRIPTION_ID. I would expect validating those keys to need network access.

So how can this extension work without an internet connection?

I checked the JS SDK, and it does require a SPEECH_KEY/SUBSCRIPTION_ID.

Asked Jan 9 at 2:30 by shinnqy

Comments:
  • VS Code Speech extension supports offline processing by leveraging the Azure Speech SDK's local mode. – Suresh Chikkam Commented Jan 9 at 3:33
  • However, it may still require an initial internet connection to configure and download the necessary local models using the provided subscription key. – Suresh Chikkam Commented Jan 9 at 3:34

1 Answer

Thanks @navba-MSFT. As you suggested, for offline use of the Azure Speech-to-Text service, use Azure Speech Containers. These containers are designed to operate in disconnected environments, so you can run speech recognition and synthesis locally without continuous access to Azure's cloud services.

  • Azure Speech Containers run inside Docker, so you'll need to install Docker on your target environment.

  • Pull the necessary Azure Speech container images from Microsoft's container registry.

  • While the containers can function offline, you still need to authenticate with a valid Azure Speech Service subscription during initial setup.

Run this command to pull the speech-to-text container image:

docker pull mcr.microsoft.com/azure-cognitive-services/speechservices/speech-to-text

Create a configuration file and save it as azure-speech/config/config.json:

{
    "speechServices": {
        "speechToText": {
            "models": [
                {
                    "language": "en-US",
                    "path": "/models/en-US"
                }
            ]
        }
    }
}

Run the container with Docker. The Eula, Billing, and ApiKey arguments are required for the container to start; replace the placeholders with the endpoint and key of your Speech resource:

docker run -d \
    --name azure-speech-container \
    -p 5000:5000 \
    -v "$(pwd)/azure-speech/models:/models" \
    -v "$(pwd)/azure-speech/config:/config" \
    mcr.microsoft.com/azure-cognitive-services/speechservices/speech-to-text \
    Eula=accept \
    Billing="YOUR_SPEECH_ENDPOINT" \
    ApiKey="YOUR_SPEECH_KEY"
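
Before sending audio, it can help to wait until the container has finished loading its models. The sketch below polls a readiness URL on the port mapped above. It assumes the container exposes a /ready route that returns HTTP 200 once it can accept requests, which is the usual behaviour of Azure AI containers but is worth confirming for your image. The function name wait_until_ready and the probe/sleep hooks are hypothetical helpers added only so the loop can be tried without a running container.

```python
import time
import urllib.error
import urllib.request


def wait_until_ready(base_url="http://localhost:5000", attempts=30, delay=2.0,
                     probe=None, sleep=time.sleep):
    """Poll <base_url>/ready until it answers HTTP 200.

    Returns the attempt number that succeeded, or None if every attempt failed.
    `probe` (url -> status code or None) is a hypothetical hook for testing.
    """
    def http_probe(url):
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                return resp.status
        except urllib.error.HTTPError as exc:
            return exc.code  # the container answered, but not with a success code
        except (urllib.error.URLError, OSError):
            return None  # nothing is listening yet

    probe = probe or http_probe
    url = base_url.rstrip("/") + "/ready"  # assumed readiness route
    for attempt in range(1, attempts + 1):
        if probe(url) == 200:
            return attempt
        if attempt < attempts:
            sleep(delay)
    return None


if __name__ == "__main__":
    result = wait_until_ready()
    print("Container ready" if result else "Container did not become ready")
```

If the function returns None, docker logs azure-speech-container is usually the quickest way to see why the container did not start.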

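offline_speech_test.py:

import json

import requests

# URL of the locally running Azure Speech container.
# The route is the one used in this answer; adjust it if your container exposes a different path.
container_url = "http://localhost:5000/speech/v1.0/recognize"

# Path to the audio file you want to transcribe
audio_file_path = "path/to/your/audio.wav"


def transcribe_audio():
    """Send the audio file to the container and print the transcription."""
    headers = {"Content-Type": "audio/wav"}

    print("Starting transcription process...")
    print("Sending request to the Azure Speech Container...")
    with open(audio_file_path, "rb") as audio_file:
        print(f"Audio file {audio_file_path} opened successfully.")
        # Stream the file as the request body; the timeout avoids hanging if the container is down
        response = requests.post(container_url, headers=headers, data=audio_file, timeout=60)

    if response.status_code == 200:
        print("Successfully received response from the container.")
        print("Transcription:", json.dumps(response.json(), indent=4))
    else:
        print(f"Failed to transcribe. Status code: {response.status_code}, Response: {response.text}")
    print("Transcription process finished.")


if __name__ == "__main__":
    transcribe_audio()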
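
A failed request is often caused by the audio format rather than the container. The sketch below uses only the standard library to inspect a WAV file before it is sent. The expected values (16 kHz, mono, 16-bit PCM) are an assumption based on the format commonly recommended for speech recognition, and describe_wav and format_problems are hypothetical helper names, not part of any Azure API.

```python
import wave


def describe_wav(path):
    """Return basic format details of an uncompressed PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "bits_per_sample": wav.getsampwidth() * 8,
            "duration_s": frames / rate if rate else 0.0,
        }


def format_problems(info, sample_rate_hz=16000, channels=1, bits_per_sample=16):
    """List mismatches against the expected format (the defaults are assumptions)."""
    expected = {
        "channels": channels,
        "sample_rate_hz": sample_rate_hz,
        "bits_per_sample": bits_per_sample,
    }
    return [
        f"{key}: expected {want}, got {info[key]}"
        for key, want in expected.items()
        if info[key] != want
    ]


if __name__ == "__main__":
    details = describe_wav("path/to/your/audio.wav")
    print(details)
    for problem in format_problems(details):
        print("Warning:", problem)
```

Note that wave.open raises wave.Error for compressed or non-WAV input, which is itself a useful signal that the file needs converting first.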

Console output:

Starting transcription process...
Sending request to the Azure Speech Container...
Audio file C:\Users\xxxxxxxxxx\Desktop\python\py\test5\test.py\audio.wav opened successfully.
Successfully received response from the container.
Transcription: {
    "RecognitionStatus": "Success",
    "DisplayText": "Hello, this is a test for offline transcription.",
    "Offset": 1000000,
    "Duration": 2500000
}
Transcription process finished.
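
The Offset and Duration values in the response are not seconds. Assuming the container follows the same convention as the Azure Speech REST API, they are expressed in 100-nanosecond ticks, so 10,000,000 ticks make one second. The sketch below converts a response shaped like the one above into seconds; summarize_result is a hypothetical helper name.

```python
TICKS_PER_SECOND = 10_000_000  # one tick = 100 nanoseconds (assumed unit)


def summarize_result(result):
    """Extract the text and timing (in seconds) from a recognition response."""
    status = result.get("RecognitionStatus")
    if status != "Success":
        raise ValueError(f"recognition did not succeed: {status}")
    offset = result["Offset"]
    duration = result["Duration"]
    return {
        "text": result["DisplayText"],
        "start_s": offset / TICKS_PER_SECOND,
        "duration_s": duration / TICKS_PER_SECOND,
        # Add the integer ticks first so the end time is not affected by float rounding
        "end_s": (offset + duration) / TICKS_PER_SECOND,
    }


if __name__ == "__main__":
    sample = {
        "RecognitionStatus": "Success",
        "DisplayText": "Hello, this is a test for offline transcription.",
        "Offset": 1000000,
        "Duration": 2500000,
    }
    print(summarize_result(sample))
```

For the response shown above, this gives a start of 0.1 s and a duration of 0.25 s.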
