
I just got started with Semantic Kernel against a local LLM.

I got it working with the following code:

var chat = app.Services.GetRequiredService<IChatCompletionService>();

// "Hi" matches the request shown in the server log below
var chatHistory = new ChatHistory();
chatHistory.AddUserMessage("Hi");

ChatMessageContent response = await chat.GetChatMessageContentAsync(chatHistory);

// The reply text is the first TextContent item of the response
var textContent = response.Items.FirstOrDefault() as TextContent;
Console.WriteLine(textContent?.Text);

This works as expected and produces the reply "Hello! How can I assist you today? ??" (the trailing "??" is presumably a multi-byte character that didn't render; note the "Incomplete UTF-8 character" line in the server log below).

However, I want to do this the way everybody else does, with streaming:

await foreach (StreamingChatMessageContent stream in chat.GetStreamingChatMessageContentsAsync("Hi"))
{
    // await Task.Yield(); // tried this to see if it would help but it didn't
    Console.WriteLine(stream.Content);
}

But this returns 12 "empty" results which, when serialised, look like this:

{"Content":null,"Role":{"Label":"Assistant"},"ChoiceIndex":0,"ModelId":"deepseek-r1-distill-llama-8b","Metadata":{"CompletionId":"chatcmpl-m086eaeve495763ls6arwj","CreatedAt":"2025-02-13T10:22:51+00:00","SystemFingerprint":"deepseek-r1-distill-llama-8b","RefusalUpdate":null,"Usage":null,"FinishReason":null}}

followed by a final "Stop" chunk:

{"Content":null,"Role":null,"ChoiceIndex":0,"ModelId":"deepseek-r1-distill-llama-8b","Metadata":{"CompletionId":"chatcmpl-m086eaeve495763ls6arwj","CreatedAt":"2025-02-13T10:22:51+00:00","SystemFingerprint":"deepseek-r1-distill-llama-8b","RefusalUpdate":null,"Usage":null,"FinishReason":"Stop"}}

So I know the server is running, since the direct approach works fine, but I cannot get streaming to work properly.
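For completeness, the streaming call also accepts the same ChatHistory as the working non-streaming call, so an equivalent test would look like this (a sketch):

// Same ChatHistory as in the working non-streaming call above
await foreach (var chunk in chat.GetStreamingChatMessageContentsAsync(chatHistory))
{
    // Chunks arrive as deltas, so Write rather than WriteLine
    Console.Write(chunk.Content);
}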

For the direct message without streaming, here is the server log for the request:

2025-02-13 12:28:42 [DEBUG] 
Received request: POST to /v1/chat/completions with body  {
  "messages": [
    {
      "role": "user",
      "content": "Hi"
    }
  ],
  "model": "deepseek-r1-distill-llama-8b"
}
2025-02-13 12:28:42  [INFO] 
[LM STUDIO SERVER] Running chat completion on conversation with 1 messages.
2025-02-13 12:28:42 [DEBUG] 
Sampling params:    repeat_last_n = 64, repeat_penalty = 1.100, frequency_penalty = 0.000, presence_penalty = 0.000
    dry_multiplier = 0.000, dry_base = 1.750, dry_allowed_length = 2, dry_penalty_last_n = -1
    top_k = 40, top_p = 0.950, min_p = 0.050, xtc_probability = 0.000, xtc_threshold = 0.100, typical_p = 1.000, temp = 0.800
    mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
2025-02-13 12:28:42 [DEBUG] 
sampling: 
logits -> logit-bias -> penalties -> dry -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist 
generate: n_ctx = 4096, n_batch = 512, n_predict = -1, n_keep = 12
BeginProcessingPrompt
2025-02-13 12:28:42 [DEBUG] 
FinishedProcessingPrompt. Progress: 100
2025-02-13 12:28:42  [INFO] 
[LM STUDIO SERVER] Accumulating tokens ... (stream = false)
2025-02-13 12:28:42 [DEBUG] 
[deepseek-r1-distill-llama-8b] Accumulated 1 tokens <think>
2025-02-13 12:28:42 [DEBUG] 
[deepseek-r1-distill-llama-8b] Accumulated 2 tokens <think>\n\n
2025-02-13 12:28:42 [DEBUG] 
[deepseek-r1-distill-llama-8b] Accumulated 3 tokens <think>\n\n</think>
2025-02-13 12:28:42 [DEBUG] 
[deepseek-r1-distill-llama-8b] Accumulated 4 tokens <think>\n\n</think>\n\n
2025-02-13 12:28:42 [DEBUG] 
[deepseek-r1-distill-llama-8b] Accumulated 5 tokens <think>\n\n</think>\n\nHello
2025-02-13 12:28:42 [DEBUG] 
[deepseek-r1-distill-llama-8b] Accumulated 6 tokens <think>\n\n</think>\n\nHello!
2025-02-13 12:28:42 [DEBUG] 
[deepseek-r1-distill-llama-8b] Accumulated 7 tokens <think>\n\n</think>\n\nHello! How
2025-02-13 12:28:42 [DEBUG] 
[deepseek-r1-distill-llama-8b] Accumulated 8 tokens <think>\n\n</think>\n\nHello! How can
2025-02-13 12:28:42 [DEBUG] 
[deepseek-r1-distill-llama-8b] Accumulated 9 tokens <think>\n\n</think>\n\nHello! How can I
2025-02-13 12:28:42 [DEBUG] 
[deepseek-r1-distill-llama-8b] Accumulated 10 tokens <think>\n\n</think>\n\nHello! How can I assist
2025-02-13 12:28:42 [DEBUG] 
[deepseek-r1-distill-llama-8b] Accumulated 11 tokens <think>\n\n</think>\n\nHello! How can I assist you
2025-02-13 12:28:42 [DEBUG] 
[deepseek-r1-distill-llama-8b] Accumulated 12 tokens <think>\n\n</think>\n\nHello! How can I assist you today
2025-02-13 12:28:42 [DEBUG] 
[deepseek-r1-distill-llama-8b] Accumulated 13 tokens <think>\n\n</think>\n\nHello! How can I assist you today?
2025-02-13 12:28:42 [DEBUG] 
Incomplete UTF-8 character. Waiting for next token (skip)
2025-02-13 12:28:42 [DEBUG] 
[deepseek-r1-distill-llama-8b] Accumulated 14 tokens <think>\n\n</think>\n\nHello! How can I assist you today? 
