Thanks so much in advance,

After a graceful restart of the worker nodes, I'm getting an unusual "access denied" error on the PVC used for the LLM model cache, which is stored on a local-nfs storage class.

  Warning  FailedMount       16m                  kubelet            MountVolume.SetUp failed for volume "pvc-8d73fc95-b785-4e12-b47a-c8d1c3d12f69" : mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t nfs -o retrans=2,timeo=30,vers=3 10.101.156.22:/export/pvc-8d73fc95-b785-4e12-b47a-c8d1c3d12f69 /var/lib/kubelet/pods/70e3e22b-dd08-4945-a039-a9ce107e525d/volumes/kubernetes.io~nfs/pvc-8d73fc95-b785-4e12-b47a-c8d1c3d12f69
Output: Created symlink /run/systemd/system/remote-fs.target.wants/rpc-statd.service → /lib/systemd/system/rpc-statd.service.
mount.nfs: Operation not permitted
  Warning  FailedMount  16m  kubelet  MountVolume.SetUp failed for volume "pvc-8d73fc95-b785-4e12-b47a-c8d1c3d12f69" : mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t nfs -o retrans=2,timeo=30,vers=3 10.101.156.22:/export/pvc-8d73fc95-b785-4e12-b47a-c8d1c3d12f69 /var/lib/kubelet/pods/70e3e22b-dd08-4945-a039-a9ce107e525d/volumes/kubernetes.io~nfs/pvc-8d73fc95-b785-4e12-b47a-c8d1c3d12f69
Output: mount.nfs: Operation not permitted
  Warning  FailedMount  15s (x14 over 16m)  kubelet  MountVolume.SetUp failed for volume "pvc-8d73fc95-b785-4e12-b47a-c8d1c3d12f69" : mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t nfs -o retrans=2,timeo=30,vers=3 10.101.156.22:/export/pvc-8d73fc95-b785-4e12-b47a-c8d1c3d12f69 /var/lib/kubelet/pods/70e3e22b-dd08-4945-a039-a9ce107e525d/volumes/kubernetes.io~nfs/pvc-8d73fc95-b785-4e12-b47a-c8d1c3d12f69
Output: mount.nfs: access denied by server while mounting 10.101.156.22:/export/pvc-8d73fc95-b785-4e12-b47a-c8d1c3d12f69

This is causing pods to be stuck in ContainerCreating status.

videosearch        vss-blueprint-0                                                   0/1     ContainerCreating   0              20h    <none>            worker-1    <none>
videosearch        vss-vss-deployment-5f758bc5df-fbm66                               0/1     Init:0/3            0              21h    <none>            worker-1    <none>
vllm               llama3-70b-bc4788446-9q8c2                                        0/1     ContainerCreating   0              21h    <none>            worker-2    <none>

The PV and PVC are both healthy; it seems that only the mount command the kubelet issues for the pods is failing.
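To narrow this down, the server, export path, and options can be pulled out of the event text so the same mount can be retried by hand on the affected worker. Below is a minimal Python sketch, assuming the event text keeps the `Mounting arguments:` / `Output:` layout shown above; the helper name `parse_failed_mount` is hypothetical and not part of kubectl or any Kubernetes library.

```python
import re
import shlex

# Sample event text copied from the kubelet events above.
SAMPLE = """\
Mounting arguments: -t nfs -o retrans=2,timeo=30,vers=3 10.101.156.22:/export/pvc-8d73fc95-b785-4e12-b47a-c8d1c3d12f69 /var/lib/kubelet/pods/70e3e22b-dd08-4945-a039-a9ce107e525d/volumes/kubernetes.io~nfs/pvc-8d73fc95-b785-4e12-b47a-c8d1c3d12f69
Output: mount.nfs: access denied by server while mounting 10.101.156.22:/export/pvc-8d73fc95-b785-4e12-b47a-c8d1c3d12f69
"""


def parse_failed_mount(event_text):
    """Hypothetical helper: extract NFS mount details from FailedMount event text."""
    match = re.search(r"Mounting arguments: (.+)", event_text)
    if match is None:
        raise ValueError("no 'Mounting arguments:' line found")

    # Assumed layout: -t <fstype> -o <options> <server>:<export> <target>
    args = shlex.split(match.group(1))
    source, target = args[-2], args[-1]
    server, _, export = source.partition(":")

    # Collect distinct mount.nfs error messages, keeping first-seen order.
    errors = list(dict.fromkeys(re.findall(r"mount\.nfs: (.+)", event_text)))

    return {
        "fstype": args[args.index("-t") + 1],
        "options": args[args.index("-o") + 1].split(","),
        "server": server,
        "export": export,
        "target": target,
        "errors": errors,
    }


info = parse_failed_mount(SAMPLE)
print(info["server"], info["export"], info["options"])
for error in info["errors"]:
    print("error:", error)
```

The extracted server and export path can then be compared with what the NFS server reports (for example via `showmount -e 10.101.156.22`), which may help show whether the server side or the node side changed across the reboot.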

My previous solution was to delete the PV and PVC and then redeploy the entire Helm chart, but having to redeploy a major workload after every restart is not ideal.

Would anyone happen to have a suggestion for something like this?

Tags: large language model | Access denied on pvc mount after Kubernetes cluster worker node reboot | Stack Overflow