I have a FastAPI application with multiple endpoints, and each endpoint uses certain memory-intensive objects (ML models). This works fine when I only have one worker, but I am worried about memory usage (and, to a lesser extent, startup time) when I scale to multiple workers.
Is there a way to limit certain workers to certain endpoints only? Then I would only load the objects required for the respective endpoint.
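To make the idea concrete, here is a sketch of what I mean (the `WORKER_ROLE` environment variable, the `load_model` helper, and the route names are all hypothetical): an app factory that registers only the endpoints a given worker's role requires, so only the matching model is loaded in that process.

```python
import os
from typing import Callable

def load_model(name: str) -> dict:
    # Stand-in for loading a ~2 GB ML model into process memory.
    return {"name": name}

def build_routes(role: str) -> dict[str, Callable[[], dict]]:
    """Register only the endpoints (and load only the models) this worker needs."""
    routes: dict[str, Callable[[], dict]] = {}
    if role in ("a", "all"):
        model_a = load_model("model_a")  # only loaded for role "a" / "all"
        routes["/predict_a"] = lambda: {"model": model_a["name"]}
    if role in ("b", "all"):
        model_b = load_model("model_b")  # only loaded for role "b" / "all"
        routes["/predict_b"] = lambda: {"model": model_b["name"]}
    return routes

# Each worker process would read its role from the environment at startup.
routes = build_routes(os.environ.get("WORKER_ROLE", "all"))
```

The open question is how to get gunicorn (or some layer above it) to assign roles like this per worker and route requests accordingly.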
Specifically, assume I have two endpoints, each using a 2 GB model. If I scale to four workers, I need 2 GB x 2 models x 4 workers = 16 GB.
If I say the first two workers only serve the first endpoint, and the last two only serve the second, every process needs to load just one of the models, so I would have 2 GB x 4 = 8 GB. This assumes, of course, that the load is split roughly evenly between the endpoints, which is the case here.
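In code, the memory arithmetic above is simply (all numbers taken from the example, not measured):

```python
MODEL_GB = 2   # size of each model
N_MODELS = 2   # one model per endpoint
N_WORKERS = 4

# Every worker loads every model (the default behaviour):
all_models_gb = MODEL_GB * N_MODELS * N_WORKERS  # 16 GB

# Workers split evenly, each loading exactly one model:
split_gb = MODEL_GB * 1 * N_WORKERS              # 8 GB
```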
Alternatives I have considered:
- A microservice architecture, where each endpoint is its own application. However, this question only came up because I am trying to move away from microservices; I had reliability problems with that kind of architecture (the need for some sort of scheduler, forwarding the HTTP endpoints, high latency due to multiple layers of forwarding), and some of the endpoints are little more than
  return calculation(huge_object[param])
- Sharing the data among workers, which does not seem technically possible in the general case.
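One middle ground I am considering is lazy loading: load a model only on the first request that needs it, so a worker that (via some external path-based routing, e.g. a reverse proxy sending each endpoint to a different gunicorn instance) only ever sees one endpoint also only ever loads one model. A minimal sketch, with hypothetical names:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def get_model(name: str) -> dict:
    # Stand-in for the expensive ~2 GB load; runs at most once per process.
    return {"name": name, "loaded": True}

def predict_a(param: str) -> dict:
    # The model is loaded on the first /predict_a request in this worker only.
    model = get_model("model_a")
    return {"model": model["name"], "param": param}
```

This keeps a single codebase, but memory savings then depend entirely on the routing layer keeping each worker's traffic confined to one endpoint.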
Title: Limit FastAPI/gunicorn worker to certain endpoints to save memory (Stack Overflow). Source: http://www.betaflare.com/web/1741439571a2378833.html