
I'm developing an application that allows users to engage in a variety of daily tasks. A pre-trained Large Language Model (LLM) is then fine-tuned using the activity logs that are saved at the end of the day (or when a user signs off).

Because GPU-enabled EC2 instances are expensive per hour, I'm thinking about using the following strategy to cut expenses:
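To see why start-on-demand matters, here is a back-of-envelope comparison of an always-on GPU instance versus one that only runs during training. The hourly rate and training duration are illustrative placeholders, not actual AWS prices:

```python
# Hypothetical figures for illustration only -- check current AWS pricing.
HOURLY_RATE = 4.00            # assumed $/hour for a GPU instance
TRAINING_HOURS_PER_DAY = 1.0  # assumed daily fine-tuning time
DAYS_PER_MONTH = 30

always_on = HOURLY_RATE * 24 * DAYS_PER_MONTH
training_only = HOURLY_RATE * TRAINING_HOURS_PER_DAY * DAYS_PER_MONTH

print(f"always-on:     ${always_on:,.2f}/month")     # $2,880.00/month
print(f"training-only: ${training_only:,.2f}/month") # $120.00/month
```

Even with generous assumptions, paying only for the training window is an order of magnitude cheaper, which is what motivates the start/stop strategy below.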

Host the LLM on a GPU-capable EC2 instance, but keep the instance stopped by default. When training is needed, start it programmatically with an AWS Lambda function. The instance would expose an API endpoint that fetches the activity logs and runs the fine-tuning job. To save money, the instance would be shut down again as soon as training finishes. Since my primary skill is programming and I'm not an infrastructure expert, I'm wondering:
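The start/train/stop cycle I have in mind could be sketched roughly as below. The client interface mirrors boto3's EC2 client (`start_instances`, `stop_instances`, `describe_instances`); `FakeEC2` is a local stand-in for `boto3.client("ec2")` so the flow can be exercised without AWS, and `INSTANCE_ID` is a hypothetical placeholder:

```python
INSTANCE_ID = "i-0123456789abcdef0"  # hypothetical instance ID

class FakeEC2:
    """In-memory stand-in for a boto3 EC2 client, for local testing."""
    def __init__(self):
        self.state = "stopped"
    def start_instances(self, InstanceIds):
        self.state = "running"
    def stop_instances(self, InstanceIds):
        self.state = "stopped"
    def describe_instances(self, InstanceIds):
        # Shape mirrors the real describe_instances response.
        return {"Reservations": [{"Instances": [{"State": {"Name": self.state}}]}]}

def instance_state(ec2, instance_id):
    resp = ec2.describe_instances(InstanceIds=[instance_id])
    return resp["Reservations"][0]["Instances"][0]["State"]["Name"]

def run_training_cycle(ec2, instance_id, fine_tune):
    """Start the GPU instance, run fine-tuning, then stop it again.

    Stopping (rather than terminating) preserves the EBS volume, so the
    model weights and environment survive between training runs.
    """
    ec2.start_instances(InstanceIds=[instance_id])
    # Real code would poll here (or use a boto3 waiter) until the instance
    # is "running" and its API endpoint answers health checks -- boot plus
    # service startup can take minutes.
    try:
        fine_tune()
    finally:
        # Stop even if fine-tuning raised, so the GPU never idles billed.
        ec2.stop_instances(InstanceIds=[instance_id])

ec2 = FakeEC2()
run_training_cycle(ec2, INSTANCE_ID, fine_tune=lambda: None)
print(instance_state(ec2, INSTANCE_ID))  # stopped
```

In a real Lambda handler you would replace `FakeEC2()` with `boto3.client("ec2")` and `fine_tune` with a call to the instance's training endpoint; note that Lambda's maximum timeout is 15 minutes, so the Lambda should only trigger the job, not wait for fine-tuning to complete.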

Is this strategy practical in real life? Are there potential difficulties I should be mindful of, such as instance startup time, API design, or optimizing GPU performance on EC2? Are there other cost-effective approaches for this use case (such as using spot instances or other AWS services)? I would value any opinions or suggestions from people who have deployed LLMs on EC2 instances or in comparable situations.
