
Has anyone adopted BigQuery's generative AI functions in their ETL or ad hoc processes and can speak to the cost implications of using them? We are considering using ml.generate_text to enrich our data, playing around with Gemini 1.5 Flash. The direct integration with BQ would allow for easy application within our existing ETL processes. This is brand new territory for me.

I'm struggling to understand costs and best practices when it comes to reducing them for Gen AI. An example use case would be taking a distinct list of customer email domains and determining which are public (e.g., gmail, yahoo, outlook) and which would be considered private. For this, let's say there are over 100,000 distinct domains to classify. Examples I have come across show using ml.generate_text for each row, such as:

with dist_domain as (
  select
    split(email, '@')[safe_offset(1)] as email_domain
  from
    subscriptions_table
  group by
    all
  order by
    count(*) desc
  limit 10 -- while testing
)
select
  prompt,
  email_domain,
  ml_generate_text_llm_result,
from
  ml.generate_text(
    model `bqml-llm-model`, -- using Gemini 1.5 Flash
    (
    select
      concat(
        'Using the following email domain: (',
        email_domain,
        ') provide results in JSON with the following structure: ',
        chr(10),
        '- is_common: BOOLEAN result if the domain is widely used by the public, ',
        'comes from a common email provider, ',
        'or comes from an internet service provider. ',
        'Domains like gmail/yahoo/hotmail/etc are considered public domains. ',
        chr(10),
        'Do not perform any external lookups or searches. Base your decision on your existing knowledge. ',
        'An explanation is not required. Just the formatted results.'
      ) AS prompt,
      email_domain
    from
      dist_domain
    ),
    struct(
      true as flatten_json_output
    )
  );

Here the email_domain column is embedded in the prompt, which is applied to each row. For gmail, the prompt returns:

{
  "is_common": true
}

Is it common practice to use ml.generate_text against 100k+ rows (or more)? This seems like it would take a long time AND be sending a large volume of prompts. Are costs applied in aggregate against the number of tokens returned across all prompts or are you charged for each prompt? Would BQ costs increase for the increased time it takes to run the query while waiting for Gemini's results?

asked Feb 10 at 16:28 by alpacafondue; edited Feb 11 at 11:15 by desertnaut

1 Answer

Based on the BigQuery ML pricing documentation, costs are incurred for remote model training, inference, and tuning. In this case the remote model is Gemini 1.5 Flash, a Google model hosted on Vertex AI, and the inference function is ML.GENERATE_TEXT.

Gemini models, including Gemini 1.5 Flash, are billed by modality: characters, images, and seconds of video or audio. Text is charged per 1,000 characters of input (prompt) and per 1,000 characters of output (response). Characters are counted by UTF-8 code points with whitespace excluded, which works out to roughly 4 characters per token. Prediction requests that lead to filtered responses are charged for the input only, and at the end of each billing cycle fractions of one cent ($0.01) are rounded to one cent.
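So the cost driver is total characters sent and returned, not the number of queries. One way to size the spend before running the full job is to count prompt characters directly in SQL. A minimal sketch, reusing the question's dist_domain shape; the per-1,000-character rate is a placeholder, not the current price, and LENGTH() counts whitespace that billing excludes, so this slightly over-estimates:

```sql
-- Ballpark the input-side cost of ML.GENERATE_TEXT before running it.
-- The 0.00001875 rate is a PLACEHOLDER; substitute the current Vertex AI
-- price for Gemini 1.5 Flash.
with dist_domain as (
  select split(email, '@')[safe_offset(1)] as email_domain
  from subscriptions_table
  group by all
),
prompts as (
  -- Same prompt template as the question (abbreviated here).
  select concat('Using the following email domain: (', email_domain, ') ...') as prompt
  from dist_domain
)
select
  count(*) as prompt_count,
  sum(length(prompt)) as total_input_chars,
  round(sum(length(prompt)) / 1000 * 0.00001875, 4) as est_input_cost_usd
from prompts;
```

Output characters would need a similar estimate on top (the JSON response here is small, roughly 25 characters per row).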

The key to cost-effective processing with generative AI is batching. Instead of sending 100,000 individual prompts, try to group multiple email domains into a single prompt. If your use case involves handling large datasets and doesn’t require immediate feedback, batch processing will be more efficient.
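As a concrete sketch of that batching idea, reusing the dist_domain CTE from the question: the batch size of 50 and the JSON-array output format are arbitrary choices, and larger batches trade per-prompt instruction overhead against response length and the model's reliability at structured output.

```sql
-- Pack many domains into each prompt instead of one prompt per row.
-- Batch size of 50 is an assumption; tune it for the model's context window
-- and for how reliably it returns well-formed JSON at that size.
with numbered as (
  select email_domain, row_number() over () as rn
  from dist_domain  -- CTE from the question
),
batched as (
  select
    div(rn - 1, 50) as batch_id,
    string_agg(email_domain, ', ') as domain_list
  from numbered
  group by batch_id
)
select ml_generate_text_llm_result
from ml.generate_text(
  model `bqml-llm-model`,
  (
    select concat(
      'For each of the following comma-separated email domains, return a JSON ',
      'array of objects with fields "domain" and "is_common" (BOOLEAN; true for ',
      'public providers like gmail/yahoo/hotmail). No explanation. Domains: ',
      domain_list
    ) as prompt
    from batched
  ),
  struct(true as flatten_json_output)
);
```

With 100,000 distinct domains this turns 100,000 requests into 2,000, and the fixed instruction text is paid for once per batch rather than once per domain.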

Batch prediction is a valuable technique for applying machine learning models to large datasets efficiently. Instead of processing individual data points, you can submit a batch of data to Gemini for prediction, saving time and computational resources.

Tags: artificial intelligence, BigQuery, Generative AI