In MySQL, it's possible to do something like:

-- imagine there are 100 columns
SELECT * FROM table GROUP BY col1, col2 

I use this feature a lot, because it saves writing out FIRST or MIN or whichever aggregate function would make the result deterministic. Anyway, what's the simplest way to do this in BigQuery?

Asked Apr 1 at 23:11 by David542; edited Apr 2 at 0:40 by Dale K.
  • That's a quirk of MySQL and not part of the SQL standard, and it is not supported by (most) other RDBMSs. Personally, I think it makes far more sense to write a query that is specific about what data to return. – Dale K Commented Apr 2 at 0:43
  • What you call a feature is, in my opinion, a stupid bug that should not be possible. The purpose of SQL is that you write a query to fetch specific data from a table. In order to do this, you need to know which data you want. – Jonas Metzler Commented Apr 2 at 4:28
  • Perhaps row_number() over (partition by col1, col2)? – jarlh Commented Apr 2 at 6:50
  • Tip of the day - in MySQL, enable ONLY_FULL_GROUP_BY! – jarlh Commented Apr 2 at 7:07
  • @JonasMetzler be that as it may, I often find it useful. It's similar to getting a preview of the data, which I suppose is also not 'deterministic' and probably also unsupported by the SQL standard? For example: with tbl as (select 1 union all select 2) select * from tbl limit 1. – David542 Commented 2 days ago

2 Answers

You can define a template for yourself that:

  • uses INFORMATION_SCHEMA.COLUMNS to introspect your table,
  • constructs a query from the column list,
  • and runs it using EXECUTE IMMEDIATE:
EXECUTE IMMEDIATE
(
    -- When invoking, only the table name (t) and the GROUP BY columns (c) need to change.
    WITH tc AS (SELECT 'population_by_zip_2010' t, ['geo_id','zipcode'] c)
    SELECT CONCAT
    (
        'SELECT ',
        STRING_AGG(
            -- Grouping columns pass through; every other column is wrapped in MIN() and keeps its name.
            CASE WHEN column_name IN UNNEST(c) THEN column_name ELSE CONCAT('MIN(', column_name, ') AS ', column_name) END,
            ', ' ORDER BY ordinal_position -- keep the table's column order
        ),
        ' FROM `bigquery-public-data`.census_bureau_usa.', table_name, -- Adapt the project and dataset here once...
        ' GROUP BY ', ARRAY_TO_STRING(c, ',')
    ) q
    FROM tc, `bigquery-public-data`.census_bureau_usa.INFORMATION_SCHEMA.COLUMNS -- ...and here.
    WHERE t = table_name
    GROUP BY table_name, c
);

/!\ Beware that each query against INFORMATION_SCHEMA is billed for a minimum of 10 MB of data processed.

Theoretically, you may even be able to create a procedure from it (the idea comes from another SO answer, but I cannot test it due to restrictions on the BigQuery playground):

CREATE OR REPLACE PROCEDURE selstargroupby(t STRING, c ARRAY<STRING>)
BEGIN
EXECUTE IMMEDIATE
(
    -- No need of table tc here, its contents are the procedure's parameters.
    SELECT CONCAT
    …
    GROUP BY table_name, c
);
END;

CALL selstargroupby('population_by_zip_2010', ['geo_id','zipcode']);

BigQuery's GROUP BY is stricter than MySQL's. In MySQL (with ONLY_FULL_GROUP_BY disabled), you can combine SELECT * with GROUP BY, but BigQuery requires that every column in the SELECT list be either:

  1. Part of the GROUP BY clause or

  2. Used in an aggregate function.

So, if you want to select all columns from a table and group by certain columns (e.g., col1, col2), you cannot use SELECT * without applying aggregate functions to the other columns.

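Use aggregate functions with GROUP BY, as in the query below. ANY_VALUE returns an arbitrary value from each group, which is the closest match to MySQL's behavior: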

SELECT 
  col1,
  col2,
  ANY_VALUE(col3) AS col3,
  ANY_VALUE(col4) AS col4,
  -- Continue with other columns
FROM
  your_table
GROUP BY
  col1, col2;
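
If typing out 100 ANY_VALUE lines by hand is the sticking point, the query text can also be generated client-side. Here is a minimal Python sketch, assuming you already have the table's column names (for example from INFORMATION_SCHEMA.COLUMNS). The function name build_group_by_query and its parameters are hypothetical, not part of any BigQuery API, and the identifiers are interpolated as-is, so only pass trusted names:

```python
from typing import Sequence


def build_group_by_query(
    table: str,
    columns: Sequence[str],
    group_by: Sequence[str],
    aggregate: str = "ANY_VALUE",
) -> str:
    """Build a GROUP BY query that wraps every non-key column in `aggregate`.

    Hypothetical helper: it only assembles SQL text and never contacts BigQuery.
    """
    missing = [col for col in group_by if col not in columns]
    if missing:
        raise ValueError(f"group-by columns not in table: {missing}")

    keys = set(group_by)
    # Grouping columns pass through unchanged; all others are aggregated
    # and aliased back to their original name.
    select_items = [
        f"`{col}`" if col in keys else f"{aggregate}(`{col}`) AS `{col}`"
        for col in columns
    ]
    select_list = ",\n  ".join(select_items)
    group_list = ", ".join(f"`{col}`" for col in group_by)
    return f"SELECT\n  {select_list}\nFROM\n  `{table}`\nGROUP BY\n  {group_list}"


if __name__ == "__main__":
    print(
        build_group_by_query(
            "your_table", ["col1", "col2", "col3", "col4"], ["col1", "col2"]
        )
    )
```

Passing aggregate="MIN" produces the same shape of query as the EXECUTE IMMEDIATE template in the other answer; the resulting string can then be pasted into the console or sent through whichever client you use.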
