I have not managed to find a good enough method to extract the tables and columns used in an SQL query. My goal is to get every column used, in the format table.column. I use Python.

I have tried Python libraries like sql_metadata, but it is not as precise as I would like. I tried Parser(query).tables and Parser(query).columns.

I also have the .sqlite database files (I am using SQLite), in case the SQL needs to be executed.

For example, for this query:

SELECT student_id FROM student_course_attendance WHERE course_id = 301 ORDER BY date_of_attendance DESC LIMIT 1

I want to get student_course_attendance.student_id, student_course_attendance.course_id and student_course_attendance.date_of_attendance.

I also want to take into account that if, for example, I do SELECT *, I will have to get all the attributes.

Asked Mar 10 at 12:11 by Tuneful13; edited Mar 10 at 19:01 by Dale K
  • From what are you looking to extract the tables and columns, and what do you want to extract them into? In this question, as much sample data as you can provide would be helpful. – Mike G Commented Mar 10 at 12:17
  • Define 'used'. Do you mean a result-set column? And a condition input column? And a join/exists etc column, without query input/output? What do you want for select c1, ABS(c2 + c3) from t1 where c4 = ? and not exists (select * from t2 where t1.c5 = t2.xc). – jarlh Commented Mar 10 at 12:37
  • In what ways is sql_metadata not as precise as you would like? It seems the perfect solution. – Bart McEndree Commented Mar 10 at 13:23
  • A parser won't be able to tell you what columns SELECT * refers to. It doesn't know the table schema, it just parses the SQL string. – Barmar Commented Mar 10 at 15:21

1 Answer

Your ability to get this information out of just an SQL statement is going to be very limited. For the example you shared it is somewhat possible, but some assumptions have to be made.

An example using sqlparse:

import sqlparse
from sqlparse.sql import IdentifierList, Identifier
from sqlparse.tokens import Keyword, DML

def is_subselect(parsed):
    if not parsed.is_group:
        return False
    for item in parsed.tokens:
        if item.ttype is DML and item.value.upper() == 'SELECT':
            return True
    return False

def extract_from_part(parsed):
    from_seen = False
    for item in parsed.tokens:
        if from_seen:
            if is_subselect(item):
                yield from extract_from_part(item)
            elif item.ttype is Keyword:
                return
            else:
                yield item
        elif item.ttype is Keyword and item.value.upper() == 'FROM':
            from_seen = True

def extract_table_identifiers(token_stream):
    for item in token_stream:
        if isinstance(item, IdentifierList):
            for identifier in item.get_identifiers():
                yield identifier.get_name()
        elif isinstance(item, Identifier):
            yield item.get_name()        
        elif item.ttype is Keyword: # needed for a sqlparse bug 
            yield item.value


def extract_tables(sql):
    stream = extract_from_part(sqlparse.parse(sql)[0])
    return list(extract_table_identifiers(stream))

sql = 'SELECT student_id FROM student_course_attendance WHERE course_id = 301 ORDER BY date_of_attendance DESC LIMIT 1;'

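parsed = sqlparse.parse(sql)
print(parsed[0].tokens)  # display() only exists in IPython/Jupyter

tables = list(extract_table_identifiers(extract_from_part(parsed[0])))

# Collect every plain name in the statement, including names nested inside
# the WHERE clause, and skip the table names so that only columns remain.
columns = []
for token in parsed[0].flatten():
    if token.ttype is sqlparse.tokens.Name and token.value not in tables + columns:
        columns.append(token.value)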

# It's only possible to determine which table the unqualified columns came
# from when there is a single table in the FROM clause.
if len(tables) == 1:
    columns = [tables[0] + '.' + column for column in columns]

print(columns)

This all falls apart as soon as you add another table to the FROM clause or use SELECT *: from the SQL text alone there is no way to determine which table an unqualified column came from, or what the columns behind * are at all. It also gets very ugly when you add subqueries or CTEs.
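Since you mention having the .sqlite files, the missing schema knowledge can be read from the database itself. What follows is only a rough sketch under that assumption: load_schema, qualify and expand_star are hypothetical helper names of mine (not part of sqlparse or sql_metadata), and the list of tables is assumed to come from a parser step like the one above.

```python
import sqlite3


def load_schema(db_path):
    """Return {table_name: [column, ...]} for every user table in the file."""
    con = sqlite3.connect(db_path)
    try:
        tables = [row[0] for row in con.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'")]
        schema = {}
        for table in tables:
            # PRAGMA table_info returns one row per column; index 1 is the name.
            quoted = '"' + table.replace('"', '""') + '"'
            schema[table] = [row[1] for row in
                             con.execute(f"PRAGMA table_info({quoted})")]
        return schema
    finally:
        con.close()


def qualify(column, tables, schema):
    """Return 'table.column', or None if the column is unknown or ambiguous."""
    owners = [t for t in tables if column in schema.get(t, [])]
    return f"{owners[0]}.{column}" if len(owners) == 1 else None


def expand_star(tables, schema):
    """List every 'table.column' that SELECT * over these tables would cover."""
    return [f"{t}.{c}" for t in tables for c in schema.get(t, [])]
```

With this, an unqualified column that exists in exactly one of the FROM tables can be attributed to that table, and a column that exists in several of them is reported as ambiguous (None) instead of being guessed. Aliases, subqueries and CTEs are still not handled here.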

At the end of the day, if your only workable solution to a problem is "parsing SQL", then rethink how badly you need to solve the problem.
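If the database file is at hand, one way to avoid parsing the SQL yourself is to let SQLite resolve the names and report them. The sketch below relies on the authorizer callback of the standard sqlite3 module (Connection.set_authorizer), which SQLite invokes with the table and column name for every column a statement reads while it is being compiled. columns_used is a hypothetical helper name, and I am assuming the query is valid against the given database file.

```python
import sqlite3


def columns_used(db_path, sql):
    """Return the sorted 'table.column' names SQLite reads when compiling sql."""
    seen = set()

    def authorizer(action, arg1, arg2, db_name, source):
        # For SQLITE_READ, arg1 is the table name and arg2 the column name.
        # arg2 can be empty when no specific column is read (e.g. COUNT(*)).
        if action == sqlite3.SQLITE_READ and arg2:
            seen.add(f"{arg1}.{arg2}")
        return sqlite3.SQLITE_OK

    # A fresh connection per call avoids reusing an already compiled
    # statement, in which case the authorizer would not be invoked again.
    con = sqlite3.connect(db_path)
    try:
        con.set_authorizer(authorizer)
        # EXPLAIN compiles the statement without running the query itself.
        con.execute("EXPLAIN " + sql).fetchall()
    finally:
        con.close()
    return sorted(seen)
```

Called as columns_used("your.sqlite", query), this reports real table names even when the query uses aliases, and SELECT * shows up as one entry per column. It does not tell you in which clause a column was used, only that it was read.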

Article tags: python; Extract columns and tables used given an SQL query; Stack Overflow