admin管理员组文章数量:1125064
In Polars / pandas / PyArrow, I can instantiate an object from a dict, e.g.
In [12]: pl.DataFrame({'a': [1,2,3], 'b': [4,5,6]})
Out[12]:
shape: (3, 2)
┌─────┬─────┐
│ a ┆ b │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1 ┆ 4 │
│ 2 ┆ 5 │
│ 3 ┆ 6 │
└─────┴─────┘
Is there a way to do that in DuckDB, without going via pandas / pyarrow / etc.?
In Polars / pandas / PyArrow, I can instantiate an object from a dict, e.g.
In [12]: pl.DataFrame({'a': [1,2,3], 'b': [4,5,6]})
Out[12]:
shape: (3, 2)
┌─────┬─────┐
│ a ┆ b │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1 ┆ 4 │
│ 2 ┆ 5 │
│ 3 ┆ 6 │
└─────┴─────┘
Is there a way to do that in DuckDB, without going via pandas / pyarrow / etc.?
Share Improve this question asked 2 days ago ignoring_gravityignoring_gravity 10.4k7 gold badges41 silver badges86 bronze badges 1- From what I see, replacement scans are only available with certain types - pandas.DataFrame, pyarrow Table, Dataset, RecordBatchReader, Scanner, or NumPy ndarrays etc. You can create relation via some methods, but not from dict directly - relational api. I guess converting dict to json is not something you'd want though.. – roman Commented 2 days ago
1 Answer
Reset to default 2duckdb
features a function duckdb.read_json
which should do this by simply streaming the dict
as a json string, but no combination of its various parameters will make it read that dict
the same way polars
does unfortunately. You can rearrange the dict
to match the structure expected for a "unstructured" json format in their documentation pretty easily and then use the duckdb.read_json
function to load it the same way as polars
, which I think the closest you can get with the library as it currently stands.
Here is a demonstration which shows the polars
interpretation we expected, the naive duckdb
interpretation we probably didn't and the transformation necessary to load it like polars
did:
import polars
import duckdb
import io
import json
# Silent requirement for fsspec
data = {'a': [1,2,3], 'b': [4,5,6]}
polars_data = polars.DataFrame(data)
print('polars\' interpretation:')
print(polars_data)
duckdb_data = duckdb.read_json(io.StringIO(json.dumps(data)))
print('duckdb \'s naive interpretation:')
print(duckdb_data)
def transform_to_duckjson(dictdata: dict):
return [{fieldname: fieldrecords[recordnum] for fieldname, fieldrecords in dictdata.items()} for recordnum in range(len(dictdata[next(iter(dictdata.keys()))]))]
duckdb_data2 = duckdb.read_json(io.StringIO(json.dumps(transform_to_duckjson(data))))
print('duckdb \'s interpretation after adjustment:')
print(duckdb_data2)
Which gives me the following output:
polars' interpretation:
shape: (3, 2)
┌─────┬─────┐
│ a ┆ b │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1 ┆ 4 │
│ 2 ┆ 5 │
│ 3 ┆ 6 │
└─────┴─────┘
duckdb 's naive interpretation:
┌───────────┬───────────┐
│ a │ b │
│ int64[] │ int64[] │
├───────────┼───────────┤
│ [1, 2, 3] │ [4, 5, 6] │
└───────────┴───────────┘
duckdb 's interpretation after adjustment:
┌───────┬───────┐
│ a │ b │
│ int64 │ int64 │
├───────┼───────┤
│ 1 │ 4 │
│ 2 │ 5 │
│ 3 │ 6 │
└───────┴───────┘
As a note: Here is the python api reference (scroll down to "read_csv(..." for the relevant information) which doesn't provide any help at all for using the function unfortunately.
Let me know if you have any questions.
本文标签: duckdbDuckDBPyRelation from Python dictStack Overflow
版权声明:本文标题:duckdb - DuckDBPyRelation from Python dict? - Stack Overflow 内容由网友自发贡献,该文观点仅代表作者本人, 转载请联系作者并注明出处:http://www.betaflare.com/web/1736653928a1946206.html, 本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌抄袭侵权/违法违规的内容,一经查实,本站将立刻删除。
发表评论