Why?
I am querying data from a MongoDB collection and loading the result into a Polars DataFrame. Depending on the limit passed to the Mongo query, the operation either works or raises the error in the title. I haven't been able to fix it because I can't tell whether the issue is with Mongo or with Polars. By the way, I'm quite new to Polars.
Context
This is essentially the query I'm running in Python using pymongo==4.5.0:
import datetime as dt

res = mongo_clt.col.db.find(
    filter={
        'createdAt': {
            '$gte': dt.datetime.fromisoformat("2024-09-01")
        },
    },
    projection=[
        "type",
        "checked",
        "status",
        "createdAt",
    ],
    limit=0
)
Note that setting limit=0 is the same as not setting a limit, so the query should return all matching entries.
For reference, between 2024-09-01 and today (2025-01-08) I should collect about 4700 rows, which I validated by running the query in MongoDB Compass and by loading the response directly into a Pandas dataframe instead of a Polars one.
The schema I'm using for the projected fields is:
import polars as pl

# Named cols_types to match the name used in the pl.DataFrame call
cols_types = {
    'type': pl.Categorical,
    'checked': pl.Boolean,
    'status': pl.Categorical,
    'createdAt': pl.Datetime('ms')
}
Then the response unpacking is:
df = pl.DataFrame(
    data=res,
    schema_overrides=cols_types,
)
Issue
If I set limit = 100 or even limit = 1000, the operation works and I get a Polars dataframe with 100 (or 1000) rows with the correct types.
Now if I raise the limit to, say, 4000, or simply remove the limit, I get the following error:
{
"name": "TypeError",
"message": "argument 'schema': 'Object' is not a Polars data type",
"stack": "---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[5], line 1
----> 1 df_raq = pl.DataFrame(
2 data=res,
3 schema_overrides=cols_types,
4 )
File ~/Desktop/dev/.venv/lib/python3.10/site-packages/polars/dataframe/frame.py:419, in DataFrame.__init__(self, data, schema, schema_overrides, strict, orient, infer_schema_length, nan_to_null)
414 self._df = pandas_to_pydf(
415 data, schema=schema, schema_overrides=schema_overrides, strict=strict
416 )
418 elif not isinstance(data, Sized) and isinstance(data, (Generator, Iterable)):
--> 419 self._df = iterable_to_pydf(
420 data,
421 schema=schema,
422 schema_overrides=schema_overrides,
423 strict=strict,
424 orient=orient,
425 infer_schema_length=infer_schema_length,
426 )
428 elif isinstance(data, pl.DataFrame):
429 self._df = dataframe_to_pydf(
430 data, schema=schema, schema_overrides=schema_overrides, strict=strict
431 )
File ~/Desktop/dev/.venv/lib/python3.10/site-packages/polars/_utils/construction/dataframe.py:990, in iterable_to_pydf(data, schema, schema_overrides, strict, orient, chunk_size, infer_schema_length)
988 if not values:
989 break
--> 990 frame_chunk = to_frame_chunk(values, original_schema)
991 if df is None:
992 df = frame_chunk
File ~/Desktop/dev/.venv/lib/python3.10/site-packages/polars/_utils/construction/dataframe.py:963, in iterable_to_pydf.<locals>.to_frame_chunk(values, schema)
962 def to_frame_chunk(values: list[Any], schema: SchemaDefinition | None) -> DataFrame:
--> 963 return pl.DataFrame(
964 data=values,
965 schema=schema,
966 strict=strict,
967 orient=\"row\",
968 infer_schema_length=infer_schema_length,
969 )
File ~/Desktop/dev/.venv/lib/python3.10/site-packages/polars/dataframe/frame.py:384, in DataFrame.__init__(self, data, schema, schema_overrides, strict, orient, infer_schema_length, nan_to_null)
375 self._df = dict_to_pydf(
376 data,
377 schema=schema,
(...)
380 nan_to_null=nan_to_null,
381 )
383 elif isinstance(data, (list, tuple, Sequence)):
--> 384 self._df = sequence_to_pydf(
385 data,
386 schema=schema,
387 schema_overrides=schema_overrides,
388 strict=strict,
389 orient=orient,
390 infer_schema_length=infer_schema_length,
391 )
393 elif isinstance(data, pl.Series):
394 self._df = series_to_pydf(
395 data, schema=schema, schema_overrides=schema_overrides, strict=strict
396 )
File ~/Desktop/dev/.venv/lib/python3.10/site-packages/polars/_utils/construction/dataframe.py:435, in sequence_to_pydf(data, schema, schema_overrides, strict, orient, infer_schema_length)
432 if not data:
433 return dict_to_pydf({}, schema=schema, schema_overrides=schema_overrides)
--> 435 return _sequence_to_pydf_dispatcher(
436 data[0],
437 data=data,
438 schema=schema,
439 schema_overrides=schema_overrides,
440 strict=strict,
441 orient=orient,
442 infer_schema_length=infer_schema_length,
443 )
File ~/.pyenv/versions/3.10.12/lib/python3.10/functools.py:889, in singledispatch.<locals>.wrapper(*args, **kw)
885 if not args:
886 raise TypeError(f'{funcname} requires at least '
887 '1 positional argument')
--> 889 return dispatch(args[0].__class__)(*args, **kw)
File ~/Desktop/dev/.venv/lib/python3.10/site-packages/polars/_utils/construction/dataframe.py:676, in _sequence_of_dict_to_pydf(first_element, data, schema, schema_overrides, strict, infer_schema_length, **kwargs)
668 column_names, schema_overrides = _unpack_schema(
669 schema, schema_overrides=schema_overrides
670 )
671 dicts_schema = (
672 _include_unknowns(schema_overrides, column_names or list(schema_overrides))
673 if column_names
674 else None
675 )
--> 676 pydf = PyDataFrame.from_dicts(
677 data,
678 dicts_schema,
679 schema_overrides,
680 strict=strict,
681 infer_schema_length=infer_schema_length,
682 )
684 # TODO: we can remove this `schema_overrides` block completely
685 # once is fixed
686 if schema_overrides:
TypeError: argument 'schema': 'Object' is not a Polars data type"
}
My guess is that Polars' schema inference has some sort of issue, so I tried setting pl.DataFrame(strict=False), but that didn't have any effect.
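One way to narrow down whether the data or Polars is at fault is to tally the Python types that actually appear in each field before building the DataFrame. The following is a minimal sketch under assumptions: `tally_field_types` and `sample_rows` are hypothetical names introduced for illustration, and the sample rows stand in for `list(res)` so the snippet runs without a Mongo connection.

```python
from collections import Counter, defaultdict


def tally_field_types(rows):
    """Count the Python type names seen for each field across all rows."""
    counts = defaultdict(Counter)
    for row in rows:
        for field, value in row.items():
            counts[field][type(value).__name__] += 1
    # Convert to plain dicts so the report is easy to print and compare
    return {field: dict(counter) for field, counter in counts.items()}


# Stand-in rows (assumption); with the real query this would be list(res).
# object() plays the role of a value Polars has no native dtype for.
sample_rows = [
    {"_id": object(), "type": "a", "checked": True},
    {"_id": object(), "type": None, "checked": False},
]

report = tally_field_types(sample_rows)
print(report)
```

Any field whose tally shows a type other than str, bool, int, float, datetime or NoneType is a candidate for the 'Object' in the error message.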
Update
Of the returned columns, the only one that is not cast explicitly is _id, which is always returned. In Mongo it is of type ObjectId, so it may be the one referenced in the error above.
So I forced a cast to pl.String by passing an explicit schema, and the result was a new error, ComputeError:
{
"name": "ComputeError",
"message": "could not append value: 677fe3e18f80eb81115eb375 of type: object to the builder; make sure that all rows have the same schema or consider increasing `infer_schema_length`
it might also be that a value overflows the data-type's capacity",
"stack": "---------------------------------------------------------------------------
ComputeError Traceback (most recent call last)
Cell In[3], line 23
1 res= mongo_clt.col.db.find(
2 filter={
3 'createdAt': {
(...)
20 limit=0
21 )
---> 23 df = pl.DataFrame(
24 data=res,
25 schema={
26 '_id':pl.String,
27 'type':pl.Categorical,
28 'checked':pl.Boolean,
29 # 'assetId':pl.String,
30 'status':pl.Categorical,
31 'createdAt':pl.Datetime('ms'),
32 'feedback':pl.Struct
33 }
34 )
File ~/Desktop/dev/.venv/lib/python3.10/site-packages/polars/dataframe/frame.py:419, in DataFrame.__init__(self, data, schema, schema_overrides, strict, orient, infer_schema_length, nan_to_null)
414 self._df = pandas_to_pydf(
415 data, schema=schema, schema_overrides=schema_overrides, strict=strict
416 )
418 elif not isinstance(data, Sized) and isinstance(data, (Generator, Iterable)):
--> 419 self._df = iterable_to_pydf(
420 data,
421 schema=schema,
422 schema_overrides=schema_overrides,
423 strict=strict,
424 orient=orient,
425 infer_schema_length=infer_schema_length,
426 )
428 elif isinstance(data, pl.DataFrame):
429 self._df = dataframe_to_pydf(
430 data, schema=schema, schema_overrides=schema_overrides, strict=strict
431 )
File ~/Desktop/dev/.venv/lib/python3.10/site-packages/polars/_utils/construction/dataframe.py:990, in iterable_to_pydf(data, schema, schema_overrides, strict, orient, chunk_size, infer_schema_length)
988 if not values:
989 break
--> 990 frame_chunk = to_frame_chunk(values, original_schema)
991 if df is None:
992 df = frame_chunk
File ~/Desktop/dev/.venv/lib/python3.10/site-packages/polars/_utils/construction/dataframe.py:963, in iterable_to_pydf.<locals>.to_frame_chunk(values, schema)
962 def to_frame_chunk(values: list[Any], schema: SchemaDefinition | None) -> DataFrame:
--> 963 return pl.DataFrame(
964 data=values,
965 schema=schema,
966 strict=strict,
967 orient=\"row\",
968 infer_schema_length=infer_schema_length,
969 )
File ~/Desktop/dev/.venv/lib/python3.10/site-packages/polars/dataframe/frame.py:384, in DataFrame.__init__(self, data, schema, schema_overrides, strict, orient, infer_schema_length, nan_to_null)
375 self._df = dict_to_pydf(
376 data,
377 schema=schema,
(...)
380 nan_to_null=nan_to_null,
381 )
383 elif isinstance(data, (list, tuple, Sequence)):
--> 384 self._df = sequence_to_pydf(
385 data,
386 schema=schema,
387 schema_overrides=schema_overrides,
388 strict=strict,
389 orient=orient,
390 infer_schema_length=infer_schema_length,
391 )
393 elif isinstance(data, pl.Series):
394 self._df = series_to_pydf(
395 data, schema=schema, schema_overrides=schema_overrides, strict=strict
396 )
File ~/Desktop/dev/.venv/lib/python3.10/site-packages/polars/_utils/construction/dataframe.py:435, in sequence_to_pydf(data, schema, schema_overrides, strict, orient, infer_schema_length)
432 if not data:
433 return dict_to_pydf({}, schema=schema, schema_overrides=schema_overrides)
--> 435 return _sequence_to_pydf_dispatcher(
436 data[0],
437 data=data,
438 schema=schema,
439 schema_overrides=schema_overrides,
440 strict=strict,
441 orient=orient,
442 infer_schema_length=infer_schema_length,
443 )
File ~/.pyenv/versions/3.10.12/lib/python3.10/functools.py:889, in singledispatch.<locals>.wrapper(*args, **kw)
885 if not args:
886 raise TypeError(f'{funcname} requires at least '
887 '1 positional argument')
--> 889 return dispatch(args[0].__class__)(*args, **kw)
File ~/Desktop/dev/.venv/lib/python3.10/site-packages/polars/_utils/construction/dataframe.py:676, in _sequence_of_dict_to_pydf(first_element, data, schema, schema_overrides, strict, infer_schema_length, **kwargs)
668 column_names, schema_overrides = _unpack_schema(
669 schema, schema_overrides=schema_overrides
670 )
671 dicts_schema = (
672 _include_unknowns(schema_overrides, column_names or list(schema_overrides))
673 if column_names
674 else None
675 )
--> 676 pydf = PyDataFrame.from_dicts(
677 data,
678 dicts_schema,
679 schema_overrides,
680 strict=strict,
681 infer_schema_length=infer_schema_length,
682 )
684 # TODO: we can remove this `schema_overrides` block completely
685 # once is fixed
686 if schema_overrides:
ComputeError: could not append value: 677fe3e18f80eb81115eb375 of type: object to the builder; make sure that all rows have the same schema or consider increasing `infer_schema_length`
it might also be that a value overflows the data-type's capacity"
}
- I can't increase infer_schema_length because I'm already using the full length.
- The value 677fe3e18f80eb81115eb375 corresponds to an _id, but I could only see that in MongoDB Compass; when I load the response with Pandas I don't find that row.
- Could it then be this: "a value overflows the data-type's capacity"?
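Since the second error complains about an `_id` value "of type: object", one thing to try is converting `_id` to a plain string in Python and materializing the cursor into a list before handing the rows to Polars. This is only a sketch under assumptions, not a confirmed fix: `stringify_ids` is a hypothetical helper, and `FakeObjectId` is a stand-in for `bson.ObjectId` (whose `str()` is its 24-character hex form) so the snippet runs without pymongo.

```python
class FakeObjectId:
    """Hypothetical stand-in for bson.ObjectId, used only so this runs offline."""

    def __init__(self, hex_value):
        self._hex = hex_value

    def __str__(self):
        return self._hex


def stringify_ids(rows, key="_id"):
    """Materialize rows into a list, replacing row[key] with its string form."""
    cleaned = []
    for row in rows:
        row = dict(row)  # shallow copy so the source documents are untouched
        if key in row:
            row[key] = str(row[key])
        cleaned.append(row)
    return cleaned


# An iterator plays the role of the lazy pymongo cursor (assumption).
source_docs = [
    {"_id": FakeObjectId("677fe3e18f80eb81115eb375"), "checked": True},
    {"_id": FakeObjectId("677fe3e18f80eb81115eb376"), "checked": False},
]
rows = stringify_ids(iter(source_docs))
print(rows[0]["_id"])
```

With the real query, `rows = stringify_ids(res)` could then be passed as `pl.DataFrame(rows, schema_overrides=cols_types)`. If `_id` is not needed at all, pymongo also accepts a dict projection such as `{"type": 1, "checked": 1, "status": 1, "createdAt": 1, "_id": 0}`, which excludes it from the response.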