Why?

I am querying data from a MongoDB collection and loading the result into a Polars DataFrame. Depending on the limit passed to the Mongo query, the operation either works or raises the error in the title (TypeError: argument 'schema': 'Object' is not a Polars data type). I haven't been able to fix it because I can't tell whether the issue is with Mongo or with Polars. By the way, I'm quite new to Polars.

Context

This is essentially the query I'm running in Python, using pymongo==4.5.0:

import datetime as dt

res = mongo_clt.col.db.find(
    filter={
        'createdAt': {
            '$gte': dt.datetime.fromisoformat("2024-09-01")
        },
    },
    projection=[
        "type",
        "checked",
        "status",
        "createdAt",
    ],
    limit=0  # 0 means "no limit"
)

Note that setting limit=0 is the same as not setting a limit, and thus should return all matching entries.

For reference, between 2024-09-01 and today (2025-01-08) I should collect about 4700 rows. I validated this by running the query in MongoDB Compass and by loading the response directly into a Pandas DataFrame instead of a Polars one.

The schema I'm using for the projected fields is:

import polars as pl

cols_types = {
    'type': pl.Categorical,
    'checked': pl.Boolean,
    'status': pl.Categorical,
    'createdAt': pl.Datetime('ms'),
}

Then the response unpacking is:

df = pl.DataFrame(
    data=res,
    schema_overrides=cols_types,
)

Issue

If I set limit=100 or even limit=1000, the operation works and I get a Polars DataFrame with 100 (or 1000) rows and the correct types. If I raise the limit to, say, 4000, or simply remove the limit, I get the following error:

{
    "name": "TypeError",
    "message": "argument 'schema': 'Object' is not a Polars data type",
    "stack": "---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[5], line 1
----> 1 df_raq = pl.DataFrame(
      2     data=res,
      3     schema_overrides=cols_types,
      4 )

File ~/Desktop/dev/.venv/lib/python3.10/site-packages/polars/dataframe/frame.py:419, in DataFrame.__init__(self, data, schema, schema_overrides, strict, orient, infer_schema_length, nan_to_null)
    414     self._df = pandas_to_pydf(
    415         data, schema=schema, schema_overrides=schema_overrides, strict=strict
    416     )
    418 elif not isinstance(data, Sized) and isinstance(data, (Generator, Iterable)):
--> 419     self._df = iterable_to_pydf(
    420         data,
    421         schema=schema,
    422         schema_overrides=schema_overrides,
    423         strict=strict,
    424         orient=orient,
    425         infer_schema_length=infer_schema_length,
    426     )
    428 elif isinstance(data, pl.DataFrame):
    429     self._df = dataframe_to_pydf(
    430         data, schema=schema, schema_overrides=schema_overrides, strict=strict
    431     )

File ~/Desktop/dev/.venv/lib/python3.10/site-packages/polars/_utils/construction/dataframe.py:990, in iterable_to_pydf(data, schema, schema_overrides, strict, orient, chunk_size, infer_schema_length)
    988 if not values:
    989     break
--> 990 frame_chunk = to_frame_chunk(values, original_schema)
    991 if df is None:
    992     df = frame_chunk

File ~/Desktop/dev/.venv/lib/python3.10/site-packages/polars/_utils/construction/dataframe.py:963, in iterable_to_pydf.<locals>.to_frame_chunk(values, schema)
    962 def to_frame_chunk(values: list[Any], schema: SchemaDefinition | None) -> DataFrame:
--> 963     return pl.DataFrame(
    964         data=values,
    965         schema=schema,
    966         strict=strict,
    967         orient=\"row\",
    968         infer_schema_length=infer_schema_length,
    969     )

File ~/Desktop/dev/.venv/lib/python3.10/site-packages/polars/dataframe/frame.py:384, in DataFrame.__init__(self, data, schema, schema_overrides, strict, orient, infer_schema_length, nan_to_null)
    375     self._df = dict_to_pydf(
    376         data,
    377         schema=schema,
   (...)
    380         nan_to_null=nan_to_null,
    381     )
    383 elif isinstance(data, (list, tuple, Sequence)):
--> 384     self._df = sequence_to_pydf(
    385         data,
    386         schema=schema,
    387         schema_overrides=schema_overrides,
    388         strict=strict,
    389         orient=orient,
    390         infer_schema_length=infer_schema_length,
    391     )
    393 elif isinstance(data, pl.Series):
    394     self._df = series_to_pydf(
    395         data, schema=schema, schema_overrides=schema_overrides, strict=strict
    396     )

File ~/Desktop/dev/.venv/lib/python3.10/site-packages/polars/_utils/construction/dataframe.py:435, in sequence_to_pydf(data, schema, schema_overrides, strict, orient, infer_schema_length)
    432 if not data:
    433     return dict_to_pydf({}, schema=schema, schema_overrides=schema_overrides)
--> 435 return _sequence_to_pydf_dispatcher(
    436     data[0],
    437     data=data,
    438     schema=schema,
    439     schema_overrides=schema_overrides,
    440     strict=strict,
    441     orient=orient,
    442     infer_schema_length=infer_schema_length,
    443 )

File ~/.pyenv/versions/3.10.12/lib/python3.10/functools.py:889, in singledispatch.<locals>.wrapper(*args, **kw)
    885 if not args:
    886     raise TypeError(f'{funcname} requires at least '
    887                     '1 positional argument')
--> 889 return dispatch(args[0].__class__)(*args, **kw)

File ~/Desktop/dev/.venv/lib/python3.10/site-packages/polars/_utils/construction/dataframe.py:676, in _sequence_of_dict_to_pydf(first_element, data, schema, schema_overrides, strict, infer_schema_length, **kwargs)
    668 column_names, schema_overrides = _unpack_schema(
    669     schema, schema_overrides=schema_overrides
    670 )
    671 dicts_schema = (
    672     _include_unknowns(schema_overrides, column_names or list(schema_overrides))
    673     if column_names
    674     else None
    675 )
--> 676 pydf = PyDataFrame.from_dicts(
    677     data,
    678     dicts_schema,
    679     schema_overrides,
    680     strict=strict,
    681     infer_schema_length=infer_schema_length,
    682 )
    684 # TODO: we can remove this `schema_overrides` block completely
    685 #  once  is fixed
    686 if schema_overrides:

TypeError: argument 'schema': 'Object' is not a Polars data type"
}
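
The traceback above passes through iterable_to_pydf and to_frame_chunk, which suggests the cursor is consumed in chunks rather than all at once. Below is a minimal sketch of that consumption pattern, assuming a chunk size of 1000 (a hypothetical value chosen for illustration, not read from the Polars source). The helper iter_chunks is illustrative and is not a Polars function.

```python
from itertools import islice
from typing import Any, Iterable, Iterator


def iter_chunks(rows: Iterable[Any], chunk_size: int) -> Iterator[list[Any]]:
    """Yield successive lists of at most chunk_size items from any iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            break
        yield chunk


# HYPOTHETICAL chunk size, chosen only for illustration.
CHUNK_SIZE = 1000

# Generators stand in for the pymongo cursor (no len(), like a cursor).
small = (i for i in range(1000))
large = (i for i in range(4000))

small_chunks = [len(c) for c in iter_chunks(small, CHUNK_SIZE)]
large_chunks = [len(c) for c in iter_chunks(large, CHUNK_SIZE)]
print(small_chunks)  # [1000]
print(large_chunks)  # [1000, 1000, 1000, 1000]
```

If the failure only appears from the second chunk onward, that would be consistent with limit=1000 working and limit=4000 failing, but this is a guess and not something confirmed here.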

My guess is that Polars' schema inference has some sort of issue, so I tried passing strict=False to pl.DataFrame, but that didn't have any effect.
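
One way to check whether the data itself is inconsistent, independent of Polars, is to count the Python types seen for each field across all returned documents. The sketch below assumes the documents are plain dicts; type_census and the sample documents are made up for illustration, and in the real case the input would be the materialized cursor.

```python
from collections import Counter, defaultdict
from typing import Any, Iterable, Mapping


def type_census(docs: Iterable[Mapping[str, Any]]) -> dict[str, Counter]:
    """Count, per field, the Python type names seen across all documents."""
    census: dict[str, Counter] = defaultdict(Counter)
    for doc in docs:
        for key, value in doc.items():
            census[key][type(value).__name__] += 1
    return dict(census)


# Made-up sample documents; in the real case this would be list(res).
sample_docs = [
    {"type": "a", "checked": True, "status": "open"},
    {"type": "b", "checked": None, "status": "closed"},
    {"type": "c", "checked": False},  # 'status' is missing in this document
]

census = type_census(sample_docs)
for field, counts in census.items():
    print(field, dict(counts))
```

A field that shows more than one type name, or a type that is not a plain Python scalar, would be a candidate for the column that schema inference cannot handle.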

Update

Of the returned fields, the only one that is not cast explicitly is _id, which is always returned. In Mongo it is of type ObjectId, so it may be the one referenced in the error above. So I forced a cast to pl.String, and the result was a new error, a ComputeError:

{
    "name": "ComputeError",
    "message": "could not append value: 677fe3e18f80eb81115eb375 of type: object to the builder; make sure that all rows have the same schema or consider increasing `infer_schema_length`

it might also be that a value overflows the data-type's capacity",
    "stack": "---------------------------------------------------------------------------
ComputeError                              Traceback (most recent call last)
Cell In[3], line 23
      1 res= mongo_clt.col.db.find(
      2     filter={
      3         'createdAt': {
   (...)
     20     limit=0
     21 )
---> 23 df = pl.DataFrame(
     24     data=res,
     25     schema={
     26         '_id':pl.String,
     27         'type':pl.Categorical,
     28         'checked':pl.Boolean,
     29         # 'assetId':pl.String,
     30         'status':pl.Categorical,
     31         'createdAt':pl.Datetime('ms'),
     32         'feedback':pl.Struct
     33     }
     34 )

File ~/Desktop/dev/.venv/lib/python3.10/site-packages/polars/dataframe/frame.py:419, in DataFrame.__init__(self, data, schema, schema_overrides, strict, orient, infer_schema_length, nan_to_null)
    414     self._df = pandas_to_pydf(
    415         data, schema=schema, schema_overrides=schema_overrides, strict=strict
    416     )
    418 elif not isinstance(data, Sized) and isinstance(data, (Generator, Iterable)):
--> 419     self._df = iterable_to_pydf(
    420         data,
    421         schema=schema,
    422         schema_overrides=schema_overrides,
    423         strict=strict,
    424         orient=orient,
    425         infer_schema_length=infer_schema_length,
    426     )
    428 elif isinstance(data, pl.DataFrame):
    429     self._df = dataframe_to_pydf(
    430         data, schema=schema, schema_overrides=schema_overrides, strict=strict
    431     )

File ~/Desktop/dev/.venv/lib/python3.10/site-packages/polars/_utils/construction/dataframe.py:990, in iterable_to_pydf(data, schema, schema_overrides, strict, orient, chunk_size, infer_schema_length)
    988 if not values:
    989     break
--> 990 frame_chunk = to_frame_chunk(values, original_schema)
    991 if df is None:
    992     df = frame_chunk

File ~/Desktop/dev/.venv/lib/python3.10/site-packages/polars/_utils/construction/dataframe.py:963, in iterable_to_pydf.<locals>.to_frame_chunk(values, schema)
    962 def to_frame_chunk(values: list[Any], schema: SchemaDefinition | None) -> DataFrame:
--> 963     return pl.DataFrame(
    964         data=values,
    965         schema=schema,
    966         strict=strict,
    967         orient=\"row\",
    968         infer_schema_length=infer_schema_length,
    969     )

File ~/Desktop/dev/.venv/lib/python3.10/site-packages/polars/dataframe/frame.py:384, in DataFrame.__init__(self, data, schema, schema_overrides, strict, orient, infer_schema_length, nan_to_null)
    375     self._df = dict_to_pydf(
    376         data,
    377         schema=schema,
   (...)
    380         nan_to_null=nan_to_null,
    381     )
    383 elif isinstance(data, (list, tuple, Sequence)):
--> 384     self._df = sequence_to_pydf(
    385         data,
    386         schema=schema,
    387         schema_overrides=schema_overrides,
    388         strict=strict,
    389         orient=orient,
    390         infer_schema_length=infer_schema_length,
    391     )
    393 elif isinstance(data, pl.Series):
    394     self._df = series_to_pydf(
    395         data, schema=schema, schema_overrides=schema_overrides, strict=strict
    396     )

File ~/Desktop/dev/.venv/lib/python3.10/site-packages/polars/_utils/construction/dataframe.py:435, in sequence_to_pydf(data, schema, schema_overrides, strict, orient, infer_schema_length)
    432 if not data:
    433     return dict_to_pydf({}, schema=schema, schema_overrides=schema_overrides)
--> 435 return _sequence_to_pydf_dispatcher(
    436     data[0],
    437     data=data,
    438     schema=schema,
    439     schema_overrides=schema_overrides,
    440     strict=strict,
    441     orient=orient,
    442     infer_schema_length=infer_schema_length,
    443 )

File ~/.pyenv/versions/3.10.12/lib/python3.10/functools.py:889, in singledispatch.<locals>.wrapper(*args, **kw)
    885 if not args:
    886     raise TypeError(f'{funcname} requires at least '
    887                     '1 positional argument')
--> 889 return dispatch(args[0].__class__)(*args, **kw)

File ~/Desktop/dev/.venv/lib/python3.10/site-packages/polars/_utils/construction/dataframe.py:676, in _sequence_of_dict_to_pydf(first_element, data, schema, schema_overrides, strict, infer_schema_length, **kwargs)
    668 column_names, schema_overrides = _unpack_schema(
    669     schema, schema_overrides=schema_overrides
    670 )
    671 dicts_schema = (
    672     _include_unknowns(schema_overrides, column_names or list(schema_overrides))
    673     if column_names
    674     else None
    675 )
--> 676 pydf = PyDataFrame.from_dicts(
    677     data,
    678     dicts_schema,
    679     schema_overrides,
    680     strict=strict,
    681     infer_schema_length=infer_schema_length,
    682 )
    684 # TODO: we can remove this `schema_overrides` block completely
    685 #  once  is fixed
    686 if schema_overrides:

ComputeError: could not append value: 677fe3e18f80eb81115eb375 of type: object to the builder; make sure that all rows have the same schema or consider increasing `infer_schema_length`

it might also be that a value overflows the data-type's capacity"
}

I can't increase infer_schema_length because I'm already using the full length.
The value 677fe3e18f80eb81115eb375 corresponds to an _id, but I could only see it in MongoDB Compass; when I load the response with Pandas I don't find that row.
Could it then be this: a value overflows the data-type's capacity?
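
Since the value in the error message is an _id reported as "of type: object", one thing to try is converting _id to a plain string in Python before the data reaches Polars. The sketch below assumes that calling str() on the id yields its hex form, as it does for bson.ObjectId. FakeObjectId is a hypothetical stand-in so the example runs without pymongo, stringify_ids is an illustrative helper, and the second id value is made up.

```python
from typing import Any, Iterable, Mapping


class FakeObjectId:
    """HYPOTHETICAL stand-in for bson.ObjectId so the sketch runs without pymongo."""

    def __init__(self, hex_value: str) -> None:
        self._hex = hex_value

    def __str__(self) -> str:
        return self._hex


def stringify_ids(
    docs: Iterable[Mapping[str, Any]], key: str = "_id"
) -> list[dict[str, Any]]:
    """Return a list of plain dicts with the given key converted to str."""
    rows = []
    for doc in docs:
        row = dict(doc)
        if key in row:
            row[key] = str(row[key])
        rows.append(row)
    return rows


# An iterator stands in for the pymongo cursor.
cursor_like = iter([
    {"_id": FakeObjectId("677fe3e18f80eb81115eb375"), "checked": True},
    {"_id": FakeObjectId("677fe3e18f80eb81115eb376"), "checked": False},
])
rows = stringify_ids(cursor_like)
print(rows[0]["_id"])
```

The resulting rows list could then be passed as data=rows to pl.DataFrame with '_id': pl.String in the schema. Whether that resolves the error has not been tested here.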

Source: mongodb - TypeError: argument 'schema': 'Object' is not a Polars data type - Stack Overflow (http://www.betaflare.com/web/1736704381a1948594.html)
