What is the most efficient way to find the latest updated row on a table with a lot of records (hundreds of thousands)?
Is it select * from a_table order by updated_at desc limit 1?
Right now there are 2,000 rows in my SurrealDB and the query takes 17 ms, but what about 2,000,000 rows?
Comment from LegacyDev: Why not make it a test case? Create a new dummy table, create millions of records, and then run the ORDER BY / LIMIT 1 query with and without an index to see for yourself.
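If you want to follow that suggestion, a rough SurrealQL sketch could look like this (the table, field, and index names are illustrative, and the range-based FOR loop assumes a reasonably recent SurrealDB version):

-- Populate a throwaway table with a large number of rows
FOR $i IN 0..1000000 {
    CREATE dummy SET n = $i, updated_at = time::now();
};
-- Time the query without an index, then define one and compare
SELECT * FROM dummy ORDER BY updated_at DESC LIMIT 1;
DEFINE INDEX idx_dummy_updated_at ON TABLE dummy COLUMNS updated_at;
SELECT * FROM dummy ORDER BY updated_at DESC LIMIT 1;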
3 Answers
If you're optimizing lookups based on updated_at, adding an index on that column is the easiest solution. This allows queries like ORDER BY updated_at DESC LIMIT 1 to run much faster by avoiding full table scans.
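In SurrealQL that could look like the following minimal sketch (the index name is illustrative, and whether the planner actually uses the index for this query is worth checking on your version):

DEFINE INDEX idx_updated_at ON TABLE a_table COLUMNS updated_at;
-- The query from the question; append EXPLAIN to inspect whether
-- the planner uses the new index on your SurrealDB version
SELECT * FROM a_table ORDER BY updated_at DESC LIMIT 1;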
Note: Keep in mind that while indexes boost read performance, they come with a trade-off. Every time a row is inserted or updated_at changes, the index must be updated as well, which can slow down write operations. The impact depends on your workload—if updates are frequent, this overhead can add up.
The most efficient way is to not search at all!
For 2 million rows, an index should generally be fine, but an index on a last-modified audit field is problematic if your data has high write rates. Indexes must be updated whenever records are updated, which means that as the table grows, the overhead of updating records and maintaining the index will increase.
Firstly, wait until you actually get to 2 million rows and then ask the question again, or better yet, wait until the query speed is no longer acceptable. The speed of servicing your queries, with or without an index, will still be affected by other factors; try not to prematurely optimize for the sake of it.
If you do need to reduce latency even further, you should look into the Index Table pattern. This is a data-warehousing concept, but it is especially useful in databases where you are tracking transactional data with high write rates or high record volumes and need to query the latest or most recent values.
A simple example of this is when you frequently need to query a point-in-time balance or quantity without reloading the entire record set.
A simple index table pattern can be implemented in SurrealDB using an event on the target table:
-- Create a new event whenever a record is changed in a_table.
-- It updates a row in the a_table_index table with the id 'global',
-- which tracks the last change across all the records in the table.
-- Note: use different IDs to store records for different filtered sets of data.
-- Note: use time::now() if you want to store the time of the change instead of
-- using the value on the record.
DEFINE EVENT record_updated ON TABLE a_table THEN (
    UPSERT a_table_index:global MERGE {
        a_table: $value.id,
        updated_at: $after.updated_at
    }
);
Now you can query the id from the a_table_index:global record, and you can access the modified record via the record link as well:
SELECT * FROM ONLY a_table_index:global
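If you also want the linked a_table row resolved in the same query, a FETCH clause on the record link should do it (a small sketch building on the event above):

SELECT * FROM ONLY a_table_index:global FETCH a_table;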
In IoT scenarios we are often concerned with reading the most recent telemetry value while still storing the historic values. The index table concept works well for this, with one record per sensor holding the most recently reported value; we might not even store the link to the historic record and instead just capture the values that we need for the low-latency reads.
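As a hedged sketch of that per-sensor variant (the telemetry table, field names, and sensor id are illustrative, not taken from the question):

-- Keep one 'sensor_latest' record per sensor, overwritten on every reading
DEFINE EVENT telemetry_latest ON TABLE telemetry WHEN $event != "DELETE" THEN (
    UPSERT type::thing('sensor_latest', $after.sensor) CONTENT {
        value: $after.value,
        updated_at: time::now()
    }
);
-- Low-latency read of the most recent value for a single sensor
SELECT * FROM ONLY sensor_latest:some_sensor;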
What's the precision of the updated_at field? If you have batch imports you might get records with the same updated_at timestamp; if the precision is too low, LIMIT 1 still only gives one row, but not necessarily the latest.
Even if you have no batch imports, the best data type for determining the last inserted or updated row is a rowversion field: a sequential number, unique for the whole table, that is not only generated at insert (as with auto-incrementing primary keys) but also incremented automatically on updates. SurrealDB does not offer this feature, though, and an auto-incrementing ID would only do half the job, unless you only care about the insert order and not the last updated record.
In https://surrealdb.com/docs/surrealql/datamodel/ids#auto-incrementing-ids you can find the optimal way to handle determining the last ID, though:
When dealing with a large number of records, a more performant option is to use a separate record that holds a single value representing the latest ID.
That is followed by a code example to implement such a table. Having the last generated ID then lets you simply query that one record without an ORDER BY and LIMIT clause, since you already know which ID you want. It will still require an index on the auto-incremented ID to get the fastest fetch of that one record.
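A hedged sketch of that pattern (the seq:a_table record and field names are illustrative, and the exact snippet in the docs may differ):

-- Bump the per-table counter and create the new row with that numeric ID
BEGIN TRANSACTION;
-- (num ?? 0) handles the very first write, when the counter record is empty
UPSERT seq:a_table SET num = (num ?? 0) + 1;
LET $id = (SELECT VALUE num FROM ONLY seq:a_table);
CREATE type::thing('a_table', $id) SET updated_at = time::now();
COMMIT TRANSACTION;

-- Later: fetch the latest row directly by its ID, no ORDER BY / LIMIT needed
LET $latest = (SELECT VALUE num FROM ONLY seq:a_table);
SELECT * FROM ONLY type::thing('a_table', $latest);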