I have a dictionary with many entries (65k), of the form key => [value, value2, ...].

The problem with JSON is that it takes time to deserialize it and find a key every time some input arrives.

The dictionary is static and won't change in the future.

Is there a way to embed the already-deserialized dictionary in the executable, as a kind of memory map?

I tried embedding it directly in a module as a HashMap, but with so many keys it takes forever to compile, which is not acceptable for a library crate.

Could a static variable solve the issue? If so, how would I check whether the static variable has already been initialized?

Asked by vlad on Feb 24 at 9:33, edited Feb 24 at 9:46. 5 comments:
  • If you're providing a library you shouldn't be concerned with embedding data in an executable. What functionality does your library provide? If it's functionality on this dictionary, I'd just expose an appropriate struct with the respective methods and let the library user load the data from an external source on demand as they please, e.g. by embedding it inside a LazyStatic or similar. – Richard Neumann Commented Feb 24 at 9:44
  • @RichardNeumann Hello, the library itself is pretty basic. It produces an output based on a hexadecimal hash. To find this hash we have to check the predefined dictionary. It won't be possible to provide a custom one. I need a way to represent the deserialized version. On my machine it takes a third of a second to deserialize (the bottleneck) and find a key among 65k entries, which is far too slow. – vlad Commented Feb 24 at 9:52
  • (+3) Maybe you could use the phf crate. It allows creation of static hash maps at compile time, with an optimised hash function. – Aleksander Krauze Commented Feb 24 at 9:55
  • (+2) Seconding the use of phf, 64k entries should be well within its capabilities. Alternatively you could generate a static sorted slice and perform binary searches on that. Alternatively to the alternative, if you can map your "hexadecimal hash" to a table index you can use direct indexing into a LUT (lookup table). – Masklinn Commented Feb 24 at 10:06
  • You might want to consider a file based key-value-store, or maybe a sqlite database. Whether you embed it or not shouldn't actually make much of a difference, both have to be loaded to memory. – cafce25 Commented Feb 24 at 10:23
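
The sorted-slice idea from the comments could look roughly like the sketch below. The `TABLE` contents and the `lookup` function are made up for illustration. A real table would hold all the entries and would be generated by a script, already sorted by key.

```rust
/// Hypothetical table: (key, values) pairs, sorted by key in ascending byte order.
/// A real table would be generated by a script rather than written by hand.
static TABLE: &[(&str, &[&str])] = &[
    ("0a1f", &["alpha", "beta"]),
    ("3c9e", &["gamma"]),
    ("b7d2", &["delta", "epsilon", "zeta"]),
];

/// Binary search over the sorted keys: O(log n) comparisons and no
/// deserialization or hashing at startup.
pub fn lookup(key: &str) -> Option<&'static [&'static str]> {
    TABLE
        .binary_search_by_key(&key, |&(k, _)| k)
        .ok()
        .map(|i| TABLE[i].1)
}
```

Calling `lookup("3c9e")` on this sample table returns the slice holding `"gamma"`, and an unknown key returns `None`. The search is only correct if the table really is sorted, so the generating script should sort the keys the same way `str` comparison does.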

1 Answer

One option would be using LazyLock:

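use std::collections::HashMap;
use std::sync::LazyLock;

// Parsed once, on first access; later accesses reuse the finished map.
// Values are Vec<String> to match the key => [value, value2, ...] shape.
static DICT: LazyLock<HashMap<String, Vec<String>>> = LazyLock::new(|| {
    // Embed the JSON text in the binary at compile time.
    let data = include_str!("filename.json");
    // Assumption: the `serde_json` crate is available as a dependency.
    serde_json::from_str(data).expect("embedded dictionary should be valid JSON")
});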

This will run the code inside the initialiser on the first access, meaning only the first access will be bottlenecked.
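
To make the "only the first access" behaviour concrete, here is a small sketch that uses nothing beyond the standard library. The `RAW` string stands in for the embedded file, the `key=value1,value2` line format is invented to avoid a JSON dependency, and `INIT_RUNS` exists only to count how often the initialiser runs.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::LazyLock;

// Stand-in for the embedded data file; the line format is made up
// so that the sketch needs no external crates.
const RAW: &str = "0a1f=alpha,beta\n3c9e=gamma\n";

// Counts how often the initialiser runs (demonstration only).
static INIT_RUNS: AtomicUsize = AtomicUsize::new(0);

static DICT: LazyLock<HashMap<&'static str, Vec<&'static str>>> = LazyLock::new(|| {
    INIT_RUNS.fetch_add(1, Ordering::SeqCst);
    RAW.lines()
        .filter_map(|line| line.split_once('='))
        .map(|(key, values)| (key, values.split(',').collect::<Vec<_>>()))
        .collect()
});

/// The first call pays for parsing; every later call is a plain hash lookup.
pub fn lookup(key: &str) -> Option<&'static [&'static str]> {
    DICT.get(key).map(Vec::as_slice)
}
```

Before any call to `lookup`, `INIT_RUNS` is still 0. After any number of calls it is 1, because `LazyLock` guarantees the closure runs at most once, even when several threads race on the first access. There is no need to check by hand whether the static is ready.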

You could embed the data in the binary at compile time using the include_str! or include_bytes! macros.

Another option is to have a build.rs file or some other script that processes the data and outputs a Rust file containing the processed HashMap.
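
A rough sketch of the code-generation step such a build script might contain is shown below. It emits a sorted static slice rather than a HashMap, on the assumption that a plain slice literal is cheaper to compile than the large HashMap module the question describes. The `render_table` function and the `TABLE` name are hypothetical.

```rust
// Part of a hypothetical build.rs. All names here are illustrative.
use std::fmt::Write;

/// Renders `entries` as Rust source for a static slice sorted by key,
/// suitable for binary search at run time.
pub fn render_table(mut entries: Vec<(String, Vec<String>)>) -> String {
    // Sort with the same ordering that `str` comparison uses at lookup time.
    entries.sort_by(|a, b| a.0.cmp(&b.0));
    let mut src = String::from("pub static TABLE: &[(&str, &[&str])] = &[\n");
    for (key, values) in &entries {
        // `{:?}` emits a quoted, escaped string literal.
        let vals: Vec<String> = values.iter().map(|v| format!("{v:?}")).collect();
        // Writing into a String cannot fail, so unwrap is safe here.
        writeln!(src, "    ({key:?}, &[{}]),", vals.join(", ")).unwrap();
    }
    src.push_str("];\n");
    src
}
```

In a real build script, `main` would read and parse the data file, pass the entries to `render_table`, and write the result to a file under the directory named by the `OUT_DIR` environment variable. The library would then pull it in with `include!(concat!(env!("OUT_DIR"), "/table.rs"));`, where `table.rs` is whatever file name the script chose.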

Tags: rust. Title: Best way to embed a big static dictionary in memory map (Stack Overflow)