The Rust book says that string literals (e.g. "hello") are string slices (specifically the borrowed form &str).

Now I notice that "hello"[..] and "hello"[1..3] have type equal to the "non-borrowed" form str, and I'm wondering why that could be.

I have read the documentation on str but it mostly only mentions &str. In particular it only gives the structure of &str ("A &str is made up of two components: a pointer to some bytes, and a length.") so I don't know if that holds of str too. I can't imagine why the documentation would go out of its way to specify &str here if this structure actually held for both forms.

Furthermore, it would seem to me that if a is a pointer p to some bytes, and a length l then a[s..e] should just be the pointer to p+s paired with length e-s. Am I wrong here?

asked Jan 29 at 0:11 by tarski, edited Jan 30 at 6:55

2 Answers

Many people struggle to understand the distinction between &str and str.

str is the type that represents the actual bytes that make up the string.

&str is a reference (pointer) to such a string.

The hard part is that str is a DST (dynamically sized type), which means that its length is not known from the type alone. This makes str mysterious, because you can only ever use DSTs behind some kind of indirection (pointer). Pointers to DSTs become fat pointers, that is, they have additional data stored next to the pointer (not in the memory the pointer points to) [1]. str is a "slice-like" DST (as opposed to a trait object DST), so this additional data is the length of the slice.
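A quick way to see the fat pointer is to compare pointer sizes. This is only a sketch: the helper name pointer_sizes is made up for illustration, and the "twice as large" result is what current compilers produce rather than something the language spells out as a layout guarantee.

```rust
use std::mem::size_of;

/// Returns (size of a thin pointer, size of a `&str`) in bytes.
fn pointer_sizes() -> (usize, usize) {
    (size_of::<&u8>(), size_of::<&str>())
}

fn main() {
    let (thin, fat) = pointer_sizes();
    // `&u8` carries only an address; `&str` carries an address plus a length.
    println!("&u8: {thin} bytes, &str: {fat} bytes");
}
```

The extra word in &str is the length that sits next to the address.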

For example, the string slice a = "Hello" would be represented like this [2] (assuming 32-bit addresses and little-endian numbers):

                length of str
                vvvvvvvvvvv
 a: 12 34 56 78 05 00 00 00  < &str
    ^^^^^^^^^^^
    address of str
 address 12 34 56 78: 48 65 6C 6C 6F  < str
                      ^^^^^^^^^^^^^^
                      bytes of "Hello" encoded as UTF-8
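
The two halves of the diagram can be inspected from safe code. The sketch below uses a hypothetical helper, parts, which just pairs up the standard str::as_ptr and str::len methods; the actual address will differ from the one in the diagram.

```rust
/// Splits a string slice into the two components of its fat pointer.
fn parts(s: &str) -> (*const u8, usize) {
    (s.as_ptr(), s.len())
}

fn main() {
    let a = "Hello";
    let (addr, len) = parts(a);
    println!("address = {addr:p}, length = {len}");
    // The pointed-to str is nothing more than the UTF-8 bytes.
    println!("{:02X?}", a.as_bytes());
}
```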

Quoting the question: "Furthermore, it would seem to me that if a is a pointer p to some bytes, and a length l then a[s..e] should just be the pointer to p+s paired with length e-s. Am I wrong here?"

a[s..e] is the indexing operator applied to a string slice &str and a range Range<usize>.

As the docs for the Index trait explain, a[s..e] (in immutable contexts) is desugared as *a.index(s..e) [3].

Note the dereference operator! The indexing operator yields a place expression, not a pointer. That is why you see str as the type of the expression.
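
To make the desugaring concrete, here is a small sketch comparing the operator with an explicit call to Index::index. The two function names are made up for illustration; both should hand back the same &str.

```rust
use std::ops::{Index, Range};

/// Slices with the `[]` operator and re-borrows the resulting `str` place.
fn slice_with_operator(a: &str, r: Range<usize>) -> &str {
    &a[r]
}

/// Slices by calling the trait method the operator is defined in terms of.
fn slice_with_index_call(a: &str, r: Range<usize>) -> &str {
    <str as Index<Range<usize>>>::index(a, r)
}

fn main() {
    let a = "hello";
    println!("{:?}", slice_with_operator(a, 1..3));
    println!("{:?}", slice_with_index_call(a, 1..3));
}
```

The explicit call already returns a reference, so no extra & is needed there; the operator form dereferences that reference, which is why it has to be borrowed again.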

But as I mentioned above, you can't store DSTs directly in variables, so the following will not work:

let x: str = a[1..3];

Instead you'll have to re-reference it:

let y: &str = &a[1..3];
//            ^

This y will be the pointer you described.
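
You can check that y really is "p+s with length e-s". A minimal sketch, assuming the sub-slice was produced from the same string; offset_and_len is a hypothetical helper, not a std function.

```rust
/// Returns the byte offset of `sub` from the start of `whole`, plus the length of `sub`.
/// Assumes `sub` was produced by slicing `whole`.
fn offset_and_len(whole: &str, sub: &str) -> (usize, usize) {
    let offset = sub.as_ptr() as usize - whole.as_ptr() as usize;
    (offset, sub.len())
}

fn main() {
    let a = "Hello";
    let y: &str = &a[1..3];
    let (offset, len) = offset_and_len(a, y);
    println!("y = {y:?}, offset = {offset}, len = {len}");
}
```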

As for why the indexing operator is defined to perform a dereference, I can't give a definitive reason, but that's the way it works in C too, where a[i] means *(a + i).

[1] This is what enables slicing: if the length were stored in the pointed-to memory, a pointer to the middle of the string could not store the length.
[2] The layout of &str (the order of the pointer and the length) is not guaranteed; do not write unsafe code that relies on it.
[3] More precisely, a[s..e] is desugared as *<str as Index<Range<usize>>>::index(a, s..e). Index<I> is implemented for str where I: SliceIndex<str>; SliceIndex is a sealed std trait that allows generic indexing of slices. For str, SliceIndex is implemented for the range types such as Range<usize> (a bare usize only indexes [T] slices, not str). <str as Index<Range<usize>>>::Output is <Range<usize> as SliceIndex<str>>::Output, which is str. The index method returns &Self::Output, that is, &str, and the desugaring then dereferences it to str again.

str doesn't really exist as a concrete thing. It's a virtual, conceptual value. You can't actually have a concrete representation of a value of type str on its own. So why have str? Because it fits into the rest of the type system. You can pass it to generic type parameters that expect something they can build a reference to. All of the rest of the type system that lets you build generic functions for working with references to things also works with slices of strings.

If you want to think of str as representing something, it's part of the family of dynamically sized types, representing the actual borrowed section of the buffer. In the example "hello"[1..3] the str value would be the bytes e, l within the greater buffer (the end of the range is exclusive). This is something that doesn't have enough static information on its own to know what the value is. You need some sort of dynamic tracking, usually in the form of a fat pointer, to carry that information. However, as a concept it is useful for maintaining a uniform type system and for further developments in how allocations are managed in advanced scenarios.
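
As a rough illustration of the "fits into the rest of the type system" point, a generic function with a ?Sized bound accepts str as its type parameter just like any sized type. The name byte_size is made up for this sketch; std::mem::size_of_val is the real std function it wraps.

```rust
use std::mem::size_of_val;

/// Works for sized and unsized `T` alike because of the `?Sized` bound.
fn byte_size<T: ?Sized>(value: &T) -> usize {
    size_of_val(value)
}

fn main() {
    let part: &str = &"hello"[1..3];
    println!("{}", byte_size(part)); // here T = str
    println!("{}", byte_size(&7u32)); // here T = u32
}
```

For T = str the size is only known at run time, and it is read from the length stored in the fat pointer.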

Tags: rust, Type of slicing a slice, Stack Overflow