The Rust book says that string literals (e.g. "hello") are string slices (specifically the borrowed form &str).
Now I notice that "hello"[..] and "hello"[1..3] have type equal to the "non-borrowed" form str, and I'm wondering why that could be.
I have read the documentation on str but it mostly only mentions &str. In particular it only gives the structure of &str ("A &str is made up of two components: a pointer to some bytes, and a length.") so I don't know if that holds for str too. I can't imagine why the documentation would go out of its way to specify &str here if this structure actually held for both forms.
Furthermore, it would seem to me that if a is a pointer p to some bytes, and a length l, then a[s..e] should just be the pointer to p+s paired with length e-s. Am I wrong here?
2 Answers
Many people struggle to understand the distinction between &str and str.
str is the type that represents the actual bytes that make up the string.
&str is a reference (pointer) to such a string.
The hard part is that str is a DST (dynamically sized type), which means that the length is not known from the type alone. This makes str mysterious, because you can only ever use DSTs behind some kind of indirection (pointer). Pointers to DSTs become fat pointers, that is, they have additional data stored next to the pointer (not in the memory the pointer points to) [1]. str is a "slice-like" DST (as opposed to a trait object DST), so this additional data is the length of the slice.
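The thin versus fat pointer distinction can be observed through reference sizes. The following is a minimal sketch; the helper name words_in_ref is made up for illustration, and the numbers reflect how current compilers behave rather than a documented layout guarantee.

```rust
use std::mem::size_of;

/// Hypothetical helper: how many machine words a reference to `T` occupies.
fn words_in_ref<T: ?Sized>() -> usize {
    size_of::<&T>() / size_of::<usize>()
}

fn main() {
    // A reference to a sized type is a thin pointer: just an address.
    println!("&u8   : {} word(s)", words_in_ref::<u8>());
    // References to slice-like DSTs carry a length next to the address.
    println!("&str  : {} word(s)", words_in_ref::<str>());
    println!("&[u8] : {} word(s)", words_in_ref::<[u8]>());
}
```

Note that size_of::<str>() itself does not compile, which is the "length is not known from the type alone" property in practice.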
For example, the string slice a = "Hello" would be represented like this [2] (assuming 32-bit addresses and little-endian numbers):

                     length of str
                     vvvvvvvvvvv
    a:   12 34 56 78 05 00 00 00    < &str
         ^^^^^^^^^^^
         address of str

    address 12 34 56 78:   48 65 6C 6C 6F    < str
                           ^^^^^^^^^^^^^^
                           bytes of "Hello" encoded as UTF-8
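The two halves of such a fat pointer can be read back through safe methods. This is only a sketch: the helper name parts is an assumption for illustration, and the printed address will differ from the made-up one in the diagram.

```rust
/// Hypothetical helper: splits a string slice into the two parts of its fat pointer.
fn parts(s: &str) -> (*const u8, usize) {
    (s.as_ptr(), s.len())
}

fn main() {
    let a: &str = "Hello";
    let (ptr, len) = parts(a);
    // The address depends on where the literal ends up in the binary.
    println!("address = {:p}, length = {}", ptr, len);
    // The pointed-to bytes are the UTF-8 encoding of the text.
    println!("bytes   = {:02X?}", a.as_bytes());
}
```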
> Furthermore, it would seem to me that if a is a pointer p to some bytes, and a length l, then a[s..e] should just be the pointer p+s with length e-s. Am I wrong here?
a[s..e] is the indexing operator applied to a string slice &str and a range Range<usize>.
As the docs for the Index trait explain, a[s..e] (in immutable contexts) is desugared as *a.index(s..e) [3].
Note the dereference operator! The indexing operator yields a place expression, not a pointer. That is why you see str as the resulting type.
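A small sketch of that desugaring: the operator form needs an explicit & to get back to a reference, while calling the Index method directly already returns one. The function names with_operator and with_method are illustrative only.

```rust
use std::ops::Index;

/// Slices with the indexing operator, then re-borrows the resulting place.
fn with_operator(a: &str, s: usize, e: usize) -> &str {
    &a[s..e]
}

/// Calls the `Index` method directly; this already yields a `&str`.
fn with_method(a: &str, s: usize, e: usize) -> &str {
    a.index(s..e)
}

fn main() {
    let a = "hello";
    // Both forms refer to the same sub-slice of `a`.
    println!("{} {}", with_operator(a, 1, 3), with_method(a, 1, 3));
}
```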
But as I mentioned above, you can't store DSTs directly in variables, so the following will not work:

    let x: str = a[1..3];

Instead you'll have to re-reference it:

    let y: &str = &a[1..3];
    //            ^

This y will be the pointer you described.
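The "pointer p+s with length e-s" intuition can be checked directly. The sketch below uses a hypothetical helper, is_offset_view, and assumes the range is in bounds and falls on UTF-8 character boundaries (otherwise the slicing panics).

```rust
/// Hypothetical check: is `&a[s..e]` the pointer `p + s` paired with the length `e - s`?
fn is_offset_view(a: &str, s: usize, e: usize) -> bool {
    let y: &str = &a[s..e];
    // Compute the expected address without dereferencing anything.
    let expected_ptr = a.as_ptr().wrapping_add(s);
    y.as_ptr() == expected_ptr && y.len() == e - s
}

fn main() {
    println!("{}", is_offset_view("hello", 1, 3));
}
```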
As for why the indexing operator is defined to perform a dereference, I can't give a definitive reason, but that's the way it works in C too, where a[i] is defined as *(a + i).
[1] This is what enables slicing: if the length were stored in the pointed-to memory, a pointer to the middle of the string could not carry its own length.
[2] The layout of &str (the order of the pointer and the length) is not guaranteed; do not write unsafe code that relies on it.
[3] a[s..e] will be desugared as *<str as Index<Range<usize>>>::index(a, s..e). Index<I> is implemented for str where I: SliceIndex<str>; SliceIndex is a sealed trait in std that allows generic indexing of slices. SliceIndex<str> is implemented for Range<usize> and the other range types (for ordinary slices [T] it is also implemented for usize). <str as Index<Range<usize>>>::Output is <Range<usize> as SliceIndex<str>>::Output, which is str. The index method returns &Self::Output, which here is &str. That reference is then dereferenced to str again.
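The fully qualified call from footnote 3 can be written out as ordinary code. This is a sketch with a made-up function name, explicit_index; the type annotation on out only compiles because the associated Output type resolves to str.

```rust
use std::ops::{Index, Range};

/// Spells out the call that `a[s..e]` stands for, without the final dereference.
fn explicit_index(a: &str, s: usize, e: usize) -> &str {
    // `Output` is `str` here, so `out` is a plain `&str`.
    let out: &<str as Index<Range<usize>>>::Output =
        <str as Index<Range<usize>>>::index(a, s..e);
    out
}

fn main() {
    println!("{}", explicit_index("hello", 1, 3));
}
```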
str doesn't really exist as a concrete thing. It's a virtual, conceptual value. You can't actually have a concrete representation of a value of type str. So why have str? Because it fits into the rest of the type system. You can pass it to generic type parameters that expect something they can build a reference to. All of the rest of the type system that allows you to build generic functions for working with references to things also works with slices of strings.
If you want to think of str as representing something, it's part of the family of dynamically sized types, representing the actual borrowed section of the buffer. In the example "hello"[1..3] the str value would be the bytes e, l within the greater buffer (the end index 3 is exclusive). This is something that doesn't have enough static information on its own to know what the value is. You need some sort of dynamic tracking, usually in the form of a fat pointer, to carry that information. However, as a concept it is useful for maintaining a uniform type system and for further developments in how allocations are managed in advanced scenarios.
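As a rough illustration of str fitting into generic code, here is a function that accepts a reference to any type, sized or not, by opting out of the implicit Sized bound. The name byte_size is an assumption made for this sketch.

```rust
use std::mem::size_of_val;

/// Hypothetical helper: accepts any referent, sized or not, thanks to `?Sized`.
fn byte_size<T: ?Sized>(value: &T) -> usize {
    size_of_val(value)
}

fn main() {
    let number: u32 = 7;
    let text: &str = &"hello"[1..3];
    let boxed: Box<str> = Box::from("hello");

    // `T` is inferred as `u32`, `str`, and `str` respectively.
    println!("{}", byte_size(&number));
    println!("{}", byte_size(text));
    println!("{}", byte_size(&*boxed));
}
```

Box<str> in the example shows that the indirection does not have to be a borrowed reference; any pointer type that can carry the length works.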