String manipulation
Rust's string type is one of the core primitive types of the language. Although the type is named str, a bare str is not actually a valid type in Rust: each string must also be decorated with its ownership. This means that there are three common kinds of strings in Rust:
~str
- This is an owned string. This type obeys all of the normal semantics of the ~T types, meaning that it has one, and only one, owner. This type cannot be implicitly copied, and is moved out of when passed to other functions.
@str
- This is a managed string. Like @T, this type can be implicitly copied, and each implicit copy increments the string's reference count. This means that there is no "true owner" of the string, and the string is deallocated when the reference count reaches 0.
&str
- Finally, this is the borrowed string type. A borrowed string is either a static string literal or a borrow of one of the other two kinds of strings. As the name "borrowed" implies, the string is owned elsewhere, and it cannot be moved out of.
As an example, here are a few different kinds of strings.
let owned_string = ~"I am an owned string";
let managed_string = @"This string is garbage-collected";
let borrowed_string1 = "This string is borrowed with the 'static lifetime";
let borrowed_string2: &str = owned_string; // owned strings can be borrowed
let borrowed_string3: &str = managed_string; // managed strings can also be borrowed
From the example above, you can see that Rust has three different kinds of string literals. The owned and managed literals correspond to the owned and managed string types, but the "borrowed literal" is actually more akin to C's concept of a static string.
When a string is declared without a ~ or @ sigil, the string is allocated statically in the rodata section of the executable or library. The string then has the type &'static str, meaning that it is valid for the 'static lifetime, otherwise known as the lifetime of the entire program. As the type implies, these static strings are not mutable.
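The ~ and @ sigils no longer exist in present-day Rust. The sketch below therefore assumes a current toolchain and maps the three kinds onto today's types:

- String plays the role of ~str.
- &str is still the borrowed type, and literals are &'static str.
- Rc<str> is used as the nearest reference-counted stand-in for @str. This is only an approximation: Rc is not the garbage-collected box described above.

The helper names greeting and byte_len are hypothetical and exist only for illustration.

```rust
use std::rc::Rc;

// Hypothetical helper: returns a string literal. Literals are stored in the
// binary and live for the whole program, so the return type is &'static str.
fn greeting(formal: bool) -> &'static str {
    if formal { "Good day" } else { "Hi" }
}

// Hypothetical helper: accepts a borrowed slice no matter who owns the text.
fn byte_len(s: &str) -> usize {
    s.len()
}

fn main() {
    // Owned, single owner, moved on assignment: the role `~str` played.
    let owned: String = String::from("I am an owned string");
    // Reference-counted; cloning bumps the count (approximates `@str`).
    let managed: Rc<str> = Rc::from("This string is reference-counted");
    let managed_copy = Rc::clone(&managed);
    // A literal, borrowed with the 'static lifetime.
    let literal: &'static str = greeting(true);

    // Owned and reference-counted strings can both be borrowed as `&str`.
    let b1: &str = &owned;
    let b2: &str = &managed_copy;
    println!("{} {} {}", byte_len(b1), byte_len(b2), byte_len(literal));
    println!("{}", Rc::strong_count(&managed)); // prints "2"

    // A literal cannot be modified in place; copy it into a `String` first.
    let mut copy = literal.to_string();
    copy.push('!');
    println!("{}", copy); // prints "Good day!"
}
```

Because byte_len takes &str, one function serves all three kinds of string without caring which one owns the data.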
Many languages have immutable strings by default, and Rust has its own take on this idea. As with other Rust types, strings are immutable by default. If a string is declared as mut, however, it may be mutated. This works the same way as the rest of Rust's type system: there may be only one mutable reference to a given string at a time. With these guarantees, strings can easily move between being mutable and immutable while keeping the benefits that mutable strings offer in other languages.
let mut buf = ~"testing";
buf.push_char(' ');
buf.push_str("123");
assert_eq!(buf, ~"testing 123");
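For comparison, here is a sketch of the same mutation in present-day Rust, which is an assumption about the reader's toolchain. There, the owned String replaces ~"...", and push (taking a char) replaces push_char. The function name build is hypothetical.

```rust
// Hypothetical helper: builds "testing 123" with today's `String` API.
fn build() -> String {
    let mut buf = String::from("testing");
    buf.push(' ');       // appends a single char (successor of `push_char`)
    buf.push_str("123"); // appends a string slice
    buf
}

fn main() {
    // Bound without `mut`, this String can no longer be modified.
    let buf = build();
    assert_eq!(buf, "testing 123");

    // Only one mutable borrow may be live at a time; it ends with the block.
    let mut buf2 = String::from("abc");
    {
        let r = &mut buf2;
        r.push_str("def");
    }
    println!("{}", buf2); // prints "abcdef"
}
```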
Rust's string type, str, is a sequence of Unicode codepoints encoded as a stream of UTF-8 bytes. All safely-created strings are guaranteed to be validly encoded UTF-8 sequences. Additionally, strings are not null-terminated and can contain null codepoints.
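The three properties above can be observed directly. This is a small sketch in present-day Rust, assumed rather than the ~str-era API:

- The byte length differs from the codepoint count for non-ASCII text.
- A NUL character does not end the string.
- Safe construction rejects invalid UTF-8.

The helper name sizes is hypothetical.

```rust
// Hypothetical helper: (number of UTF-8 bytes, number of Unicode scalar values).
fn sizes(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    // U+00E9 ("é") takes two bytes in UTF-8.
    println!("{:?}", sizes("caf\u{e9}")); // prints "(5, 4)"

    // A NUL codepoint is an ordinary character, not a terminator.
    println!("{:?}", sizes("a\0b")); // prints "(3, 3)"

    // 0xFF never appears in valid UTF-8, so validation fails.
    let bad = [0x66u8, 0xff, 0x6f];
    println!("{}", std::str::from_utf8(&bad).is_err()); // prints "true"
}
```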
The actual representation of strings maps directly onto vectors:
~str is the same as ~[u8]
&str is the same as &[u8]
@str is the same as @[u8]
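Present-day Rust keeps this correspondence, though the ~ and @ forms are gone, so this sketch assumes a current toolchain. A String is a wrapper around a Vec<u8>, and a &str can be viewed as a &[u8] without copying. The helper name bytes_of is hypothetical.

```rust
// Hypothetical helper: borrows the UTF-8 bytes behind a string slice.
fn bytes_of(s: &str) -> &[u8] {
    s.as_bytes()
}

fn main() {
    let owned = String::from("hi");
    // Borrowed view: &str -> &[u8], no copy.
    assert_eq!(bytes_of(&owned), &[b'h', b'i'][..]);

    // Owned conversion: String -> Vec<u8>, handing over the same buffer.
    let vec: Vec<u8> = owned.into_bytes();
    assert_eq!(vec, vec![104, 105]);

    // Going back re-checks that the bytes are valid UTF-8.
    let again = String::from_utf8(vec).expect("valid UTF-8");
    println!("{}", again); // prints "hi"
}
```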
Modules
not_utf8
raw | Unsafe operations
traits
Structs
CharIterator | External iterator for a string's characters. Use with the …
CharOffsetIterator | External iterator for a string's characters and their byte offsets. Use with the …
CharRange
CharSplitIterator | An iterator over the substrings of a string, separated by …
CharSplitNIterator | An iterator over the substrings of a string, separated by …
MatchesIndexIterator | An iterator over the start and end indices of the matches of a substring within a larger string
StrSplitIterator | An iterator over the substrings of a string separated by a given search string
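In present-day Rust these iterator structs are produced by methods on str. The pairing in this sketch is an approximate correspondence of our own, not something this page states:

- chars for CharIterator
- char_indices for CharOffsetIterator
- split with a char for CharSplitIterator
- splitn for CharSplitNIterator
- split with a &str for StrSplitIterator
- match_indices for MatchesIndexIterator

One difference: match_indices yields (start, matched text) pairs, so the end index is computed here. The helper names are hypothetical.

```rust
// Hypothetical helper: each character with its byte offset.
fn offsets(s: &str) -> Vec<(usize, char)> {
    s.char_indices().collect()
}

// Hypothetical helper: split on `sep`, producing at most `n` pieces.
fn split_n(s: &str, n: usize, sep: char) -> Vec<&str> {
    s.splitn(n, sep).collect()
}

// Hypothetical helper: (start, end) byte indices of every match of `needle`,
// derived from the (start, matched_text) pairs that `match_indices` yields.
fn match_spans(haystack: &str, needle: &str) -> Vec<(usize, usize)> {
    haystack
        .match_indices(needle)
        .map(|(start, m)| (start, start + m.len()))
        .collect()
}

fn main() {
    let chars: Vec<char> = "abc".chars().collect();         // cf. CharIterator
    let by_char: Vec<&str> = "a,b,c".split(',').collect();   // cf. CharSplitIterator
    let by_str: Vec<&str> = "x::y::z".split("::").collect(); // cf. StrSplitIterator
    println!("{:?} {:?} {:?}", chars, by_char, by_str);
    println!("{:?}", offsets("h\u{e9}y"));         // prints [(0, 'h'), (1, 'é'), (3, 'y')]
    println!("{:?}", split_n("a,b,c", 2, ','));    // prints ["a", "b,c"]
    println!("{:?}", match_spans("abcabc", "bc")); // prints [(1, 3), (4, 6)]
}
```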
Traits
CharEq | Something that can be used to compare against a character
OwnedStr
Str | Any string that can be represented as a slice
StrSlice
StrVector
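Present-day Rust expresses an idea similar to CharEq through the pattern argument accepted by methods such as split. A single char, a slice of chars, or a FnMut(char) -> bool closure can all serve as the pattern. This sketch assumes a current toolchain and does not reproduce the CharEq trait itself. The helper split_by is hypothetical.

```rust
// Hypothetical helper: split on every character the predicate accepts.
// Any `FnMut(char) -> bool` closure is accepted as a pattern by `split`.
fn split_by<F: FnMut(char) -> bool>(s: &str, pred: F) -> Vec<&str> {
    s.split(pred).collect()
}

fn main() {
    // A single char as the pattern.
    let a: Vec<&str> = "a,b,c".split(',').collect();
    // A slice of chars: matches any one of them.
    let b: Vec<&str> = "a,b;c".split(&[',', ';'][..]).collect();
    // A closure predicate.
    let c = split_by("a b\tc", |ch: char| ch.is_whitespace());
    println!("{:?} {:?} {:?}", a, b, c); // prints ["a", "b", "c"] ["a", "b", "c"] ["a", "b", "c"]
}
```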
Functions
eq | Bytewise string equality
eq_slice | Bytewise slice equality
from_byte | Convert a byte to a UTF-8 string
from_char | Convert a char to a string
from_chars | Convert a vector of chars to a string
from_utf16 | Allocates a new string from the UTF-16 slice provided
from_utf8 | Convert a vector of bytes to a new UTF-8 string
from_utf8_opt | Convert a vector of bytes to a new UTF-8 string, if possible. Returns None if the vector contains invalid UTF-8.
from_utf8_owned | Consumes a vector of bytes to create a new UTF-8 string
from_utf8_owned_opt | Consumes a vector of bytes to create a new UTF-8 string. Returns None if the vector contains invalid UTF-8.
from_utf8_slice | Converts a vector to a string slice without performing any allocations.
from_utf8_slice_opt | Converts a vector to a string slice without performing any allocations.
is_utf16 | Determines if a vector of …
is_utf8 | Determines if a vector of bytes contains valid UTF-8
replace | Replace all occurrences of one string with another
utf16_chars | Iterates over the UTF-16 characters in the specified slice, yielding each decoded Unicode character to the function provided.
utf8_char_width | Given a first byte, determine how many bytes are in this UTF-8 character
with_capacity | Allocates a new string with the specified capacity. The returned string is empty, but has room to grow to that capacity without reallocating.
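Several of these functions have close counterparts in present-day Rust. The pairing below is an approximate mapping of our own and assumes a current toolchain:

- std::str::from_utf8 for from_utf8_slice_opt (borrowing, no allocation)
- String::from_utf8 for from_utf8_owned_opt (consuming)
- String::from_utf16, which now returns a Result
- str::replace for replace
- String::with_capacity for with_capacity

The standard library has no stable equivalent of utf8_char_width, so the version below is a hypothetical helper built from the UTF-8 lead-byte ranges. It returns 0 for bytes that cannot start a sequence.

```rust
// Hypothetical helper mirroring `utf8_char_width`: the length of the UTF-8
// sequence that starts with `first`, or 0 if `first` cannot start one.
fn utf8_char_width(first: u8) -> usize {
    match first {
        0x00..=0x7F => 1,
        0xC2..=0xDF => 2,
        0xE0..=0xEF => 3,
        0xF0..=0xF4 => 4,
        _ => 0, // continuation bytes and never-valid lead bytes
    }
}

fn main() {
    let bytes = "h\u{e9}".as_bytes(); // [0x68, 0xC3, 0xA9]
    // Borrowing conversion, no allocation.
    let slice: &str = std::str::from_utf8(bytes).expect("valid UTF-8");
    // Consuming conversion from an owned byte vector.
    let owned: String = String::from_utf8(vec![0x6f, 0x6b]).expect("valid UTF-8");
    // Decoding UTF-16 code units.
    let from16 = String::from_utf16(&[0x0068, 0x0069]).expect("valid UTF-16");
    // Replacing every occurrence of a substring.
    let replaced = "a-b-c".replace("-", "+");
    // An empty string with pre-allocated room.
    let buf = String::with_capacity(64);
    assert!(buf.is_empty() && buf.capacity() >= 64);

    println!("{} {} {} {}", slice, owned, from16, replaced); // prints "hé ok hi a+b+c"
    println!("{}", utf8_char_width(bytes[1])); // lead byte 0xC3, prints "2"
}
```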
Type definitions
AnyLineIterator | An iterator over the lines of a string, separated by either …
ByteIterator | External iterator for a string's bytes. Use with the …
ByteRevIterator | External iterator for a string's bytes in reverse order. Use with the …
CharOffsetRevIterator | External iterator for a string's characters and their byte offsets in reverse order. Use with the …
CharRSplitIterator | An iterator over the substrings of a string, separated by …
CharRevIterator | External iterator for a string's characters in reverse order. Use with the …
WordIterator | An iterator over the words of a string, separated by a sequence of whitespace
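In present-day Rust, most of these reverse and convenience iterators come from calling rev() on a double-ended iterator or from dedicated str methods. The "cf." comments in this sketch give an approximate correspondence of our own, assuming a current toolchain. The helper words is hypothetical.

```rust
// Hypothetical helper: words separated by any run of whitespace (cf. WordIterator).
fn words(s: &str) -> Vec<&str> {
    s.split_whitespace().collect()
}

fn main() {
    let rev: String = "abc".chars().rev().collect();                        // cf. CharRevIterator
    let bytes_rev: Vec<u8> = "ab".bytes().rev().collect();                  // cf. ByteRevIterator
    let offs_rev: Vec<(usize, char)> = "ab".char_indices().rev().collect(); // cf. CharOffsetRevIterator
    let rsplit: Vec<&str> = "a,b,c".rsplit(',').collect();                  // cf. CharRSplitIterator
    // `lines` splits on "\n" and strips a trailing "\r" (cf. AnyLineIterator).
    let lines: Vec<&str> = "one\ntwo\r\nthree".lines().collect();
    println!("{} {:?} {:?} {:?} {:?}", rev, bytes_rev, offs_rev, rsplit, lines);
    println!("{:?}", words("  hello \t world\n")); // prints ["hello", "world"]
}
```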