String Basics
Every program that interacts with users handles text: file paths, error messages, names, commands. In most high-level languages, strings are their own built-in type with special syntax baked deep into the compiler. Zig takes a different route: a string is simply a slice of bytes, and that plain foundation turns out to explain a surprising amount of what strings can and cannot do.
A string is []const u8
You know from Array that a slice ([]T) is a pointer to a block of elements paired with a runtime length. A string in Zig is nothing more than a slice of u8 bytes where those bytes represent text:
const greeting: []const u8 = "hello";
The const in []const u8 means you cannot modify the individual bytes through this slice — you can read them but not write to them. A mutable slice []u8 would allow modification; more on that later.
That’s it. There is no special String type, no hidden structure, no extra metadata. A Zig string is a view into a sequence of bytes, and every string operation is an operation on a slice of u8.
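Because a string is only a slice, it has the same .len field and indexing as any other slice, and any function that takes a []const u8 accepts a string directly. A minimal sketch, using a hypothetical helper countByte written just for this illustration:

```zig
const std = @import("std");

// Hypothetical helper: count how many bytes in `text` equal `needle`.
// It accepts a plain []const u8, so any string (literal or runtime) works.
fn countByte(text: []const u8, needle: u8) usize {
    var count: usize = 0;
    for (text) |byte| {
        if (byte == needle) count += 1;
    }
    return count;
}

pub fn main() void {
    const s: []const u8 = "banana";
    // A slice carries a pointer (.ptr) and a length (.len); indexing reads one byte.
    std.debug.print("len={d} first={c}\n", .{ s.len, s[0] }); // len=6 first=b
    std.debug.print("a appears {d} times\n", .{countByte(s, 'a')}); // a appears 3 times
}
```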
String literals and where they live
When you write "hello" in source code, the compiler stores those five bytes (plus one terminating zero byte) in the program’s read-only data region, the segment of memory that holds compile-time constants (described in Memory Layout). This region is loaded once when the program starts and stays there until it exits.
The precise type of a string literal is *const [N:0]u8 — a pointer to a compile-time array of N bytes with a sentinel of 0 at the end. The :0 notation means the array is null-terminated: the compiler places a zero byte immediately after the last character, even though that byte is not counted in N.
"hello" in memory:
address:  4000  4001  4002  4003  4004  4005
         ┌─────┬─────┬─────┬─────┬─────┬─────┐
byte:    │ 104 │ 101 │ 108 │ 108 │ 111 │  0  │
         └─────┴─────┴─────┴─────┴─────┴─────┘
           'h'   'e'   'l'   'l'   'o'   null
The null terminator is a convention inherited from the C programming language, where strings had no separate length field and code found the end of a string by scanning for the zero byte. Zig keeps the null terminator for interoperability with C libraries, but Zig code itself typically works with slices that carry their own .len — there is no need to scan for the end.
In practice, a *const [5:0]u8 coerces silently to []const u8 wherever you use it, so you rarely see the longer type in everyday code:
const std = @import("std");
pub fn main() void {
    const s: []const u8 = "hello"; // *const [5:0]u8 coerces to []const u8
    std.debug.print("length: {}\n", .{s.len}); // length: 5
}
.len returns the number of bytes, not counting the null terminator. A five-character ASCII string has .len == 5.
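When you do need the terminator, for example to pass a string to a C function, you can keep it in the type with [:0]const u8. In the sketch below, cStringLen is a hypothetical helper; std.mem.span is the standard-library function that turns a C-style [*:0]const u8 (a pointer with no length) back into a slice by scanning for the zero byte:

```zig
const std = @import("std");

// Hypothetical helper: recover the length of a C-style string, which carries
// no length of its own. std.mem.span scans forward until it finds the 0 byte.
fn cStringLen(c_str: [*:0]const u8) usize {
    const s: [:0]const u8 = std.mem.span(c_str);
    return s.len;
}

pub fn main() void {
    // A [:0]const u8 keeps the sentinel in its type: it has a .len like any
    // slice, and reading index .len yields the terminating 0 byte.
    const s: [:0]const u8 = "hello";
    std.debug.print("len={d} terminator={d}\n", .{ s.len, s[s.len] }); // len=5 terminator=0

    // s.ptr is a [*:0]const u8, the kind of pointer a C function taking a
    // null-terminated string expects.
    std.debug.print("C-style length: {d}\n", .{cStringLen(s.ptr)}); // C-style length: 5
}
```

The scan in std.mem.span is exactly the cost that Zig's length-carrying slices avoid, which is why idiomatic Zig code converts to a slice once and works with .len from then on.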
Printing strings: {s} vs {}
You have been using {} to print numbers and booleans. For strings, use {s} — the “string” format specifier:
const std = @import("std");
pub fn main() void {
    const name: []const u8 = "Zig";
    std.debug.print("Hello, {s}!\n", .{name}); // Hello, Zig!
}
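Do not use {} for byte slices: it does not print them as text. Recent Zig versions reject {} on a []const u8 at compile time and ask for an explicit specifier, and {any} prints the bytes as a list of integers. Always use {s} when you intend to print human-readable text.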
Bytes and characters: ASCII
For ordinary English text, every character maps to exactly one byte. The correspondence is defined by ASCII (American Standard Code for Information Interchange), a 128-entry table that assigns a number between 0 and 127 to each letter, digit, and common symbol:
'A' → 65    'a' → 97    '0' → 48    ' ' → 32
'B' → 66    'b' → 98    '1' → 49    '!' → 33
In ASCII, .len equals the number of characters, and you can treat each byte as exactly one character:
const std = @import("std");
pub fn main() void {
    const word: []const u8 = "Rust";
    for (word) |byte| {
        // {c} formats a u8 as a character; {d} formats it as a decimal number
        std.debug.print("{c} = {d}\n", .{ byte, byte });
    }
    // R = 82
    // u = 117
    // s = 115
    // t = 116
}
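The fixed layout of the ASCII table also makes simple byte arithmetic possible. As an illustration, this sketch converts digit characters to their numeric values by subtracting '0' (48). parseDigits is a hypothetical helper; in real code, std.fmt.parseInt(u32, s, 10) does this job and reports errors more precisely:

```zig
const std = @import("std");

// Hypothetical helper: turn a string of ASCII digits into a number.
// Each digit byte minus '0' (48) gives its value, e.g. '7' (55) - '0' = 7.
// Returns null for an empty string, any non-digit byte, or u32 overflow.
fn parseDigits(s: []const u8) ?u32 {
    if (s.len == 0) return null;
    var value: u32 = 0;
    for (s) |byte| {
        if (!std.ascii.isDigit(byte)) return null;
        value = std.math.mul(u32, value, 10) catch return null;
        value = std.math.add(u32, value, byte - '0') catch return null;
    }
    return value;
}

pub fn main() void {
    if (parseDigits("2024")) |n| {
        std.debug.print("{d}\n", .{n + 1}); // 2025
    }
}
```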
Bytes and characters: UTF-8
ASCII only covers 128 code points — it has no room for accented letters, Han characters, Arabic script, emoji, or the vast majority of human writing. Modern text uses UTF-8, an encoding that represents every Unicode character while remaining compatible with ASCII.
The key property of UTF-8: characters outside the ASCII range are encoded as multiple bytes. An accented letter might take 2 bytes; a Han character might take 3; an emoji might take 4.
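Those byte counts are easy to observe. This sketch prints the byte length and the code point count of one sample from each size class, using std.unicode.utf8CountCodepoints to decode the bytes. It assumes the source file stores é in its usual precomposed form (the single code point U+00E9):

```zig
const std = @import("std");

pub fn main() !void {
    // One sample per UTF-8 length class: ASCII, accented Latin, Han, emoji.
    const samples = [_][]const u8{ "A", "é", "中", "😀" };
    for (samples) |s| {
        // .len counts bytes; utf8CountCodepoints decodes and counts characters.
        const chars = try std.unicode.utf8CountCodepoints(s);
        std.debug.print("{s}: {d} byte(s), {d} code point(s)\n", .{ s, s.len, chars });
    }
    // A: 1 byte(s), 1 code point(s)
    // é: 2 byte(s), 1 code point(s)
    // 中: 3 byte(s), 1 code point(s)
    // 😀: 4 byte(s), 1 code point(s)
}
```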
const std = @import("std");
pub fn main() void {
    const s: []const u8 = "héllo"; // 'é' is a two-byte UTF-8 sequence
    std.debug.print("bytes: {}\n", .{s.len}); // bytes: 6 (not 5!)
}
"héllo" has five visible characters but six bytes. .len reports the byte count, always. If you index into a UTF-8 string with s[i], you get the u8 at position i in the byte sequence — which may be the middle of a multi-byte character, not a meaningful character boundary.
Zig does not hide this complexity behind automatic character abstraction, which is consistent with its design: the language shows you what is actually happening in memory. When you need to iterate over Unicode characters (called code points) rather than raw bytes, use std.unicode.Utf8View:
const std = @import("std");
pub fn main() !void {
    const s: []const u8 = "héllo";
    // init validates the bytes and fails with error.InvalidUtf8 on malformed input
    const view = try std.unicode.Utf8View.init(s);
    var iter = view.iterator();
    while (iter.nextCodepoint()) |cp| {
        std.debug.print("U+{X:0>4}\n", .{cp});
    }
    // U+0068 (h)
    // U+00E9 (é)
    // U+006C (l)
    // U+006C (l)
    // U+006F (o)
}
For most programs that deal with English or ASCII-only input, working directly with bytes is fine. Whenever the input may contain non-ASCII text, be deliberate: decide whether you need bytes or characters, and choose the right tool.
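Being deliberate matters most when you shorten text. Slicing at an arbitrary byte offset can cut a multi-byte character in half and leave invalid UTF-8, while stepping through code points cannot. The sketch below uses a hypothetical helper, truncateCodepoints, built on the Utf8View iterator; it relies on the iterator's nextCodepointSlice method and its i field (the current byte offset):

```zig
const std = @import("std");

// Hypothetical helper: keep at most `max` code points of `s`, never cutting
// a multi-byte character in half. Returns error.InvalidUtf8 for malformed input.
fn truncateCodepoints(s: []const u8, max: usize) ![]const u8 {
    const view = try std.unicode.Utf8View.init(s);
    var iter = view.iterator();
    var taken: usize = 0;
    while (taken < max) : (taken += 1) {
        // nextCodepointSlice returns the bytes of one whole character, or null at the end.
        if (iter.nextCodepointSlice() == null) break;
    }
    // iter.i is the byte offset just past the last character consumed.
    return s[0..iter.i];
}

pub fn main() !void {
    const s: []const u8 = "héllo";
    // Cutting by bytes can split 'é' (bytes 1 and 2), leaving invalid UTF-8.
    std.debug.print("s[0..2] valid UTF-8? {}\n", .{std.unicode.utf8ValidateSlice(s[0..2])}); // false
    // Cutting by code points keeps 'é' whole.
    std.debug.print("{s}\n", .{try truncateCodepoints(s, 2)}); // hé
}
```

The result is still a view into the original bytes, so this costs no allocation; it only spends extra work decoding.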
Slicing a substring
Because a string is a slice, you can extract a substring using the range syntax you already know:
const std = @import("std");
pub fn main() void {
    const sentence: []const u8 = "hello world";
    const word: []const u8 = sentence[0..5]; // bytes 0, 1, 2, 3, 4
    std.debug.print("{s}\n", .{word}); // hello
}
sentence[0..5] produces a new []const u8 that points into the same underlying bytes — no copy is made. As with all slice operations, Zig checks the bounds at runtime in Debug and ReleaseSafe builds.
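Slicing combines naturally with searching. In this sketch, afterSeparator is a hypothetical helper that uses std.mem.indexOf to find a separator and then returns the rest of the string as a slice of the same bytes:

```zig
const std = @import("std");

// Hypothetical helper: return the part of `s` after the first `sep`, or null
// if `sep` does not occur. The result is a view into `s`; nothing is copied.
fn afterSeparator(s: []const u8, sep: []const u8) ?[]const u8 {
    // std.mem.indexOf returns the byte index of the first match, or null.
    const idx = std.mem.indexOf(u8, s, sep) orelse return null;
    return s[idx + sep.len ..];
}

pub fn main() void {
    const line: []const u8 = "name=Zig";
    if (afterSeparator(line, "=")) |value| {
        std.debug.print("{s}\n", .{value}); // Zig
    }
}
```

Because the result points into the input, it is only valid for as long as the input's memory is.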
Comparing strings
You cannot compare string contents with ==. Zig rejects == on slices such as []const u8 at compile time, and comparing the pointers themselves (a.ptr == b.ptr) only tells you whether two strings start at the same address, not whether they hold the same bytes. To test whether two strings have the same content, use std.mem.eql:
const std = @import("std");
pub fn main() void {
    const a: []const u8 = "hello";
    const b: []const u8 = "hello";
    const c: []const u8 = "world";
    std.debug.print("{}\n", .{std.mem.eql(u8, a, b)}); // true
    std.debug.print("{}\n", .{std.mem.eql(u8, a, c)}); // false
}
std.mem.eql(u8, x, y) first checks that the lengths match, then compares each byte. It returns true only if the lengths are equal and every corresponding byte pair is identical.
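The standard library covers other common comparisons as well. This sketch assumes a hypothetical command-line check, isHelpArg, and uses std.ascii.eqlIgnoreCase (which folds only ASCII letters) and std.mem.startsWith:

```zig
const std = @import("std");

// Hypothetical helper: accept "help" in any ASCII letter case, or any
// argument that starts with "-h" (such as "-h" or "-help").
fn isHelpArg(arg: []const u8) bool {
    return std.ascii.eqlIgnoreCase(arg, "help") or std.mem.startsWith(u8, arg, "-h");
}

pub fn main() void {
    const args = [_][]const u8{ "HELP", "-h", "helpme", "run" };
    for (args) |arg| {
        std.debug.print("{s}: {}\n", .{ arg, isHelpArg(arg) });
    }
    // HELP: true
    // -h: true
    // helpme: false
    // run: false
}
```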
Compile-time concatenation
Two string literals can be joined at compile time using the ++ operator. The result is itself a compile-time constant:
const std = @import("std");
const greeting = "Hello, " ++ "world!";
pub fn main() void {
    std.debug.print("{s}\n", .{greeting}); // Hello, world!
}
++ is only available when both operands are compile-time known. For building strings at runtime — from user input or computed values — you need a different approach.
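Compile-time known covers more than bare literals: named constants initialized from literals qualify too. The sketch below also uses **, Zig's compile-time repetition operator, to build a line of dashes as long as the banner. The names app_name, version, banner, and rule are made up for this illustration:

```zig
const std = @import("std");

// Named constants initialized from literals are compile-time known,
// so ++ can combine them just like bare literals.
const app_name = "zigtool";
const version = "0.1.0";
const banner = app_name ++ " v" ++ version;

// ** repeats a compile-time array; here it builds a rule as long as the banner (14 bytes).
const rule = "-" ** banner.len;

pub fn main() void {
    std.debug.print("{s}\n{s}\n", .{ banner, rule });
    // zigtool v0.1.0
    // --------------
}
```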
Building strings at runtime
When the content of a string is only known while the program is running, you cannot use ++. Two standard tools cover the common cases.
Formatting into a fixed buffer
std.fmt.bufPrint formats values into a []u8 that you supply. It returns a sub-slice of that buffer containing exactly the formatted bytes:
const std = @import("std");
pub fn main() !void {
    var buf: [64]u8 = undefined;
    const result: []u8 = try std.fmt.bufPrint(&buf, "score: {d}", .{42});
    std.debug.print("{s}\n", .{result}); // score: 42
}
The buffer must be large enough. bufPrint returns error.NoSpaceLeft if the formatted output would exceed the buffer’s capacity, and try propagates that error to the caller. This approach avoids heap allocation entirely, which makes it a good fit for output whose maximum length you know in advance, such as short log lines or fixed-format messages.
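If you would rather recover from a short buffer than propagate the error, catch it. A minimal sketch, with scoreLine as a hypothetical helper that substitutes a fixed message when the output does not fit:

```zig
const std = @import("std");

// Hypothetical helper: format a score into the caller's buffer. If the buffer
// is too small (bufPrint fails with error.NoSpaceLeft), return a fixed fallback.
fn scoreLine(buf: []u8, score: u32) []const u8 {
    return std.fmt.bufPrint(buf, "score: {d}", .{score}) catch "score: (too long)";
}

pub fn main() void {
    var big: [32]u8 = undefined;
    var tiny: [4]u8 = undefined;
    std.debug.print("{s}\n", .{scoreLine(&big, 42)}); // score: 42
    std.debug.print("{s}\n", .{scoreLine(&tiny, 42)}); // score: (too long)
}
```

The fallback is a string literal, so it lives in static memory and is safe to return from the function.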
A growable string with ArrayList(u8)
When the final length is unknown in advance, use std.ArrayList(u8) — a growable list of bytes that behaves like a dynamically sized string builder:
const std = @import("std");
pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();
    var buf = std.ArrayList(u8).init(allocator);
    defer buf.deinit();
    try buf.appendSlice("Hello");
    try buf.appendSlice(", ");
    try buf.appendSlice("world!");
    const result: []const u8 = buf.items;
    std.debug.print("{s}\n", .{result}); // Hello, world!
}
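buf.items exposes the current contents as a []u8. That slice borrows the list’s memory: it stays valid only until the next call that may reallocate (such as another append) or until buf.deinit() runs, so copy it if you need it longer. You can also write formatted output directly into an ArrayList(u8) using buf.writer() with std.fmt.format, but appendSlice is enough for most simple cases.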
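When you just need one formatted string on the heap, std.fmt.allocPrint combines formatting and allocation in a single call. In this sketch, describeScore is a hypothetical helper; the returned slice belongs to the caller, who must free it with the same allocator:

```zig
const std = @import("std");

// Hypothetical helper: build "<name> scored <points>" on the heap.
// The caller owns the returned slice and must free it with `allocator`.
fn describeScore(allocator: std.mem.Allocator, name: []const u8, points: u32) ![]u8 {
    return std.fmt.allocPrint(allocator, "{s} scored {d}", .{ name, points });
}

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    const msg = try describeScore(allocator, "Ada", 42);
    defer allocator.free(msg);
    std.debug.print("{s}\n", .{msg}); // Ada scored 42
}
```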
Mutable strings
[]const u8 is a read-only view. A []u8 — without const — lets you modify the bytes in place:
const std = @import("std");
pub fn main() void {
    var bytes: [5]u8 = .{ 'h', 'e', 'l', 'l', 'o' };
    const s: []u8 = &bytes;
    s[0] = 'H'; // modify the first byte through the slice
    std.debug.print("{s}\n", .{s}); // Hello
}
String literals cannot be used as []u8 — they live in read-only memory. To get a mutable byte array, declare it as a var local array and take a slice of it, or allocate on the heap with allocator.alloc(u8, n).
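Both routes to a mutable copy can be sketched briefly. upperInPlace is a hypothetical helper built on std.ascii.toUpper, which changes only the ASCII letters a to z, so bytes that belong to multi-byte UTF-8 characters pass through unchanged. The heap copy uses allocator.dupe, which allocates and copies in one step; std.heap.page_allocator is chosen here only to keep the example short:

```zig
const std = @import("std");

// Hypothetical helper: uppercase ASCII letters in place. Every other byte,
// including the bytes of multi-byte UTF-8 characters, is left unchanged.
fn upperInPlace(s: []u8) void {
    for (s) |*byte| {
        byte.* = std.ascii.toUpper(byte.*);
    }
}

pub fn main() !void {
    // Dereferencing a string literal's pointer (.*) copies its bytes into a
    // mutable array on the stack.
    var stack_copy = "hello".*;
    upperInPlace(&stack_copy);
    std.debug.print("{s}\n", .{&stack_copy}); // HELLO

    // allocator.dupe copies read-only bytes into heap memory you own and may modify.
    const allocator = std.heap.page_allocator;
    const heap_copy = try allocator.dupe(u8, "world");
    defer allocator.free(heap_copy);
    upperInPlace(heap_copy);
    std.debug.print("{s}\n", .{heap_copy}); // WORLD
}
```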
Summary
- A Zig string is a []const u8, a slice of bytes. There is no separate string type.
- String literals have type *const [N:0]u8 (null-terminated, compile-time constant) and coerce silently to []const u8. They live in the program’s read-only data region.
- Print strings with {s}. {} does not format a byte slice as text (recent Zig versions reject it at compile time); use {any} to see the raw byte values.
- .len is the byte count. For ASCII text, bytes equal characters. For UTF-8 text, one character may span 2–4 bytes. Use std.unicode.Utf8View to iterate over characters.
- Slice a substring with s[start..end]: no copy, just a new view into the same bytes.
- Compare string contents with std.mem.eql(u8, a, b). The == operator does not compare contents, and for slices it does not compile at all.
- Join compile-time strings with ++. For runtime strings, format into a fixed buffer with std.fmt.bufPrint or build dynamically with std.ArrayList(u8).
- []u8 (without const) is a mutable byte slice; []const u8 is read-only. String literals are always read-only.