Modding FMOD

A couple of years ago I was tooling around with a (still unreleased) mod for Halo 3: ODST. The Halo modding scene cops some flak for being derivative as a lot of mods are just asset ports between games in the series. And this one was no exception; I needed some sounds from Halo 3.

All the Way to the Bank

The MCC ports of both games run their audio through FMOD, a cross-platform audio middleware. The runtime side is what you'd expect: a mixer, effects, 3D positioning, and the usual. The interesting bits for modding are the offline parts. You feed FMOD a pile of source .wav files and out the other end come packed .fsb files, as in FMOD Sound Bank (not to be confused with a certain three-letter agency). In MCC that authoring pass happens inside tool, which is an absolute Swiss Army knife of a CLI:

.\tool.exe sounds-single-layer "data\sound\game_sfx\ui\shield_charge\charge\loop" sfx -bank:h3
# more imports, then rebuild:
.\tool.exe report-sounds "sound"
.\tool.exe export-fmod-banks "reports\reports_00\sounds_report_sizes.csv" pc sfx -bank:h3

Like a lot of tool's workflows, this one's somewhat fraught. Bank corruption is common enough that many a modder has a horror story:

Gashnor @GashnorOfficial
Replying to @Kashiiera
I would run one big import when I gathered all the sounds I'd need because it corrupted my banks so much. Ended up getting a decent amount in but yeah, not ideal.
I get why Bungie designed this way for the Xbox 360, but it isn't super scalable
10:08 PM · Sep 3, 2024

In Bungie's defence, this isn't their design. FMOD was bolted on when the games were ported to the Xbox One as part of the MCC. Bungo would never. Traditionally, data in Halo is stored in what are called tag files. Every single game asset is a tag, from warthogs to frag grenades, and yes, sounds too. The sound tags still carry the per-clip gameplay metadata (sound class, playback flags, pitch ranges, and priority) but not the audio itself, which now lives in FMOD banks.

This got me thinking: what if I could copy the tag files as-is and modify ODST's banks to include the necessary sub-sounds, all without going through tool? Avoiding tool would give me two benefits:

it would sidestep any unnecessary re-encodes; and
it would prevent any surprise bank corruption.

Fourth Floor: Headers, Names, Keys to Sub-sounds

So I set about researching all I could about the proprietary .fsb format. I came across python-fsb5 and the venerable vgmstream. Both projects were invaluable references to have for the binary format, but of course each is only concerned with extraction so I would still have to roll my own tool. I also came across FMOD's CEO casually posting the FSB header struct on a support forum (as you do):

struct FSB5_HEADER
{
	char id[4]; // 'FSB5'
	unsigned int subVersion; // Extended FSB version
	int numSubSounds; // Number of sub-sounds in the file
	unsigned int headerChunkSizeBytes; // Size in bytes of all of the sub-sound headers including metadata
	unsigned int namesChunkSizeBytes; // Size in bytes of all the original source file names
	unsigned int dataChunkSizeBytes; // Size in bytes of compressed sample data
	FMOD_FSB_FORMAT dataFormat; // Compression format
	unsigned int dataFormatVersion; // Version number of compression format
	unsigned int mode; // Flags that apply to all sub-sounds in the FSB
	FMOD_UINT64 compatibilityHash; // Deprecated
	FMOD_GUID guid; // MD5 hash based unique identifier using all header information
};

Mapping that to Python is mostly an exercise in writing a struct.unpack format string:

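from struct import unpack

# f is the .fsb file, opened in binary mode; the fixed header is 60 bytes
fsb5_header = f.read(60)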
(
	id,
	sub_version,
	num_sub_sounds,
	header_chunk_size_bytes,
	names_chunk_size_bytes,
	data_chunk_size_bytes,
	data_format,
	data_format_version,
	mode,
	compatibility_hash,
	guid_data_1,
	guid_data_2,
	guid_data_3,
	guid_data_4,
) = unpack("<4s I i I I I I I I Q I H H 8s", fsb5_header)

You'd be forgiven for thinking that string looks like hieroglyphics. Here's a quick overview covering what I used:

Char	Meaning
`<`	little-endian byte order
`4s`	four bytes
`I`	`unsigned int`
`i`	`int`
`Q`	`unsigned long long`
`H`	`unsigned short`
`8s`	eight bytes
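As a quick sanity check, struct.calcsize can confirm that the format string really does describe the 60 bytes being read. This is only a sketch; the FSB5_HEADER_FORMAT name is mine and doesn't come from FMOD or python-fsb5.

```python
import struct

# Hypothetical constant name; the format string is the one used above.
FSB5_HEADER_FORMAT = "<4s I i I I I I I I Q I H H 8s"

# Whitespace inside a struct format string is ignored, and "<" turns off
# alignment padding, so the size is just the sum of the field widths.
header_size = struct.calcsize(FSB5_HEADER_FORMAT)
print(header_size)
```

If the number printed ever disagrees with the argument to f.read, one of the two has drifted.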

Immediately following that header are the sub-sound headers. Each one consists of a 64-bit integer, containing five aggressively bit-packed fields, optionally followed by a chain of extra fields:

Bits	Field
0	Set if an extra field follows
1-4	Sample rate enum
5	Channel count minus 1
6-33	Offset into the data chunk, in 16-byte units
34-63	Decoded sample count

Each extra field is 32 bits laid out as follows, with data immediately following:

Bits	Field
0	Set if another field follows
1-24	Data size in bytes
25-31	Data type

headers = []
data_offsets = []
for i in range(num_sub_sounds):
	# 8-byte base header, decoded as a 64-bit little-endian int
	headers.append(bytearray(f.read(8)))
	raw = int.from_bytes(headers[i], byteorder="little")
	extra_field = bits(raw, 0,        1)
	frequency   = bits(raw, 1,        4)
	channels    = bits(raw, 1+4,      1)  + 1
	data_offset = bits(raw, 1+4+1,    28) * 16
	samples     = bits(raw, 1+4+1+28, 30)

	# remember the offset so the data chunk can be sliced up later
	data_offsets.append(data_offset)

	# walk any extra fields chained off this sub-sound
	while extra_field:
		headers[i].extend(f.read(4))
		raw = int.from_bytes(headers[i][-4:], byteorder="little")
		extra_field = bits(raw, 0,    1)
		data_size   = bits(raw, 1,    24)
		data_type   = bits(raw, 1+24, 7)

		# data isn't parsed, just kept alongside the header
		headers[i].extend(f.read(data_size))
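The loop above leans on a bits helper that isn't shown. A minimal sketch, assuming it takes a value, a starting bit (counting from the least significant) and a length; the header value below is made up purely to exercise it:

```python
def bits(value: int, start: int, length: int) -> int:
    """Return `length` bits of `value`, starting at bit `start` (bit 0 = LSB)."""
    return (value >> start) & ((1 << length) - 1)


# Hypothetical sub-sound header, packed by hand to match the table above.
raw = (
    1                # bit 0: an extra field follows
    | (9 << 1)       # bits 1-4: sample rate enum (arbitrary value)
    | (1 << 5)       # bit 5: channel count minus 1
    | (3 << 6)       # bits 6-33: data offset in 16-byte units
    | (48000 << 34)  # bits 34-63: decoded sample count
)

extra_field = bits(raw, 0, 1)
frequency = bits(raw, 1, 4)
channels = bits(raw, 1 + 4, 1) + 1
data_offset = bits(raw, 1 + 4 + 1, 28) * 16
samples = bits(raw, 1 + 4 + 1 + 28, 30)
print(extra_field, frequency, channels, data_offset, samples)
```

Shift-then-mask keeps the helper to one line, and the same function handles both the 64-bit base header and the 32-bit extra fields.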

Next up is the name chunk which, mercifully, is much simpler. First, an array of 32-bit offsets from the start of the chunk to the start of each name's string. Then, the strings themselves—null-terminated and tightly packed.

names_chunk_start = f.tell()
name_offsets = []
names = []
for i in range(num_sub_sounds):
	name_offsets.append(int.from_bytes(f.read(4), byteorder="little"))

# pair each offset with the next so we know where each name ends
for a, b in zip(name_offsets, name_offsets[1:] + [names_chunk_size_bytes]):
	f.seek(names_chunk_start + a)
	names.append(f.read(b - a).rstrip(b'\x00').decode())
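Since the end goal is writing banks rather than just reading them, it helps to run the same layout in reverse. The following is a rough sketch under my own assumptions: build_names_chunk and parse_names_chunk are hypothetical helpers, the sub-sound names are invented, and any trailing padding a real bank needs to align the data chunk is left out.

```python
import struct


def build_names_chunk(names):
    """Offset table, then null-terminated, tightly packed strings."""
    encoded = [name.encode() + b"\x00" for name in names]
    offsets = []
    position = 4 * len(names)  # strings start right after the offset table
    for entry in encoded:
        offsets.append(position)
        position += len(entry)
    return struct.pack(f"<{len(names)}I", *offsets) + b"".join(encoded)


def parse_names_chunk(chunk, count):
    """Same pairing trick as above, applied to an in-memory chunk."""
    offsets = list(struct.unpack_from(f"<{count}I", chunk, 0))
    ends = offsets[1:] + [len(chunk)]
    return [chunk[a:b].rstrip(b"\x00").decode() for a, b in zip(offsets, ends)]


# Invented names, for illustration only.
example_names = ["shield_charge_loop", "plasma_rifle_fire"]
chunk = build_names_chunk(example_names)
round_tripped = parse_names_chunk(chunk, len(example_names))
print(round_tripped)
```

Round-tripping through both functions is a cheap way to catch off-by-one mistakes in the offset table before a game ever sees the file.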

And finally the data chunk. Encoded samples for each sub-sound aligned to 16 bytes, with the first on a 32-byte boundary. Not that we concern ourselves with any of that—we just grab everything leading up to the next sub-sound. As above, offsets are relative to the start of the chunk.

sub_sound_data = []
for start, end in zip(data_offsets, data_offsets[1:] + [data_chunk_size_bytes]):
	sub_sound_data.append(f.read(end - start))
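Going the other way, a rebuilt bank needs fresh offsets that respect the 16-byte alignment. A small sketch of that arithmetic, with hypothetical helper names; it assumes the last sub-sound is padded like the rest and doesn't deal with placing the chunk itself on a 32-byte boundary:

```python
def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def layout_data_chunk(sizes, alignment=16):
    """Return (offsets, total_size) for sub-sounds of the given byte sizes."""
    offsets = []
    position = 0
    for size in sizes:
        offsets.append(position)
        # Assumption: every sub-sound, including the last, is padded out.
        position = align_up(position + size, alignment)
    return offsets, position


# Made-up sub-sound sizes in bytes.
offsets, total = layout_data_chunk([100, 32, 5])
print(offsets, total)
```

Each offset comes out as a multiple of 16, so dividing by 16 gives the value that goes into the 28-bit offset field of the sub-sound header.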

Having decomposed the existing bank's chunks into three separate arrays (headers, names, and sub_sound_data), I could trivially cherry-pick by name and construct a brand new .fsb file from scratch. Only one field eluded me…

Who Ordered the Mystery Hash

I recall trying a few things to get guid to fall out, but it didn't matter as ODST never checked it at runtime. I left it zeroed but it always bothered me. So continuing my run of using Claude to answer questions I've left unanswered, I decided to turn to the LLM once more to crack it. All it asked for was a copy of tool.exe (just the binary, not even a decompilation) and then it hit the jackpot:

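import hashlib
from struct import pack

# the 60-byte FSB header, with compatibilityHash and guid zeroed
fsb_header = pack("<4s I i I I I I I I Q 16s",
    id, sub_version, num_sub_sounds,
    header_chunk_size_bytes, names_chunk_size_bytes, data_chunk_size_bytes,
    data_format, data_format_version, mode,
    0, b'\x00' * 16,
)
# header_chunk_bytes is the raw sub-sound header chunk, hashed ahead of the header
guid = hashlib.md5(bytes(header_chunk_bytes) + fsb_header).digest()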

The sub-sound header chunk first, then the FSB header with compatibilityHash and guid zeroed. Mystery solved.
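To close the loop, here is a hedged sketch of stamping the result back into a header. The build_header function and the field values are mine, and two things are assumptions rather than anything I've confirmed: that the digest is written back as 16 raw bytes, and that compatibilityHash stays zero in the final file.

```python
import hashlib
from struct import pack

HEADER_FORMAT = "<4s I i I I I I I I Q 16s"  # same layout as above


def build_header(fields, header_chunk_bytes):
    """fields: the nine header values that precede compatibilityHash."""
    # Hash input: sub-sound header chunk, then the header with both
    # compatibilityHash and guid zeroed.
    zeroed = pack(HEADER_FORMAT, *fields, 0, b"\x00" * 16)
    guid = hashlib.md5(bytes(header_chunk_bytes) + zeroed).digest()
    # Assumption: the digest goes back in as-is, compatibilityHash stays 0.
    return pack(HEADER_FORMAT, *fields, 0, guid)


# Made-up values, for illustration only.
fields = (b"FSB5", 1, 1, 8, 16, 32, 0, 0, 0)
header_chunk_bytes = bytes(8)
header = build_header(fields, header_chunk_bytes)
print(header[44:].hex())
```

Because the guid sits in the last 16 bytes, the hash can be re-checked on any bank by zeroing those bytes and the eight before them, then hashing again.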