Schism Tracker, Unicode, and you

[Paper], 09 Jun 2024
Recently I've taken on adding real Unicode-awareness to Schism, and it was surprisingly easy, to say the least.

I was expecting to have to convert lots of things to be real Unicode, but nope! All that really needed to be done was to convert UTF-8 to CP437 where necessary to actually *draw* the data while keeping the internal form pure UTF-8, and then bundle everything up into a neat macro to keep everything consistent:
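/* Convert `in` from `inset` to `outset`, run `x` with the result available as `out`,
 * then free the converted buffer. If the conversion fails, `out` is just `in`. */
#define CHARSET_EASY_MODE_EX(MOD, in, inset, outset, x) \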
	do { \
		MOD uint8_t* out; \
		charset_error_t err = charset_iconv(in, (uint8_t**)&out, inset, outset); \
		if (err) \
			out = in; \
	\
		x \
	\
		if (!err) \
			free((uint8_t*)out); \
	} while (0)
I just shoved this macro anywhere necessary and it works perfectly fine for loading any Unicode path. For example, the Spanish word "mañana" gets displayed correctly now.
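
To give a rough idea of what a call site might look like, here is a minimal, self-contained sketch. The macro is the one from this post; everything around it is a stand-in I made up for illustration: the `charset_t` enum, the `CHARSET_ERROR_*` values, the `draw_name` helper, and a toy `charset_iconv` that only understands ASCII and "ñ". The real functions in Schism are not shown here and may look different.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the charset API; real names and values may differ. */
typedef enum { CHARSET_ERROR_SUCCESS = 0, CHARSET_ERROR_DECODE = 1 } charset_error_t;
typedef enum { CHARSET_UTF8, CHARSET_CP437 } charset_t;

/* Toy converter: handles ASCII plus U+00F1 (0xA4 in CP437) and nothing else. */
static charset_error_t charset_iconv(const uint8_t *in, uint8_t **out,
                                     charset_t inset, charset_t outset)
{
	if (inset != CHARSET_UTF8 || outset != CHARSET_CP437)
		return CHARSET_ERROR_DECODE;

	size_t len = strlen((const char *)in);
	uint8_t *buf = malloc(len + 1); /* output is never longer than the input */
	if (!buf)
		return CHARSET_ERROR_DECODE;

	size_t o = 0;
	for (size_t i = 0; i < len; i++) {
		if (in[i] < 0x80) {
			buf[o++] = in[i];
		} else if (in[i] == 0xC3 && in[i + 1] == 0xB1) {
			buf[o++] = 0xA4; /* n with tilde */
			i++;
		} else {
			free(buf);
			return CHARSET_ERROR_DECODE;
		}
	}
	buf[o] = '\0';
	*out = buf;
	return CHARSET_ERROR_SUCCESS;
}

/* The macro from the post, unchanged. */
#define CHARSET_EASY_MODE_EX(MOD, in, inset, outset, x) \
	do { \
		MOD uint8_t* out; \
		charset_error_t err = charset_iconv(in, (uint8_t**)&out, inset, outset); \
		if (err) \
			out = in; \
	\
		x \
	\
		if (!err) \
			free((uint8_t*)out); \
	} while (0)

/* Hypothetical call site: copy the drawable CP437 bytes of a UTF-8 name into dst. */
static void draw_name(const char *utf8_name, uint8_t *dst, size_t dstsize)
{
	assert(dstsize > 0);
	CHARSET_EASY_MODE_EX(const, (const uint8_t *)utf8_name, CHARSET_UTF8, CHARSET_CP437, {
		snprintf((char *)dst, dstsize, "%s", (const char *)out);
	});
}
```

The nice part of the pattern is that the caller never has to think about ownership: on success the converted buffer is freed after the body runs, and on failure the body just sees the original bytes.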


The file sorting algorithms were a different beast though, and even now strverscmp doesn't have a real charset-independent variant. For strcasecmp, I had to implement (simple) Unicode case folding, which meant having a switch statement that is almost 1500 lines long and takes up about 20K of space in the binary.
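
For a sense of what "simple case folding in a switch" means, here is a heavily cut-down sketch. It is not the code in Schism: `utf8_decode`, `fold_simple`, and `utf8_casecmp` are names invented for this example, the switch covers only a handful of code points instead of the full table, and the decoder assumes mostly well-formed UTF-8 of up to three bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one code point from s into *cp and return the number of bytes used.
 * Bytes that do not start a valid 1-3 byte sequence are passed through as-is. */
static size_t utf8_decode(const unsigned char *s, uint32_t *cp)
{
	if (s[0] < 0x80) {
		*cp = s[0];
		return 1;
	}
	if ((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
		*cp = ((uint32_t)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
		return 2;
	}
	if ((s[0] & 0xF0) == 0xE0 && (s[1] & 0xC0) == 0x80 && (s[2] & 0xC0) == 0x80) {
		*cp = ((uint32_t)(s[0] & 0x0F) << 12) | ((uint32_t)(s[1] & 0x3F) << 6) | (s[2] & 0x3F);
		return 3;
	}
	*cp = s[0];
	return 1;
}

/* Simple (one-to-one) case folding for a tiny sample of code points. */
static uint32_t fold_simple(uint32_t cp)
{
	if (cp >= 'A' && cp <= 'Z')
		return cp + 32;

	switch (cp) {
	case 0x00C9: return 0x00E9; /* Latin E with acute */
	case 0x00D1: return 0x00F1; /* Latin N with tilde */
	case 0x0391: return 0x03B1; /* Greek alpha */
	case 0x0410: return 0x0430; /* Cyrillic a */
	default:     return cp;
	}
}

/* strcasecmp-style comparison of two NUL-terminated UTF-8 strings. */
static int utf8_casecmp(const char *a, const char *b)
{
	assert(a && b);
	const unsigned char *pa = (const unsigned char *)a;
	const unsigned char *pb = (const unsigned char *)b;

	for (;;) {
		uint32_t ca, cb;
		pa += utf8_decode(pa, &ca);
		pb += utf8_decode(pb, &cb);
		ca = fold_simple(ca);
		cb = fold_simple(cb);
		if (ca != cb)
			return ca < cb ? -1 : 1;
		if (ca == 0)
			return 0; /* both strings ended together */
	}
}
```

Scale that switch up to every code point with a simple folding and you can see where the 1500 lines come from.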

Schism currently does not do any Unicode normalization when comparing strings. This is primarily a problem with decomposed strings (which will likely not get converted properly), though filenames like that probably shouldn't exist anyway...
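
To make the normalization problem concrete, here is a small sketch showing two spellings of "mañana" that look identical on screen but are different byte strings. The helper names `same_bytes` and `has_combining_mark` are made up for this example and are not part of Schism; the second one only checks the Combining Diacritical Marks block (U+0300 to U+036F) and assumes well-formed UTF-8.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Both spell the same word, but with different code point sequences. */
static const char precomposed[] = "ma\xC3\xB1" "ana";  /* U+00F1 */
static const char decomposed[]  = "man\xCC\x83" "ana"; /* 'n' followed by U+0303 */

/* Byte-wise equality, i.e. a comparison with no normalization step. */
static bool same_bytes(const char *a, const char *b)
{
	assert(a && b);
	return strcmp(a, b) == 0;
}

/* True if s contains a code point in U+0300..U+036F, which hints that
 * the string may be in decomposed form. */
static bool has_combining_mark(const char *s)
{
	const unsigned char *p = (const unsigned char *)s;

	for (; *p; p++) {
		/* U+0300..U+033F encode as CC 80..CC BF */
		if (p[0] == 0xCC && p[1] >= 0x80 && p[1] <= 0xBF)
			return true;
		/* U+0340..U+036F encode as CD 80..CD AF */
		if (p[0] == 0xCD && p[1] >= 0x80 && p[1] <= 0xAF)
			return true;
	}
	return false;
}
```

Without a normalization pass, `same_bytes(precomposed, decomposed)` is false, and a converter that only knows the precomposed code point has nothing sensible to map the combining tilde to.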

anyway, Unicode is easy, if you can't use it properly it's a skill issue :p
Now playing: Holy Fuck - LP