Re: walk_dir and Chinese filenames
- Posted by jimcbrown (admin) Sep 30, 2009
- 1576 views
What I really wrote for was to ask: what's being done about dir()'s inability to properly read Chinese filenames? Where I work (as IT Dept. Manager) I get a mix of English and Chinese filenames, and it's simply not good enough when dir() stores question marks for Chinese characters. For example, there's a file called "组合 1.pdf" (Portfolio 1.pdf). It gets stored as "?? 1.pdf".
This doesn't help you at all, but I have no problems using UTF-8 encoded filenames that consist only of Chinese characters. Of course, I'm using Linux.
The machine_func() implementation of M_DIR uses Watcom's readdir() call to get the list of directory entries. I haven't looked at the Watcom docs on this, but my guess is that readdir() ends up calling the ANSI version of the Win32 API, which is why the hanzi end up converted into question marks.
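To show the kind of conversion I suspect is happening, here is a small illustration in Python (not Euphoria, and not the actual Watcom code path). It assumes cp1252 as a stand-in for whatever ANSI code page the system uses; any code page without hanzi would behave the same way. Each character that cannot be represented is replaced with a question mark, while a UTF-8 round trip, as on Linux, keeps the name intact.

```python
# Illustration only: a lossy conversion to a legacy "ANSI" code page.
name = "组合 1.pdf"

# Assumption: cp1252 stands in for the system's ANSI code page.
# errors="replace" substitutes "?" for every character that the
# code page cannot represent.
ansi_bytes = name.encode("cp1252", errors="replace")
ansi_name = ansi_bytes.decode("cp1252")

# A UTF-8 round trip loses nothing.
utf8_name = name.encode("utf-8").decode("utf-8")

print(ansi_name)  # ?? 1.pdf
print(utf8_name)  # 组合 1.pdf
```

Once the question marks are in the string, the original characters cannot be recovered, so the fix has to happen at the point where the directory entry is read.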
Now it may have to do with the fact that dir() uses machine_func() rather than the Win32 API call. Maybe I should write my own version of std/filesys.e, replacing all the machine_func()s with API calls. Or maybe keep the old stuff but put in ifdefs to catch a Windows compile.
Probably this is the easiest way to go. As long as you are using the Unicode functions and are careful to convert the Unicode strings into sequences and back correctly, you shouldn't have any problems. (Note that puts() and printf() don't support UTF-16, so you'll need to wrap more W32API functions if you want to actually display the file names on the console or write them to a file.)
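As a rough sketch of the "convert into sequences and back" step, here it is in Python rather than Euphoria, purely to show the logic. The function names are hypothetical, and the buffer layout is an assumption: a NUL-terminated run of 16-bit little-endian code units, which is what the Unicode (wide) Win32 functions work with. The list of integers plays the role of a Euphoria sequence.

```python
import struct

def utf16_buffer_to_sequence(buf):
    """Read 16-bit little-endian code units up to the terminating NUL.

    Hypothetical helper: returns a list of integers, one per code unit.
    """
    units = []
    for (unit,) in struct.iter_unpack("<H", buf):
        if unit == 0:
            break
        units.append(unit)
    return units

def sequence_to_utf16_buffer(units):
    """Pack a list of code units back into a NUL-terminated buffer."""
    return struct.pack("<%dH" % (len(units) + 1), *units, 0)

# Simulate the buffer a wide-character API call would fill in.
name = "组合 1.pdf"
raw = name.encode("utf-16-le") + b"\x00\x00"

seq = utf16_buffer_to_sequence(raw)
back = sequence_to_utf16_buffer(seq)

print(seq)
print(back == raw)
```

Note that characters outside the Basic Multilingual Plane occupy two code units (a surrogate pair) in UTF-16, so the sequence length is not always the character count. That doesn't matter if the sequence is only passed back to other wide-character functions unchanged.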

