Unix filenames are an interesting construct, as they have no inherent encoding. The operating system kernel only maintains a sequence of bytes for the filename, with no encoding implied. This makes it non-trivial (or impossible) to determine what encoding a filename is in -- it could be UTF-8, ASCII, Shift-JIS, or some binary data inserted by a freak touch(1) accident (try touch "$(printf "test\xffname")" within a bash(1) shell for an example).
On the other hand, developers and users expect filenames to be strings, and the string type is a UTF-16 encoded string. This consequently requires that all filesystem byte sequences be converted into some UTF-16 encoded string so that files can be used sensibly.
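To make the problem concrete, here is a small C# sketch using the bytes from the touch example. The class `FilenameBytes` is hypothetical and not part of Mono.Unix. A strict `UTF8Encoding` rejects these bytes, and a lenient decoder silently replaces the 0xFF byte with U+FFFD, so the original filename can no longer be reconstructed from the resulting string.

```csharp
using System.Text;

// Hypothetical helper illustrating why filename bytes can't simply be decoded.
static class FilenameBytes
{
    // The name created by: touch "$(printf "test\xffname")"
    public static readonly byte[] TestName =
    {
        0x74, 0x65, 0x73, 0x74,   // "test"
        0xFF,                     // a byte that never occurs in valid UTF-8
        0x6E, 0x61, 0x6D, 0x65    // "name"
    };

    // Strict decoder: throws DecoderFallbackException on invalid input.
    static readonly UTF8Encoding Strict =
        new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);

    public static bool IsValidUtf8(byte[] bytes)
    {
        try
        {
            Strict.GetString(bytes);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }

    // Lenient decoder: replaces invalid bytes with U+FFFD, losing the original value.
    public static string LossyDecode(byte[] bytes) => Encoding.UTF8.GetString(bytes);
}
```

Re-encoding the lossy string with `Encoding.UTF8.GetBytes` produces EF BF BD where the 0xFF byte used to be, so the resulting name refers to a different (most likely nonexistent) file.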
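When unmarshaling a filename from unmanaged to managed code (such as with Mono.Unix.Native.Syscall.readdir), UnixEncoding will first try to decode the string as a UTF-8 string. Bytes that are not valid UTF-8 are escaped using UnixEncoding.EscapeByte, so that the original byte sequence can be recreated when the string is marshaled back to unmanaged code (such as with Mono.Unix.Native.Syscall.open).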
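A rough C# sketch of that decoding step follows. It is not Mono's implementation: the class name `EscapingUtf8Decoder` is hypothetical, and the escape layout (EscapeByte followed by the undecodable byte stored as a char) is an assumption based on UnixEncoding.EscapeByte being U+0000. It decodes one UTF-8 sequence at a time with a strict decoder and escapes each byte that doesn't start a valid sequence.

```csharp
using System.Text;

// Hypothetical sketch of an escaping UTF-8 decoder; not Mono's actual code.
static class EscapingUtf8Decoder
{
    // Assumed to mirror Mono.Unix.UnixEncoding.EscapeByte (U+0000).
    public const char EscapeByte = '\0';

    static readonly UTF8Encoding Strict =
        new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);

    // Expected length of a UTF-8 sequence given its lead byte,
    // or 0 if the byte can never start a valid sequence.
    static int SequenceLength(byte lead)
    {
        if (lead < 0x80) return 1;
        if (lead >= 0xC2 && lead <= 0xDF) return 2;
        if (lead >= 0xE0 && lead <= 0xEF) return 3;
        if (lead >= 0xF0 && lead <= 0xF4) return 4;
        return 0;
    }

    // Decodes filename bytes. Each byte that is not part of a valid UTF-8
    // sequence becomes EscapeByte followed by the byte value as a char.
    // (A 0x00 byte would decode to '\0', but it cannot occur in a Unix filename.)
    public static string Decode(byte[] bytes)
    {
        var result = new StringBuilder(bytes.Length);
        int i = 0;
        while (i < bytes.Length)
        {
            int len = SequenceLength(bytes[i]);
            string decoded = null;
            if (len > 0 && i + len <= bytes.Length)
            {
                try
                {
                    // The strict decoder rejects bad continuation bytes, overlong
                    // forms, surrogates and code points above U+10FFFF.
                    decoded = Strict.GetString(bytes, i, len);
                }
                catch (DecoderFallbackException)
                {
                    decoded = null;
                }
            }

            if (decoded != null)
            {
                result.Append(decoded);
                i += len;
            }
            else
            {
                // Escape just this byte and resynchronize on the next one.
                result.Append(EscapeByte).Append((char)bytes[i]);
                i++;
            }
        }
        return result.ToString();
    }
}
```

With this scheme the touch example's name decodes to `"test\0\u00FFname"`, while ordinary UTF-8 names decode unchanged. Calling a strict decoder per sequence and catching exceptions is slow; it is used here for brevity, not speed.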
The upshot of all this is that Mono.Unix and Mono.Unix.Native can list, access, and open all files on your filesystem, regardless of encoding.
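Opening such a file again requires the reverse direction. This C# sketch assumes undecodable bytes were escaped as EscapeByte followed by one char holding the raw byte value; it is not Mono's actual encoder, and the class name `EscapingUtf8Encoder` is illustrative only.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical sketch of the managed -> unmanaged direction.
static class EscapingUtf8Encoder
{
    // Assumed to mirror Mono.Unix.UnixEncoding.EscapeByte (U+0000).
    public const char EscapeByte = '\0';

    // Ordinary text becomes UTF-8; EscapeByte followed by a char is turned
    // back into that single raw byte.
    public static byte[] Encode(string name)
    {
        var output = new List<byte>();
        var run = new StringBuilder();   // pending ordinary characters

        for (int i = 0; i < name.Length; i++)
        {
            if (name[i] == EscapeByte && i + 1 < name.Length)
            {
                // Flush pending text first to keep the byte order; encoding whole
                // runs also keeps surrogate pairs together.
                output.AddRange(Encoding.UTF8.GetBytes(run.ToString()));
                run.Clear();
                i++;
                output.Add((byte)name[i]);   // the escaped raw byte
            }
            else
            {
                run.Append(name[i]);
            }
        }
        output.AddRange(Encoding.UTF8.GetBytes(run.ToString()));
        return output.ToArray();
    }
}
```

For example, `Encode("test\0\u00FFname")` yields the original bytes 74 65 73 74 FF 6E 61 6D 65. Because a NUL can never appear in a Unix filename, a U+0000 in the string can only be an escape, which is what keeps the round trip unambiguous.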
The downside is that this support exists only within the Mono.Unix and Mono.Unix.Native namespaces. For example, you won't be able to pass non-Unicode filenames as command-line arguments.
In short, it's a Glorious Hack. Rejoice. Or something.
What this means:
You may not be able to use the System.IO classes to access a file obtained via the Mono.Unix and Mono.Unix.Native classes. This is because System.IO doesn't know about UnixEncoding and the escape mechanism it uses. I don't consider this to be a problem, as the System.IO classes couldn't open these files anyway -- they weren't returned by System.IO.Directory.GetFiles, and they were effectively invisible to normal Mono programs. They still are.
If the filename contains Mono.Unix.UnixEncoding.EscapeByte, then you won't be able to use System.IO with that file. If the filename doesn't contain EscapeByte, it can be used with System.IO.
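The rule above reduces to a single string check. A minimal C# sketch, assuming EscapeByte is U+0000; real code would presumably reference Mono.Unix.UnixEncoding.EscapeByte directly, but it is duplicated here so the example compiles without Mono.Posix.dll. The class name `SystemIOCheck` is hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper applying the EscapeByte rule to decoded filenames.
static class SystemIOCheck
{
    // Assumed to mirror Mono.Unix.UnixEncoding.EscapeByte (U+0000).
    public const char EscapeByte = '\0';

    // True if the name contains no EscapeByte, i.e. it can be used with System.IO.
    public static bool IsUsableWithSystemIO(string name) =>
        name.IndexOf(EscapeByte) < 0;

    // Names that only Mono.Unix / Mono.Unix.Native can work with.
    public static List<string> UnixOnly(IEnumerable<string> names) =>
        names.Where(n => !IsUsableWithSystemIO(n)).ToList();
}
```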
Questions & Answers
|Q||Why UTF-8? Why not use System.Text.Encoding.Default?|
|A||Because UTF-8 is sane and should always be used. :-)|
Ha ha only serious. Plus, a directory can contain files in more than one encoding, so expecting the developer to provide the right encoding for each file would require the developer to be clairvoyant.
Plus, using UTF-8 allows any Unicode character to be used in a filename (which could be considered a bad thing, depending).
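To see why a single Encoding.Default-style guess can't work, consider two files whose names both display as "café", one written by a Latin-1 program and one by a UTF-8 program. This C# sketch (the class name `MixedEncodings` is hypothetical; ISO-8859-1, code page 28591, stands in for any legacy 8-bit encoding) decodes each byte sequence with each encoding.

```csharp
using System.Text;

// Hypothetical example: one visible name, two different on-disk byte sequences.
static class MixedEncodings
{
    // ISO-8859-1 (code page 28591) stands in for any legacy 8-bit encoding.
    static readonly Encoding Latin1 = Encoding.GetEncoding(28591);

    // "café" as written by a Latin-1 program: 63 61 66 E9
    public static readonly byte[] Latin1Name = Latin1.GetBytes("caf\u00E9");

    // "café" as written by a UTF-8 program: 63 61 66 C3 A9
    public static readonly byte[] Utf8Name = Encoding.UTF8.GetBytes("caf\u00E9");

    public static string AsLatin1(byte[] bytes) => Latin1.GetString(bytes);

    // Lenient UTF-8: invalid or truncated sequences become U+FFFD.
    public static string AsUtf8(byte[] bytes) => Encoding.UTF8.GetString(bytes);
}
```

Whichever encoding you pick, one of the two names comes out wrong: Latin-1 turns the UTF-8 name into "cafÃ©", and UTF-8 turns the Latin-1 name into "caf" followed by U+FFFD.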
|Q||What is UnixEncoding.EscapeByte?|
|A||U+0000. Since this is the terminating null, it by definition cannot appear within a Unix filename, so it's a sane choice.|
|Q||Why not use byte[] instead of System.String for filenames in Mono.Unix.Native.Syscall.open, Mono.Unix.Native.Syscall.stat, etc.?|
|A||Because byte[] is fugly to work with, it would need to be offered in addition to the string versions, which would double all the file-related APIs.|
Do you really want to explain the difference between these APIs?
    public static int open (string pathname, OpenFlags flags);
    public static int open (byte[] pathname, OpenFlags flags);
(Hint: if you do want to explain the difference between these, you're masochistic.)
Furthermore, what should Mono.Unix.Native.Dirent.d_name be (or Mono.Unix.Native.Fstab.fs_file, or any other string-typed structure member)? If it's a byte[], developers will still need a way to convert it to a string for debugging and display to the user, but the developer can't know what encoding to use (it could be anything), so this becomes an impossible problem. UnixEncoding may be a Glorious Hack, but at least it leaves the API usage unambiguous.
|Q||.NET doesn't have these limitations! Why does Mono?|
|A||Because Windows stores all filenames on disk as Unicode (and has since Windows NT 3.1 and/or the introduction of Long Filenames in Windows 95), so it doesn't need to worry (as much) about the arbitrary filename encoding problem. Short filenames might be in a local encoding, but CIFS uses Unicode, so you can't be accessing non-Unicode filenames over a network share.|
|Q||Why doesn't Mono do this (or something like it) so that System.IO can read and process all files?|
Plus, I thought it would be easy for Mono to do this, but after implementing this type I'm not sure the other maintainers would wish to deal with the issues of arbitrary filename encodings.
Plus, most current Linux distros default to using UTF-8 already, so (hopefully) this won't be an issue for too much longer (10 years?).