Mono.Unix.UnixEncoding Class
A Unix filename System.Text.Encoding.

See Also: UnixEncoding Members


public class UnixEncoding : System.Text.Encoding


Unix filenames are an interesting construct, as there is no encoding. The operating system kernel only maintains a sequence of bytes for the filename, with no encoding implied. This makes it non-trivial (or impossible) to determine what encoding a filename is in -- it could be UTF-8, ASCII, Shift-JIS, or some binary data inserted by a freak touch(1) accident (try touch "$(printf "test\xffname")" within a bash(1) shell for an example).

On the other hand, developers and users expect filenames to be strings, and the string type is a UTF-16 encoded string. This consequently requires that all filesystem byte sequences be converted into some UTF-16 encoded string so that files can be used sensibly.

All filename strings provided to/from the Mono.Unix and Mono.Unix.Native types are passed through UnixEncoding. UnixEncoding does the following:

The upshot of all this is that Mono.Unix and Mono.Unix.Native can list, access, and open all files on your filesystem, regardless of encoding.
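To illustrate how an escape-based round trip between arbitrary bytes and strings could work, here is a minimal sketch. It does not use Mono.Posix, and it is not UnixEncoding's actual implementation: EscapeSketch, Decode, and Encode are hypothetical names, and the rule that each byte which is not valid UTF-8 becomes U+0000 followed by a char holding that byte's value is an assumption made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical sketch, NOT the Mono.Unix.UnixEncoding source.
public static class EscapeSketch
{
    // U+0000, the value this page gives for UnixEncoding.EscapeByte.
    const char Escape = '\u0000';

    // Strict UTF-8: throws on invalid input instead of substituting U+FFFD.
    static readonly UTF8Encoding Strict = new UTF8Encoding(false, true);

    // Length of the valid UTF-8 sequence starting at index i, or 0 if none.
    static int ValidSequenceLength(byte[] bytes, int i)
    {
        for (int len = 1; len <= 4 && i + len <= bytes.Length; len++)
        {
            try { Strict.GetCharCount(bytes, i, len); return len; }
            catch (DecoderFallbackException) { /* try a longer sequence */ }
        }
        return 0;
    }

    // byte[] -> string: valid UTF-8 is decoded as usual; every other byte
    // is emitted as Escape followed by a char holding the byte value
    // (assumed rule).
    public static string Decode(byte[] bytes)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < bytes.Length; )
        {
            int len = ValidSequenceLength(bytes, i);
            if (len > 0) { sb.Append(Strict.GetString(bytes, i, len)); i += len; }
            else { sb.Append(Escape).Append((char)bytes[i]); i++; }
        }
        return sb.ToString();
    }

    // string -> byte[]: an Escape char means "the next char is a raw byte";
    // everything else is encoded as UTF-8. Unpaired surrogates make the
    // strict encoder throw.
    public static byte[] Encode(string s)
    {
        var output = new List<byte>();
        for (int i = 0; i < s.Length; )
        {
            if (s[i] == Escape && i + 1 < s.Length)
            {
                output.Add((byte)s[i + 1]);
                i += 2;
            }
            else
            {
                int n = char.IsHighSurrogate(s[i]) && i + 1 < s.Length ? 2 : 1;
                output.AddRange(Strict.GetBytes(s.Substring(i, n)));
                i += n;
            }
        }
        return output.ToArray();
    }

    public static void Main()
    {
        // The bytes of "test\xffname" from the touch(1) example.
        byte[] raw = { 0x74, 0x65, 0x73, 0x74, 0xFF, 0x6E, 0x61, 0x6D, 0x65 };
        string name = Decode(raw);
        byte[] back = Encode(name);
        Console.WriteLine(name.Length);                 // 10
        Console.WriteLine(BitConverter.ToString(back)); // 74-65-73-74-FF-6E-61-6D-65
    }
}
```

Because the escape character is U+0000, which cannot occur in a Unix filename, a string produced this way maps back to exactly the original bytes.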

The downside is that all such support is only within the Mono.Unix and Mono.Unix.Native namespaces. You won't be able to pass non-Unicode filenames as command-line arguments.

In short, it's a Glorious Hack. Rejoice. Or something.

What this means:

Questions & Answers

Q Why UTF-8? Why not use System.Text.Encoding.Default?
A Because UTF-8 is sane and should always be used. :-)

Ha ha only serious. Plus, a directory can contain filenames in more than one encoding, and expecting the developer to provide the right encoding for each file would require the developer to be clairvoyant.

Plus, using UTF-8 allows any Unicode character to be used in a filename (which could be considered a bad thing, depending on your point of view).

Q What is UnixEncoding.EscapeByte?
A U+0000. Since this is the terminating null, it by definition cannot appear within a Unix filename, so it's a sane choice.
Q Why not use byte[] instead of System.String for filenames in Mono.Unix.Native.Syscall.stat, etc.?

A Because byte[] is fugly to work with, so it would need to be offered in addition to the string versions, which would double all the file-related APIs. Do you really want to explain the difference between these APIs?

C# Example

public static int open (string pathname, OpenFlags flags);
public static int open (byte[] pathname, OpenFlags flags);

(Hint: if you do want to explain the difference between these you're masochistic.)

Furthermore, what should Mono.Unix.Native.Dirent.d_name be (or Mono.Unix.Native.Fstab.fs_file, or any other string-typed structure member)? If it's a byte[], developers will still need a way to convert it to a string for debugging and display to the user, but the developer can't know what encoding to use (it could be anything), so this becomes an impossible problem. UnixEncoding may be a Glorious Hack, but at least it leaves the API usage unambiguous.

Q .NET doesn't have these limitations! Why does Mono?
A Because Windows stores all filenames on disk as Unicode (and has since Windows NT 3.1 and/or the introduction of Long Filenames in Windows 95), so it doesn't need to worry (as much) about the arbitrary filename encoding problem. Short filenames might be in a local encoding, but CIFS uses Unicode, so you can't be accessing non-Unicode filenames over a network share.
Q Why doesn't Mono do this (or something like it) so that System.IO can read and process all files?

A Priorities. :-)

Plus, I thought it would be easy for Mono to do this, but after implementing this type I'm not sure the other maintainers would wish to deal with the issues of arbitrary filename encodings.

Plus, most current Linux distros default to using UTF-8 already, so (hopefully) this won't be an issue for too much longer (10 years?).

Thread Safety

This type is safe for multithreaded operations.


Namespace: Mono.Unix
Assembly: Mono.Posix (in Mono.Posix.dll)
Assembly Versions: 1.0.5000.0