That's done as part of xattr, or extended attributes. It's a very flexible system. For example you can add comments to a file so they are indexed by Spotlight.
Except NTFS does not have "extended attributes" in Linux/Irix/HPFS sense.
Every FILE object in the database is ultimately (outside of some low level metadata) a map of Type-(optional Name)-Length-Value entries, of which file contents and what people think of as "extended attributes" are just random DATA type entries (empty DATA name marks the default to own when you do file I/O).
It's similar to ZFS (in default config) and Solaris UFS where a file is also a directory
> Except NTFS does not have "extended attributes" in Linux/Irix/HPFS sense.
Except actually NTFS does have "extended attributes" in the HPFS sense, which were added to support the OS/2 subsystem in Windows NT. And went on to be used by other stuff as well, including the POSIX subsystem (and its successors Interix/SFU/SUA) and more recently WSL (at least WSL1, not sure about WSL2), for storage of POSIX file metadata.
In NTFS, the streams of a regular file are actually attributes of `$DATA` type; the primary stream is an unnamed `$DATA` type attribute, and any alternate data stream (ADS) is a named `$DATA` type attribute. By contrast, extended attributes are not stored in `$DATA` type attributes, they are stored in the file's `$EA` and `$EA_INFORMATION` attributes. I believe `$EA` contains the actual extended attribute data, whereas `$EA_INFORMATION` is an index to speed up access.
Alternate data streams are accessed using ordinary file APIs, suffixing the file name with `:` then the stream name. Actually, in its fullest form, an NTFS file or directory name includes the attribute type, so the primary stream of a file `foo.txt` is called `foo.txt::$DATA` and an ADS named bar's full name is `foo.txt:bar:$DATA`. For a directory, the default stream is called `$I30` and its type is `$INDEX_ALLOCATION`, so the full name of `C:\Users` is actually `C:\Users:$I30:$INDEX_ALLOCATION`. You will note in `CMD.EXE`, `dir C:\Users:$I30:$INDEX_ALLOCATION` actually works, and returns identical results to `C:\Users`, while other suffixes (e.g. `:$I31` or `:$I30:$DATA`) give you an error instead. Windows will let you created named `:$DATA` streams on a directory, but not a named one.
By contrast, extended attributes are accessed using dedicated Windows NT APIs, namely `NtQueryEaFile` and `NtSetEaFile`.
I'm not sure why Windows POSIX went with EAs instead of ADS; I speculate it is because if you only have a small quantity of data to store, but want to store it on a huge number of files and directories, EAs end up being faster and using less storage than ADS do.
EaData & EaFile remind me of the murky memories of OS/2 APIs.
HPFS had a different approach of internally handling EAs, but OS/2 did create extra file on FAT16 filesystems to store EAs, which could point to origin of $EA. (HPFS itself has special EA-handling implemented in its FNODE, equivalent of inode/FILE entry)
I do not recall the EA actually being used anywhere by new code though, quite shocked by the mention of WSL. Old POSIX subsystem originated before ADSes I think, and might have decided to avoid creating more data types.
My quip about difference of Linux/Irix xattr is related to architectural design involved in the APIs - Irix style xattr API (copied by Linux) is rather explicitly designed for short attributes - do not know if it's still current but I recall something about API itself limiting it to single page per attribute? Come to think of it, that would match certain aspects of Direct IO that AFAIK were also imported from Irix...
Oh, and BTW - NTFS internal structures being accessible as "normal" files is one of the design decisions inherited from Files-11 on VMS, one I quite like from architecture cleanliness pov at the very least.
uid, gid, mode, and POSIX format timestamps are stored in an EA. It also mentions file capabilities being stored in an ADS. On Linux, capabilities and ACLs are stored in xattrs, so that seems to imply that xattrs are stored in ADS not EA.
> Old POSIX subsystem originated before ADSes I think, and might have decided to avoid creating more data types.
I'm not sure about that, I think support for ADS has been in NTFS from its very beginnings, it was designed to support it from the very start.
Actually, from what I understand, the original design for NTFS – which was never actually implemented, at least not in any version that ever shipped to customers – was to let users define their own attribute types. The reason why their names all start with $, is that was supposed to reserve the attribute type as "system", user attribute types were supposed to start with other characters (likely alphabetic). And that's the reason why they are defined in a file on the filesystem, $AttrDef, and why the records in that file contain some (very basic) metadata on validating them (minimum/maximum sizes, etc). If they were never planning to support user-defined attribute types, they wouldn't have needed $AttrDef, they could have just hardcoded it all in the code.
I used to dual boot OS X and Windows on my Mac in the late 2000s. I am pretty certain when I open the HFS+ volume and copy things to the NTFS volume, some stuff became alternate data streams. Windows even had a UI to tell me about it. I didn't understand it then but my guess would be that's the resource fork.
OS/2's HPFS also had alternate data streams, called Extended Attributes. You'd make two calls to DosQueryFileInfo() - the first time to get the size of any EAs so you could allocate a buffer, then call it again to read the contents into the buffer.
It got used occasionally - not a lot. I had a newsgroup reader that would store the date of the last time you downloaded items for a group in an EA (of the file that had the items).
Rarely used because it's invisible and quite awkward to use as a user, basically unusable to most, with no GUI. Also because it will just silently be demolished if you copy to/from an FAT filesystem like a typical flash drive, so it's completely unreliable.
I work on ReFS and a little bit on NTFS. Alternate data streams are simply seekable bags of bytes, just like the traditional main data file stream. Security descriptors, extended attributes, reparse points and other file metadata are represented as a more general concept called an "attribute".
You can't actually open a security descriptor attribute and modify select bytes of it to create an invalid security descriptor, as you would if it were a general purpose stream.
Help me understand the terminology. I thought alternative data streams were just non-resident attributes. Attributes like "$SECURITY_DESCRIPTOR" have reserved names but, conceptually, I thought were stored in the same manner as an alternative data stream. (Admittedly, I've never seen the real NTFS source code-- I've only perused open source tools and re-implementations.)
Essentially, attribute names directly specify the attribute type - so $SECURITY_DESCRIPTOR declared the entry in FILE attribute list to be a security descriptor. DATA attributes have another name field to handle multiple instances
> Essentially, attribute names directly specify the attribute type - so $SECURITY_DESCRIPTOR declared the entry in FILE attribute list to be a security descriptor. DATA attributes have another name field to handle multiple instances
If you at the Linux kernel source code, `fs/ntfs3/ntfs.h` contains the following:
struct ATTRIB {
enum ATTR_TYPE type; // 0x00: The type of this attribute.
__le32 size; // 0x04: The size of this attribute.
u8 non_res; // 0x08: Is this attribute non-resident?
u8 name_len; // 0x09: This attribute name length.
__le16 name_off; // 0x0A: Offset to the attribute name.
__le16 flags; // 0x0C: See ATTR_FLAG_XXX.
__le16 id; // 0x0E: Unique id (per record).
union {
struct ATTR_RESIDENT res; // 0x10
struct ATTR_NONRESIDENT nres; // 0x10
};
};
So the name field isn't specific to `$DATA` attributes, every attribute has it. However, for most attributes either the name is zero bytes, or it is a hardcoded name (like `$I30` for directories). Is `$DATA` the only one that can have different instances of the attribute with arbitrary names?
Arguably from the point of On Disk Structure (to reuse terminology from NTFS' ancestor in VMS), all attributes can have names as well.
Now, implementation in ntfs.sys is another thing and I have no idea if it's just an unused code path or if something would explode, and from what I heard Microsoft ended up in situation where people are scared to touch it not because of code quality but because of being scared of breaking something.
the 'trusted flag' (my term) == the thing that you touch when you Unblock-File (pwsh) or uncheck in the file properties UI => lives in an alternate data stream.