Hacker News new | ask | show | jobs
by dataflow 706 days ago
> i've had a bootloader for a small 64-bit kernel based on this that fit comfortably into the bootsector, including loading the kernel from disk and setting up vesa modes, no stage2 required.

How in the world do you fit all that in 512 bytes? I'm guessing you don't have a real-world filesystem (that allows the kernel to be anywhere on the disk just as a normal file)? Because just dealing with file fragmentation should bump you way over 512 bytes I would imagine.

4 comments

> Because just dealing with file fragmentation should bump you way over 512 bytes I would imagine.

Historically, many filesystems have had a special file type or attribute for contiguous files - a file guaranteed to not be fragmented on disk. Most commonly used for boot loaders, OS kernels, or other essential system files - although historically some people used them for database files for a performance boost (which is likely rather marginal with contemporary systems, but decades ago could be much more significant).

Some systems required certain system files to be contiguous without having any special file metadata to mark them as such - for example, MS-DOS required IO.SYS and MSDOS.SYS to be contiguous on disk, but didn’t have any file attribute to mark them as contiguous. Unlike an operating system with proper support for contiguous files, DOS won’t do anything to stop you fragmenting IO.SYS or MSDOS.SYS, it is just the system will fail to start if you do. (Some might interpret the System attribute as implying Contiguous, but officially speaking it doesn’t.)

I've been thinking about how to structure a filesystem such that code complexity - in the boot loader and elsewhere - can be minimized. You'd want something similar to NTFS, except that it should be possible to refer to a file (MFT entry) directly by starting sector number. So they would either need to always remain at a fixed position, or have a link pointing back to any reference so it can be updated.

Somewhere in the first sector (aligned to a 64 bit boundary), there would be a small structure that just contains a unique signature (not ASCII but some random value; 64 bits seems like more than enough as opposed to a GUID), and a pointer to the "FS info block". Leaving all remaining space in the sector available for boot code or possibly another "overlayed" filesystem.

That info block in turn would point to the MFT entry for the stage 2 boot code. An MFT entry would contain at a minimum an easy to locate list of (start sector,count) extents. Maybe there could be a flag that tells the OS that a certain file should never be fragmented?

File names would be entirely optional and for human consumption; for direct links within a filesystem, sector numbers would be used, and some kind of unique numeric identifier for anything "higher-level".

I'm genuinely wondering if some expert here sees any problem with this scheme, other than it not conforming to how mainstream OSes do things?

> File names would be entirely optional and for human consumption; for direct links within a filesystem, sector numbers would be used, and some kind of unique numeric identifier for anything "higher-level".

Each file has index number ("Inode"?) in the MFT. The first 24 (0-23) are reserved, and 11 of them contain metadata about the volume . https://flatcap.github.io/linux-ntfs/ntfs/files/ - somewhere in the Windows API is something that allows opening a file by "Inode" number. This link and info may be really old so it could be more of the reserved inodes are used now.

So, if 23 isn't being used yet, you could use that to put your 2BL - create a file there and name it $2BL or something. Would be funny to see what future Windows update that does use it does to it, if that ever happens (and of course maybe it is used).

> Maybe there could be a flag that tells the OS that a certain file should never be fragmented?

Haven't looked but I recall from an old book I read that small files are stored right in the MFT and I think existing data + the cluster size is the limit there.

the first somewhere in the first sector - this makes me think of the superblock concept. u scan the start of the disk for it (ext2 or 4? or both?). Not the first sector because that's for MBR purposes, but after that...

not fragmenting files.... well. U don't fragment files until you do.. u know. If u write an FS it will only fragment what u tell it to. So if u want all contigious files then u can simply do it that way. store them as (file_id,sector_count,start). or even more simple (fname_len,fname,sector_count,start_sector) so u dont need to keep nameblock around for filenames (fat?)

If u look at Ext2/4, FAT16/32 and others u will see that it all started quite basically, keepin a list of files and offsets. But due to things like disk reliability, user errors, system stability issues etc., you need a lot of extra stuff.

Also, questions like : how long is a maximum for a filename length? This kind of stuff can really impact what kind of features u need to implement.

How long is a file allowed to be? How many files are maximum for the FS?

This might sound silly, but there's already datacenters out there (a lot actually) who cannot use most filesystems because either files are too huge, partitions are too huge, too many files are present for index structures etc.

If you want dead simple for a starting OS: (sector_count,start_sector,fname_len,fname,0)

If you want more, try looking at ext2 or FAT32 or if your system specifications require it, look even beyond. (ext4, ZFS, NTFS, etc.) - A lot of these are subtely different, trying to solve different problems, or similar problems in different ways.

yes, the kernel was in a known location on disk (directly after the bootsector).

the whole boot disk image was generated during build, which is common for small systems.

you can just stick the kernel at sector 2 and read it from there using extended bios disk read. specify your DAP to read the kernels amount of sectors starting from sector 2, load it to something like 0x7e00 or some reachable place. It will have still limits on how much it can read, per read and in total.

If you do this all within 1 sector, equally you do not do any error checking. just ram stuff into memory and yolojump into it.

the basic would be: load kernel from disk using bios interrupt get memory map using bios interrupt parse kernel ELF header / program headers and relocate it - elf header to find ph_off and ph_num and entry_point - program headers to find all pt_load and rep movb them into tgt phys addr.

Also with 510 bytes generally u will not make nice 'page tables' though this is actually possible with only few bytes. - i did not manage it yet in 510 :D but i am sure there's someone who can do it... it can be done really efficiently.

the disk would be formed by assemblding the mbr. then doing something like

cat mbr.bin kernel.bin > disk.bin (maybe here use truncate to padd the disk.bin to a certain size like 10+MB or so - helps some controllers recognize it better)

all that said it's not useful to do this. you will find it an interesting excersize at best. like trying to make 'tiny ELF' file or so. fun to learn, useless in practice.