Write-once filesystem ~~~~~~~~~~~~~~~~~~~~~~~~~~~ This filesystem is designed to be very small in various ways: - small code size - small RAM usage - small storage overhead In order to achive the specified goal, some filesystem features are missing and some of them will be slow. Aspects that influenced the design: - SPI NOR flash chips (partial page programming, where 1s can be turned to 0s) - MCUs with on-chip writable flash storage Chapter 1: Terms ------------------ Block size: The allocation block size, that can be individually erased. Its size must be power-of-2, and cannot be smaller than the inode size. Maximal block size is 65536 (or 2^16) if WRONCEFS_16BIT_MODE=0 and 32768 (or 2^15) if WRONCEFS_16BIT_MODE=1. Minimal block size is 64 bytes (2^6). Erase: Set all bytes of the specified area to 0xFF. On SPI flash this is a separate operation, other implementations can simply emulate it. Inode: the description structure at the beginning of each block. Holds info about the content of the block. Inodes (and therefore blocks) are indexed starting 1, there is no inode 0 (nor block 0). The filesystem can support up to 65534 inodes. Chapter 2: Configuration -------------------------- wroncefs_op_read Macro to a function that can read from the underlying storage at any offset and any size. Here, "any size" means 1 inode or 1 complete directory entry, and any buffer the user passed. Must return zero on success; non-zero otherwise. Signature: int8_t op_read(void* buf, wofs_addr_t addr, wofs_size_t size); Required only if WRONCEFS_DIRECT_READ=0. wroncefs_op_write Macro to a function that can write to the underlying storage at any offset and any size. Here, "any size" means 1 inode or 1 complete directory entry, and any buffer the user passed. Must return zero on success; non-zero otherwise. If this function fails once, then all subsequent write/erase calls must fail as well. The filesystem relies on this requirement, and can maintain consistency only if this is guaranteed. Signature: int8_t op_write(const void* buf, wofs_addr_t addr, wofs_size_t size); wroncefs_op_erase Macro to a function that can erase the underlying storage at any offset (if offset is multiple of block size) and any size (if size is multiple of block size). Must return zero on success; non-zero otherwise. If this function fails once, then all subsequent write/erase calls must fail as well. The filesystem relies on this requirement, and can maintain consistency only if this is guaranteed. Signature: int8_t op_erase(wofs_addr_t addr, wofs_size_t size); WRONCEFS_16BIT_MODE If set, use 16-bit integers; otherwise use 32-bit integers. Using 16-bit integers, file size is limited to 32767 bytes, while 32-bit file offset allows file sizes up to 2 GiB minus 1 byte. Furthermore, sets wofs_addr_t and wofs_size_t to 16-bit integer if set. WRONCEFS_BASE_ADDRESS The base memory address for read access if WRONCEFS_DIRECT_READ=1. If set, must be an expression that evaluates to a pointer (should be a constant number). Must be aligned to 4 bytes, but it is recommended to be aligned to block size. WRONCEFS_DIRECT_READ Set to 1 if the underlying storage can be directly read. This is commonly the case for MCU on-chip flash. Set to 0 otherwise. WRONCEFS_CAN_ALTER Set to 1 if the underlying storage can be changed at any time, without the "1s to 0s" restriction. SPI flash has the "1s to 0s" restriction, for those set this option to 0. WRONCEFS_NAME_MAX The maximal length of a directory entry name the implementation can handle. This value limits the mountable filesystems. Must be at least 8, and cannot exceed 254. *Implementation-level configuration* WOFS_EXTERN This macro is placed before all function forward declarations. Default is 'extern'. WOFS_IMPL This macro is placed before all function definitions. Default is empty. WRONCEFS_SHORT_ERROR_STRINGS Default setting is zero, in this case wofs_strerror() will return human readable messages. If set to nonzero, very short messages will be returned. WRONCEFS_USER_STATE Setting this macro will enable "user state" functionality. This means, the user must pass the state structure to all WOFS functions, providing some sort of reusable code. Chapter 3: Layout ------------------- The storage is divided into N blocks, each block holds an inode at the beginning, and for file nodes, a length indicator at the end of the block. An inode can have a successor, which provides extendable content (for files and for directories as well). Inode allocation can be performed in any order, but it must be empty (which means, the signature field is 0xFFFF). The structure of the inode is described in a dedicated chapter. For Type=file inodes, the structure of the length indicator follows: 1) If WRONCEFS_CAN_ALTER=1: the last two 16-bit number hold the actual number of written bytes into that block, except if that number is 0xFFFF, in this case the actual number of written bytes is zero. The length indicator is valid if both 16-bit number holds the same value. If they differ, if will be restored to the smallest possible value that covers all non-0xFF bytes in the file data area. The length indicator must be filled separately: write the length value in the indicator slot #2 (see below), and if that succeeds, write the same length value to slot #1. Newly allocated, empty block: +------------------------+ | Inode structure | +------------------------+ | Data start offset | +------------------------+ | Prev inode | +------------------------+ | Empty block space | +------------------------+ | 0xFFFF marker #1 | +------------------------+ | 0xFFFF marker #2 | +------------------------+ Partially filled block: +------------------------+ | Inode structure | +------------------------+ | Data start offset | +------------------------+ | Prev inode | +------------------------+ | File data | +------------------------+ | Empty block space | +------------------------+ | Length indicator #1 | +------------------------+ | Length indicator #2 | +------------------------+ Full block: +------------------------+ | Inode structure | +------------------------+ | Data start offset | +------------------------+ | Prev inode | +------------------------+ | File data | +------------------------+ | Length indicator #1 | +------------------------+ | Length indicator #2 | +------------------------+ 2) If WRONCEFS_CAN_ALTER=0: a) If the last two 16-bit number in the block is 0xFFFF, then zero bytes are written. b) If the last two 16-bit number in the block is not 0xFFFF, then find backwards the last non-0xFFFF number, and that number will indicate the number of written bytes -- note, a full block need to contain an 0xFFFF marker in order to find the valid length indicator. The length indicator is valid if the auxiliary byte is 0x18 (other values invalidates the length indicator). The compound length indicator must use at least 4 bytes, making preserved space for the file chaining fixup. The length indicator should be written in a strict order: first, write the 16-bit length value, and if that succeeds, write the 0x18 marker. Newly allocated, empty block: +------------------------+ | Inode structure | +------------------------+ | Data start offset | +------------------------+ | Prev inode | +------------------------+ | Empty block space | +------------------------+ | 0xFFFF marker | +------------------------+ Partially filled block: +------------------------+ | Inode structure | +------------------------+ | Data start offset | +------------------------+ | Prev inode | +------------------------+ | File data | +------------------------+ | Empty block space | +------------------------+ | 0xFFFF marker | +------------------------+ - | 0x18 marker | \ +------------------------+ } Length indicator | Data length | / +------------------------+ - | Old length indicators | +------------------------+ Full block: +------------------------+ | Inode structure | +------------------------+ | Data start offset | +------------------------+ | Prev inode | +------------------------+ | File data | +------------------------+ | 0xFFFF marker | +------------------------+ - | 0x18 marker | \ +------------------------+ } Length indicator | Data length | / +------------------------+ - | Old length indicators | +------------------------+ Data start offset: 16-bit or 32-bit (depends on WRONCEFS_16BIT_MODE) integer specifies the offset in the file of the first data byte in that block. Therefore, the offset in the next block is equal to offset of the current block plus the number of bytes the current block contains. Prev inode: the previous block in the chain Directory stucture is described in a dedicated chapter. Chapter 4: Inode structure ---------------------------- 1 1 0 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Filesystem signature | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | First inode | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Next inode | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Next valid | Reserved | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Type | Alive | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Filesystem signature: the "W\xf5" sequence for little-endian versions, and the "\xf5W" for big-endian versions First inode: the first inode of the inode chain Next inode: the next inode in the inode chain, used to extend content, 0xFFFF terminates the inode chain Next valid: the data in "Next inode" is valid (0xFF means "Next inode" is not used, and "Next inode" must be 0xFFFF; 0x00 means "Next inode" is in use and valid; other values considered garbage, and the inode should be purged) Reserved: reserved for future use, must be 0xFF Type: inode type, one of: file (0x1E), dir (0xD1), temp-dir (0x7D), temp-file (0x7E) Alive: indicates whether the inode is alive (0xFF) or dead (not 0xFF) -- dead inodes can be purged during next mount, but see "Mount" chapter Type=temp-dir means the inode stores content of a directory inode. This inode should exist only during a directory entry removal, and will be removed upon completion. If Mount operation finds this inode, the interrupted operation must be completed. See also: "Directory structure" chapter. An inode is "almost deleted" if Alive=0xFF AND "Next inode" is a) a valid inode reference, and that inode is deleted; OR b) 0x0000. This inode must be purged during "Mount". Chapter 5: Filesystem info ---------------------------- The essential filesystem information is stored in the root directory as a special directory entry. The "Inode" field in the directory entry must be 0, "Name length" must be 8 (size of the structure), and "Alive" must be 0x55. The structure of this info follows: 1 1 0 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Block size sh | Max name len | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Total number of blocks | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Flags | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Reserved | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Block size sh: log2 of block size; the actual block size is 1<<"Block size sh" Max name len: maximal length of a directory entry name (without the terminating zero) Total number of blocks: total number of blocks (or inodes) the filesystem has Flags: 0x1: "can alter", stores the value of WRONCEFS_CAN_ALTER; since this capability is determined at compilation time, and the implementation can write the version it was compiled for, the filesystem will be read-only if this bit differs 0x2: "16-bit mode", store the value of WRONCEFS_16BIT_MODE; similarly to the "can alter" bit, if this bit differs, then the filesystem can be mounted only in read-only mode other flags are reserved, and must be zero Reserved: reserved for future use, must be 0xFF Chapter 6: Directory structure -------------------------------- The root directory is inode 1. Every directory has two mandatory and unlinkable entries: "." and "..", which are the current and the parent directory. The inode of the current directory is the same as the directory's inode; parent of the inode of the directory where the current directory's entry is located. The parent of the root directory is the root directory. Directory content is stored in the block where the inode is located. The usable space for directory content is the rest of the block, there is no length indicator like file content has. Entries are written into it sequentially. Empty space starts with Inode=0xFFFF, and last until the end of the current block. Each directory entry must fit in the same block where it started (it cannot continue in the subsequent block defined by inode chaining). Format of the directory entry follows: 1 1 0 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Inode | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Name Length | Alive | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Name | . . | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | padding for 2 bytes alignment | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Inode: the inode the directory entry is referring to Name Length: length of the entry name without the terminating zero Alive: the entry is alive if this holds 0xFF, for filesystem info the value is 0x55, and dead (or unlinked) entries are 0x00 -- other values considered as broken, and the filesystem shall be formatted again Name: the name of the entry, without the terminating zero; maximal name length is limited by WRONCEFS_NAME_MAX and "Max name len" value from the filesystem info Note that, every directory entry must be aligned to 2 bytes. When allocating a new directory entry, first, calculate the required space, then determine whether it fits in the current block. If the required space is larger than the remaining space in the directory data, then go to the next block of the directory (if any). If the new entry does not fit in any directory blocks, then allocate a new one. The new directory entry must be written in the following order: 'Name', 'Inode' and 'Length'. Filling 'Length' last will provide a good indicator to the cleanup mechanisms they found a broken entry, since size of this field is 1 byte (and we can assume that writing 1 byte is atomic). To remove a directory entry, first mark the entry dead: set Alive to 0x00. To get rid of the dead entry, it is needed to allocate a temporary inode. Fill the inode structure except the Type field. Copy all non-dead entries into the new inode. Since a directory entry will be removed, there will be space for the structure below. This structure will guide the Mount operation in case of entry removal interrupted (i.e. by a power supply failure). This structure is needed to be filled after all non-dead entries are copied: set "Original inode" accordingly, and set "Content size" to the appropriate size. Next is to finally set the inode type to "temp-dir" of the new inode. Then erase the original inode, copy the alive entries then set the inode structure, finally clear the temporary inode. Remark #1: The first step is to mark the entry dead. This keeps integrity of the filesystem later if the next operation is not performed (i.e. reset), because the entry might be not removed but it is still marked as garbage, and the next directory entry removal will deal with this. Remark #2: If the forward chaining is broken, clear the forward chain info. This could generate a lost block, but the subsequent mount could reclaim it. The structure that is located at the end of the temporary directory inode: 1 1 0 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Original inode | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Content size | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Original inode: the inode the content should be moved into Content size: the size of the content to be copied back A directory is considered as empty if it does not contain dead entries, and the number of non-dead entries is 2 (these are "." and ".."). Note that, the root directory has the filesystem info in it, which increases this counter to 3. Therefore, the root directory cannot be removed. Removing an empty directory: 1. Mark the directory entry as dead. Rather generate lost inodes than invalid data, which prevents mount to succeed. 2. If directory has no successor inode ("Next inode" is 0xFFFF), then set "Next inode" to 0x0000. Otherwise: mark all inodes belong to the directory as dead except the first one -- making the directory "almost deleted". 3. Remove the directory entry normally. 4. Mark directory inode as dead. 5. Clear all inodes belong to the directory in any order. How to deal with damaged chaining: The general directory entry removal mechanism should deal with this. Although, no directory entries will be removed, but the whole block will be rewritten, fixing the damaged chaining. Therefore, executing the cleanup phase of the directory entry removal is sufficient. Chapter 7: File operations ---------------------------- Creating a file: new files can be created in any directory. A new file must have a unique name in the selected directory. The operation will return a file descriptor (the file descriptor structure must be provided by the user). Opening a file: files can be opened by their name. The operation will return a file descriptor (the file descriptor structure must be provided by the user). WARNING: The implementation might not synchronize opened file descriptors! When opening a file twice, and if data is appended via one of the file descriptors, then the other file descriptor may not see the appended data. Appending to a file: append will work until there is enough space on the filesystem. No additional restrictions apply. Overwriting file content: file data can be preallocated with fallocate, and the content will be full 0xFF (for newly allocated bytes). Since this option exists, and data can be written sequentially or randomly as well, the implementation will not track which byte is written and which is not. Therefore, overwriting already written data will work only if WRONCEFS_CAN_ALTER=1, otherwise the content is unpredictable. Unlinking a file: 1. Mark the directory entry as dead. Rather generate lost inodes than invalid data, which prevents mount to succeed. 2. If file has no successor inode ("Next inode" is 0xFFFF), then set "Next inode" to 0x0000. Otherwise: mark all inodes belong to the file as dead except the first one -- making the file "almost deleted". 3. Remove the directory entry normally. 4. Mark file inode as dead. 5. Clear all inodes belong to the file in any order. How to deal with damaged chaining: 1. Allocate a temporary file inode. 2. Fill the inode signature, making the inode garbage to subsequent mount. 3. Copy file info (start offset, prev inode) 4. Copy file content 5. Write the last two 16-bit values as follows (note: this is independent from the value of WRONCEFS_CAN_ALTER): The 16-bit number on the higher address shall contain the data length; the 16-bit number on the lower address shall contain the inode the temporary inode is made from. 6. Finish the temporary file inode, to make it permanent. 7. Erase the original inode. 8. Copy the file info, and the file content back. (Note: no need to be cautious here; if this or the next operation fails, mount will erase the whole block again, and retry it.) 9. Fill the inode structure. 10. Erase the temporary file inode. The completed temporary file inode will look like this: +------------------------+ | Inode structure | +------------------------+ | Data start offset | +------------------------+ | Prev inode | +------------------------+ | File data | +------------------------+ | Optional empty space | +------------------------+ | Original inode | +------------------------+ | Data length | +------------------------+ Chapter 8: Mount operation ---------------------------- Mounting this filesystem can take some time if an operation was interrupted, but completing it is mandatory. Other operations are optional, however it is advised to perform them. The filesystem info is located in a special directory entry in the root directory. However, if the root inode is not consistent, then find the temporary inode. If the filesystem info cannot be recovered, then mount fails. In this case, the caller can format the filesystem again. The filesystem is "clear" if all inodes are consistent and no temporary inode is found. In this case, mount is quick. Otherwise: 1. If a temporary directory inode was found: finish interrupted directory entry removal based on the state of the temporary inode. If "Original inode" is 0xFFFF, or "Completion mark" is not 0xFFFF, remove the inode. Otherwise, clear the referenced inode, copy the directory entries, set the "Completion mark" to 0x0000, and remove the inode. 2. If a temporary file inode was found: finish the interrupted file chaining fixup. Erase the referenced block, and rewrite the prepared value into it, and erase the temporary file inode. This is a straightforward fix, everything is already prepared. 3. If an "almost deleted" inode was found: if the inode has chained inodes then ensure all of the successors are marked as dead. Then scan all directory inodes (in this case inode chaining is omitted), and complete the deletion by removing the directory entry that points to the "almost deleted" inode. 4. Clear all remaining dead inodes.