Charting a Path for Julia
A unified filesystem model for Julia, built on paths, handles, and an abstract filesystem interface
Comments-from: Chengyu Han, Jānis Erdmanis, Kevin Bonham, Neven Sajko, Miles Cranmer, [your name here]
1. History
Julia's approach to file paths was largely inspired by Python …just before Pathlib was adopted. In the years since, the idea that a path type would benefit Julia has been articulated multiple times, in different ways.
- In 2013 the path methods we know today were introduced.
- In 2014, @stevengj made an issue proposing a mildly cursed partial workaround for the lack of a path type.
- In 2016, FilePathsBase.jl was started.
- In 2017 (just before Julia 1.0), Frames wrote a Julep advocating for a path type, but unfortunately it didn't go anywhere before Julia 1.0 was out.
- In 2018 this was incidentally mentioned in a discourse topic.
- In 2020, a newcomer from CommonLisp opened a discourse topic about missing a path type
- In 2020, an issue was opened in the main Julia repo on this.
- In 2021 Jakob wrote a post that stuck in my mind examining flaws in Julia, including the lack of a path type in the language.
- In 2022, ExpandingMan starts working on FilePaths2.jl.
- In 2024, I got fed up enough with this after the latest paper cut I experienced to write a gripe on Slack.
Julia's Slack only keeps 90 days of conversation history, but you can usually search for "path type" and find somebody running into paper cuts/headaches. Setting my recent gripe aside, doing so turns up this comment from Mosé, made in response to some platform-specific handling that came up on a Julia PR addressing the difference a trailing slash makes to some depot operations:
We should really have a proper path type, strings are simple bad for manipulating them 💯 ×2
The use of strings as paths also precludes (or at least complicates) support for filesystems beyond that of the current operating system. Virtual filesystems are increasingly recognised as a useful abstraction in programming languages, not just operating systems (see: io/fs in Go for example), but very few languages provide built-in support (Java is a notable exception with NIO2).
2. Motivation
While C and friends use char-vector types for strings, paths, and more, most modern high-level languages have settled on a dedicated path type, and for good reason. While familiar, the string-based approach conflates several distinct concepts: textual data, namespace locations, and filesystem resources. This places a significant burden on users and library authors to reason about correctness, safety, and platform-specific behaviour.
Users with experience with a modern language (such as Rust, Python, Java/Kotlin/Scala, Swift, or C++17) will be familiar with the value of a dedicated path type, and the importance of it being part of the base language/stdlib. Julia already has dedicated non-String types for regular expressions, substitution strings, and more. Filesystem paths merit the same treatment.
A first-class path type offers several concrete benefits:
- Resolving the ambiguity of representing data as a string directly vs. a path to the data
- Allowing for dispatch on text content/paths, and more generic functions
- A platform-independent syntax and methods for working with paths
- Fewer footguns/paper cuts, from reduced ambiguity and more rigorous handling
- Support for virtual filesystems
This path type must exist in Julia's base, for two primary reasons:
- Base itself makes extensive use of paths
- A 3rd party library cannot provide the same level of ecosystem-wide coherence and consistency
Experience from other languages also reveals that paths should be considered together with the context they operate within. The filesystem uses paths as means to resolve a reference to a resource, and paying careful attention to what this means allows us to generalise filesystem interaction to support capability-based access and virtual file systems.
3. Terminology
While the terms used to describe files, paths, and other filesystem-related entities and operations are standard, when splitting hairs (as this proposal does), it is important to be precise with our language. To that end, here are the specific concepts we will be interrogating:
- A path is a description of a location within a namespace. It is a structured value composed of segments, and its meaning is defined only relative to a particular filesystem and (in general) a point of reference within that filesystem. A path does not identify a resource directly, nor does it convey authority to access one.
- A handle is an authoritative reference to a specific resource. Handles are typically produced by resolving a path, and they provide the ability to interact with the referenced resource. Unlike paths, handles do not describe where a resource is located in the namespace; they refer directly to the resource itself. Handles are temporal in nature: they may become invalid over time (for example, when closed).
- A filesystem is an organisation of resources together with rules for naming, resolving, and accessing those resources. Conceptually, a filesystem defines a namespace in which paths may be interpreted and resolved. Resolution establishes a relationship between paths and resources, but this relationship is not assumed to be stable across time or operations.
- Resolution is the act of turning a path into a handle within the context of a filesystem. Resolution is an effectual operation: it may fail, and its result may depend on the state of the filesystem at the time it is performed. Making resolution explicit is central to distinguishing between operations on names and operations on resources.
- A capability is the authority to perform a particular class of operations on a resource or within a filesystem. In this proposal, capabilities are represented explicitly by values (usually handles) and are not assumed implicitly through ambient state or global context. Treating filesystem access as capability-based allows code to be written that is safer, more composable, and more amenable to restriction or sandboxing.
4. Design complications
4.1. Locations and resources are distinct
In conversation, I've received a fair bit of pushback from multiple individuals on the normalisation I've proposed in the prior section. The essential argument is that the nice, algebraic model of paths isn't able to fully abstract over the world of system-specific details. To give a few examples:
- There's the symlink stuff from earlier
- stat foo/bar is different to stat foo/bar/.
- Some tools like rsync treat foo differently to foo/
- /.. is / (root is a fixed point essentially on Linux)
I really dislike these complications, particularly because in accepting this messiness we abdicate the handling of it to the user, and thus make it easier to write naïvely buggy code.
Looking at this another way, even more pushback along these lines is deserved. There is an abundance of unquestioned issues of this nature with the current status quo. For example: it is currently not possible to write to a file and then move it without there being an opportunity for the file to be replaced entirely in between each step.
This kind of issue and much of the pushback I've received essentially stems from one core issue: we often like to think of paths as unique resource descriptors, when in fact they are unique location descriptors. This is a subtle but important distinction. It is responsible for a large class of bugs and vulnerabilities known as TOCTTOU (Time of Check to Time of Use). Essentially any time a path is reused with any degree of outside influence (over the path or filesystem), it is near trivial to swap out the file in between operations by constructing a deep directory nesting and monitoring the directory atimes (yes, really). The only system where we can truthfully say this is not an issue is one with no concurrency of any sort (including time-sharing).
This is in large part a consequence of the initial POSIX standard being path
oriented, a limitation that is gradually being rectified with the addition of
f<op> and <op>at calls, such as faccessat. These calls operate not on a path to
the file, but a handle to the file itself: a file descriptor, or FD for short.
File descriptors essentially sit in between a description of the location of a
resource, and the data on disk. I am less familiar with the NT situation, but am
led to believe that it has been ahead of *nix in supporting handle-based path
operations.
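To make the location/resource distinction concrete, here is a minimal sketch using only functions that exist in Julia today. The names `read_if_small_racy` and `read_if_small` are hypothetical, and this illustrates the general pattern rather than the API proposed here.

```julia
# Path-based: the check and the use each resolve `path` separately, so the
# file that `read` finds may not be the file that `filesize` inspected.
function read_if_small_racy(path::AbstractString, limit::Integer)
    filesize(path) <= limit || return nothing  # time of check
    return read(path)                          # time of use
end

# Handle-based: resolve the path once with `open`, then check and use the
# same open file through its handle.
function read_if_small(path::AbstractString, limit::Integer)
    open(path) do io
        filesize(io) <= limit || return nothing
        return read(io)
    end
end
```

Both functions behave identically on a quiet filesystem. They differ only when something replaces the file at `path` between the two steps of the first version.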
Other programming languages have also recognised this issue, for example Python's Pathlib separates paths into pure and concrete paths, creating a clear split between an abstract conception of a path (as I've found myself attracted to thus far), and something that actually exists on the filesystem. I suspect we can go even further, and consider a scheme by which we provoke the user into obtaining a reference to the resource at a path when they want to work on it, and so avoid TOCTTOU-style issues and related messiness wholesale.
I conjecture that with the development of the file descriptor based API in POSIX 2008, Linux 2.6, and OpenBSD we have the capability to fulfill this ideal by building a path-like type that is oriented around file descriptors rather than path strings. We can make a more concrete path type than Pathlib's "concrete" paths. Arguably this isn't really a path any more so much as a handle to a filesystem-addressed resource. I'm not sure what best to call this, but regardless it seems exceptionally useful for writing safe filesystem-interacting code.
4.2. Filesystems are mutable and concurrent
To write
4.3. Symlinks and the pseudopath component
It has come to my attention that due to symlinks the first question cannot be answered in the context of a real/concrete path without querying the filesystem. On-disk, foo/bar/.. != foo with symlinks.
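The claim that foo/bar/.. != foo on disk can be checked directly. The sketch below (with the hypothetical name `dotdot_demo`) builds a temporary directory in which `foo/bar` is a symlink, then compares the purely textual answer from `normpath` with the answer the filesystem gives through `realpath`. It assumes a platform where the current user may create symlinks, which is not always the case on Windows.

```julia
# Returns the parent of `foo/bar/..` as seen (lexically, by the filesystem),
# both expressed relative to the temporary root.
function dotdot_demo()
    mktempdir() do tmp
        root = realpath(tmp)  # the temporary directory may itself sit behind a symlink
        mkpath(joinpath(root, "elsewhere", "target"))
        mkdir(joinpath(root, "foo"))
        symlink(joinpath(root, "elsewhere", "target"), joinpath(root, "foo", "bar"))
        path = joinpath(root, "foo", "bar", "..")
        lexical = relpath(normpath(path), root)   # textual manipulation only
        resolved = relpath(realpath(path), root)  # follows the symlink first
        return (lexical, resolved)
    end
end

dotdot_demo()  # ("foo", "elsewhere")
```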
This has been the cause of much consternation, and extensive discussion on Slack. Various ideas were discussed on how to best handle this complication, including:
- Calling realpath in the background
- Returning nothing or throwing an error when parent is called on a path ending in ..
- Using type/field information to split types into pure/concrete types à la Pathlib, and handle them differently
While mired in the messy filesystem details, Julius made the excellent point
that the filesystem is in a constant state of flux (realpath can be wrong the
moment after it returns), and so it's overly presumptuous of us to try to
handle this using information in the path object itself (using the type
domain or runtime information).
So, we should be upfront about this and just say that if you want
operations on a path to take into account the filesystem state at the
time, you need to call realpath.
Should realpath be resolved post-Path? It would also be good to have an unnormalised string to real Path function (ideally the same function).
This has the following benefits:
- Simplified model for path operations
- Predictable normalisation
- Separation of concerns
4.4. Empty segments
If one reads up on Linux Pathname Lookup, one may notice that while empty segments are generally invalid, with the appropriate flags it can be valid to interact with the empty path.
I cannot begin to imagine any legitimate use case for this, and so am inclined to pretend this edge case doesn't exist, particularly since there's no easy way to supply the necessary flag to Julia's (public) filesystem functions.
4.5. Virtual filesystems
Most languages bake in the assumption that you are working on the local filesystem. While there have been some forward-thinking efforts that untethered the notion of paths from the local machine (notably Plan9), even among modern languages, support for virtual filesystems from the foundations of the language up is rare.
Meanwhile, in the Linux, BSD, and macOS operating systems virtual filesystems are increasingly recognised as a valuable abstraction for container runtimes, sandboxes, mixing remote and local filesystems, and more.
When support for virtual filesystems is not provided by the language itself, but
instead by the ecosystem, even if there is a single de-facto package, the design
of the basic path type and filesystem API is key to the cohesiveness of the
final result. A class/trait based language can allow a custom type to be
accepted, but requiring that all path-like objects be shoehorned into a
string-based representation (as Python's os.PathLike and Rust's AsRef<Path>
do) sharply limits what is possible. In particular, string-based designs make
it very difficult to apply a capability model.
4.6. Preserving performance
4.6.1. Avoiding overhead when invoking Libuv path methods
Using a contiguous null terminated char array (whether a String, Memory{UInt8},
or Vector{UInt8}) for the internal representation of system paths makes it
possible to pass the path representation directly to Libuv with no overhead.
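As a small illustration of the zero-copy property: a Julia `String` is already stored contiguously with a trailing NUL byte, so it can be handed to C as a `Cstring` without being copied. In the sketch below, `strlen` merely stands in for any C function taking a `const char *` path, and `c_strlen` is a hypothetical name.

```julia
# `strlen` stands in for any C function that accepts a NUL-terminated path.
# The `Cstring` conversion passes a pointer to the string's own bytes, and
# throws an ArgumentError if the string contains an embedded NUL.
c_strlen(path::String) = Int(@ccall strlen(path::Cstring)::Csize_t)
```

The embedded-NUL check performed by the `Cstring` conversion is the same validation that section 5.2 proposes to perform once, at path construction.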
4.6.2. SubStrings as the type of path components
Since we've got good reason to want a single contiguous on-disk format,
operations that fetch components of the path will either have to allocate a new
object … or we can use a SubString. This is the approach FilePaths2.jl takes,
and I like it.
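For a sense of what this looks like with today's Base functions, `split` on a `String` already returns `SubString`s that share the parent string's storage. This is only an illustration with a hypothetical `components` helper, not FilePaths2.jl's implementation.

```julia
# Each component is a view into `path`, so no string data is copied.
components(path::String) = split(path, '/'; keepempty=false)

parts = components("/usr/local/share/julia")
eltype(parts)  # SubString{String}
```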
5. Design goals
5.1. High level path interface
While we want to end up with a Path type, it would be good to take a step back, consider what makes a "path" conceptually, and define an abstract type we can then specialise on.
I propose that in the abstract, a path is an ordered series of directions that takes you to a location.
From this, we can conceptualise a path as a list of direction segments, and arrive at a few fundamental operations:
- root: the origin point
- parent: the sequence of directions up to but excluding the most recent one
- length: the number of directions in the path
- iterate: give each direction of the path
- basename: the most recent segment
- joinpath: combine two sets of directions
- children: the immediate next paths one may take (maybe?)
A path that includes a root is considered absolute, and other paths are relative.
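A minimal sketch of this "list of directions" view, assuming a toy type (the name `ToyPath` and its fields are hypothetical, and `root` and `children` are left out for brevity):

```julia
struct ToyPath
    absolute::Bool            # whether the directions start from a root
    segments::Vector{String}  # the ordered directions
end

Base.length(p::ToyPath) = length(p.segments)
Base.iterate(p::ToyPath, state...) = iterate(p.segments, state...)
Base.eltype(::Type{ToyPath}) = String

Base.basename(p::ToyPath) = last(p.segments)
Base.parent(p::ToyPath) = ToyPath(p.absolute, p.segments[1:end-1])

function Base.joinpath(a::ToyPath, b::ToyPath)
    b.absolute && throw(ArgumentError("cannot join an absolute path onto another path"))
    return ToyPath(a.absolute, vcat(a.segments, b.segments))
end
```

With this representation, `parent` and `basename` are plain list operations and never need to inspect delimiters.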
5.2. Make invalid paths unconstructable
There are some characters that may not appear in a path.
- Posix (Linux/BSD/Mac)
  - the null byte (\0).
- Windows
  - |, the null byte (\0), and ASCII codes \x01 ~ \x1f.
There are also some restrictions on filenames:
- Posix
  - / and the null byte (\0).
  - Reserved file names: . and ..
- Windows
  - This is a superset of the restrictions for Posix
  - Reserved characters: <, >, :, ", \, /, |, ?, and *.
  - null byte (\0)
  - ASCII codes \x01 ~ \x1f.
  - Reserved names: CON, PRN, AUX, NUL, COM1 … COM9, LPT1 … LPT9 regardless of extension
  - Any filename ending with a space or .
We will also pretend that the empty path is never allowed (see: Compromises).
References:
- Unix/Win, Comparison of filename limitations - Wikipedia
- Win, Naming Conventions - Win32
- Win, doctaphred/ntfs-filenames.txt - Gist
- Win, C# GetInvalidFileNameChars(), GetInvalidPathChars() - dotnet/runtime
It's fairly easy to apply the Posix path restrictions when constructing paths, but Windows is a bit of a pain, making me think perhaps it's not worth the effort.
Since the Windows restrictions are a (large) superset of the Posix restrictions, one approach that I'd like to explore is validating the Posix requirements are met during path construction, and then maybe checking for forms Windows wouldn't like in literal path construction (with the p"" macro) and emitting a warning.
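The filename rules listed above can be sketched as a pair of predicates. The function and constant names are hypothetical, and the Windows check covers only the restrictions listed in this section. Under the approach described, the Posix predicate would gate construction, while the Windows predicate would only inform a warning.

```julia
# Posix: any non-empty name without '/' or NUL, other than "." and "..".
is_posix_filename(name::AbstractString) =
    !isempty(name) && name != "." && name != ".." &&
    !any(c -> c == '/' || c == '\0', name)

const WINDOWS_RESERVED_CHARS = "<>:\"\\/|?*"
const WINDOWS_RESERVED_NAME = r"^(CON|PRN|AUX|NUL|COM[1-9]|LPT[1-9])(\..*)?$"i

# Windows: the Posix rules, plus reserved characters, control characters,
# reserved device names (with any extension), and no trailing space or dot.
function is_windows_filename(name::AbstractString)
    is_posix_filename(name) || return false
    any(c -> c in WINDOWS_RESERVED_CHARS || c <= '\x1f', name) && return false
    occursin(WINDOWS_RESERVED_NAME, name) && return false
    return last(name) != ' ' && last(name) != '.'
end
```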
5.3. Enforced standard form
5.3.1. Avoid representational ambiguities
Thinking of a path as a representation of a location, the existence of the . and .. pseudopath components complicates path considerations:
- are foo/bar/.., foo/., and foo equal?
- is foo/bar/ the same as foo/bar?
- are foo\\bar and foo/bar equivalent on Windows?
These questions tend to fall under path normalisation, but by using a dedicated path type and path-specific operations I contend that we can decide on rules for a canonical representation of a path, and make it the only form that can be constructed. There is no need for path normalisation, as it is no longer possible to construct an abnormal path.
Frames also discusses the algebraic appeal of normalised paths in her blog post. It also feels like the more principled choice to me, and I suspect it makes it harder to fall into a few edge cases.
While I really want to pre-normalise .. components, it seems like I might have to give up on this front …despite the fact that the vast majority of the time this will be a nice simplification without any gotchas 😔
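As a rough sketch of what "only the canonical form can be constructed" might mean in practice, the hypothetical helper below canonicalises at the point of construction. It ignores the root, and it keeps .. segments, in line with the concession above.

```julia
# Split on '/', dropping empty segments (from repeated or trailing slashes)
# and "." segments. ".." is kept, since collapsing it is not symlink-safe.
canonical_segments(path::AbstractString) =
    String[seg for seg in split(path, '/') if !isempty(seg) && seg != "."]

canonical_segments("foo/./bar/") == canonical_segments("foo/bar")  # true
```

Two paths are then equal exactly when their segment lists are equal, which answers the trailing-slash and foo/. questions without any separate normalisation step.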
5.4. Safe path interpolation
When interpreting externally provided content as a path, the existence of the
pseudopath element introduces a risk of ending up in an unexpected directory.
The intent of code like joinpath(workdir, subdirname) can be
subverted (deliberately or accidentally) by providing a subdirname like
/some/other/dir, ../stuff, the empty string, or even a null byte. This is most
of the Path Traversal class of CVEs. When interpreting a string as a path
segment, we can validate that it is a "normal" path segment and raise an error
otherwise, preventing a surprising result from appearing with forms like
p"path/$var/$name.txt".
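A minimal sketch of this validation using plain strings, assuming a hypothetical `checked_child` function. Rejecting the backslash everywhere is a deliberately conservative choice made for this illustration, not something the proposal specifies.

```julia
# Join `name` onto `workdir` only if it is a single, normal path segment.
function checked_child(workdir::AbstractString, name::AbstractString)
    normal = !isempty(name) && name != "." && name != ".." &&
             !any(c -> c == '/' || c == '\\' || c == '\0', name)
    normal || throw(ArgumentError("not a normal path segment: $(repr(name))"))
    return joinpath(workdir, name)
end
```

With this check in place, `checked_child("work", "../stuff")` throws instead of silently escaping `work`.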
5.5. Convenient prefixes
5.5.1. The ~ home shorthand
The handling of ~ in paths in Julia tends to trip up people used to shell expansion, but there's a very good reason why Julia doesn't go ahead and interpret "~/dir" as /home/$USER/dir but requires expanduser: ~ is a valid path segment.
Without knowing the intent with which the ~ was written (or generated and passed around) it is not possible to reasonably decide whether it should be interpreted as a "~" segment or a reference to the home directory.
With a path macro, this changes. We can differentiate between a ~ that has been put literally at the start of a path, and a ~ that's come from elsewhere. This makes the convenient ~-home interpretation viable, without re-introducing the current issues. As a tradeoff expressing an initial ~ segment becomes less convenient, but given the relative frequency of home vs. "~" forms, this seems like a worthwhile tradeoff.
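A rough sketch of how a string macro can tell the two cases apart. The macro name `toyp` and the helper `expand_literal_home` are hypothetical, and plain strings stand in for the eventual path type.

```julia
# Expand a leading "~" only when it appears in the literal text of the path.
function expand_literal_home(literal::AbstractString)
    literal == "~" && return homedir()
    startswith(literal, "~/") && return joinpath(homedir(), literal[3:end])
    return String(literal)
end

# The macro only ever sees the literal written in source code, so a "~"
# arriving inside a runtime value is never reinterpreted.
macro toyp_str(literal)
    return :(expand_literal_home($literal))
end
```

Under this sketch `toyp"~/dir"` refers to a directory under the user's home, while `joinpath(toyp"data", name)` with `name = "~"` still refers to a segment literally named `~`.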
5.5.2. Introducing @ project shorthand
I'm not 100% sold on this idea, but I'm interested and it seems worth exploring.
Within package and project code, it is common to see forms like
joinpath(@__DIR__, "..", "..", "assets", "file.txt").
There are two major issues with this:
- Poor clarity of intent: this is an attempt to express a target location within a project, but the path is expressed relative to the current file (wherever it may be within the project) rather than the project itself. The fact the form is twisted as a result (with @__DIR__, "..", "..") only makes this less apparent.
- As a result of the poor clarity/expression, this form is slightly fragile: moving the source file around will break the reference, even if the target remains in the same location within the project.
Extending the "special literal prefix" handling to treat @ as a project-prefix
as ~ is a user-prefix can improve this situation. The choice of @ seems natural
given the existing use of @-prefixed special paths in DEPOT_PATH and --project
already.
Besides the two issues above, a @-prefix also provides an opportunity to improve
the status quo with regard to relocatability. Enough Julia packages use @__DIR__
in paths to make relocatability a general issue (motivating
RelocatableFolders.jl and julia/PR#55146). Implementing @ as a relocatable
project-relative path (determined at compile-time) creates a form that is both
more convenient and more robust, a "pit of success".
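To illustrate the difference in intent, the sketch below resolves a location relative to the enclosing project rather than the current file. It is a runtime approximation under stated assumptions: the helper names are hypothetical, the project root is taken to be the nearest ancestor directory containing a Project.toml, and none of the compile-time relocatability described above is attempted.

```julia
# Walk upwards from `dir` until a directory containing Project.toml is found.
function find_project_root(dir::AbstractString)
    while true
        isfile(joinpath(dir, "Project.toml")) && return dir
        up = dirname(dir)
        up == dir && return nothing  # reached the top without finding a project
        dir = up
    end
end

# Express a location relative to the project, wherever `from_dir` sits in it.
function project_path(from_dir::AbstractString, parts::AbstractString...)
    root = find_project_root(from_dir)
    root === nothing && throw(ArgumentError("no Project.toml found above $from_dir"))
    return joinpath(root, parts...)
end
```

A call like `project_path(@__DIR__, "assets", "file.txt")` keeps working when the source file moves within the project, unlike the `"..", ".."` form.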
5.6. Uniform cross-platform support
5.6.1. Platform-specific path types
It seems likely useful to still be able to model paths of other platforms, and we can do this without compromising ergonomics fairly easily by defining <Platform>Path types and then having a Path for the current platform.
5.6.2. Cross-platform path construction
Posix exclusively uses the / delimiter, and Windows accepts \ (preferred) or /.
As such, we can reasonably settle on / as the in-Julia syntax for paths, and handle operating system dependent normalisation in the background. This makes it impossible to accidentally hardcode a particular platform's delimiters.
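A small sketch of how the two ideas in this section fit together, with hypothetical toy types: paths are always written with / in Julia, stored as segments, and only rendered with a platform's delimiter at the boundary.

```julia
abstract type ToyPlatformPath end
struct ToyPosixPath <: ToyPlatformPath
    segments::Vector{String}
end
struct ToyWindowsPath <: ToyPlatformPath
    segments::Vector{String}
end

# In-Julia syntax always uses '/', whichever platform is being modelled.
parse_path(::Type{P}, text::AbstractString) where {P <: ToyPlatformPath} =
    P(String[seg for seg in split(text, '/'; keepempty=false)])

# The delimiter only matters when producing the platform's own representation.
delimiter(::ToyPosixPath) = '/'
delimiter(::ToyWindowsPath) = '\\'
render(p::ToyPlatformPath) = join(p.segments, delimiter(p))

# The type used for the current platform.
const ToyNativePath = Sys.iswindows() ? ToyWindowsPath : ToyPosixPath
```

Roots and drive letters are ignored here. The point is only that a delimiter never appears in the stored representation.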
5.8. Capability support through handles
To write
5.9. First-class Virtual Filesystems
5.9.1. Explicit filesystem identity
To write
5.9.2. Layered filesystem capabilities
To write
5.10. Making the safe path the happy/easy path
From all the investigation we've done so far, we know that:
- An abstract path type allows for sensible and efficient path manipulation
- A concrete fd-based path type allows for some TOCTTOU-safe filesystem operations
- A specialised directory entry type allows for efficient readdir usage, and for other TOCTTOU-safe filesystem operations
Currently, we just use String for all of these purposes. This is "simple" in the
sense that all of the inherent complexity is put off to the user of the API to
think about. By contrast, this trio of system path types requires a little more
upfront thinking, but this is paid for several times over in the reduction of
edge cases that package developers and end users may hit.
6. Proposal
6.1. Type hierarchy
Arriving at, then determining how to satisfy, the key criteria takes careful thought. I shall now describe the third major iteration of the design. I will attempt to lay out a logical progression of ideas, but it is worth noting that the current design has not emerged from a clean linear thought process, but rather from imbibing a wide array of example models (see 7) and trying various approaches, simultaneously sketching the shape of a solution and distilling the soup of ideas until a model that fit the shape emerged.
6.1.1. Key criteria
We want to have:
1. Entirely separate types for paths and handles, since neither fits under the other
2. Dedicated per-OS path types
3. The ability to define new path and handle types, which "just work"
4. A convenient and consistent way to go from paths to handles
5. A dedicated filesystem type/interface, with paths and handles that include the filesystem
6. A go-to Path type for use in argument type annotations, that:
   a. Encompasses both paths and handles
   b. Supports both local and virtual filesystems
   c. While allowing for simple usage
7. Provides the foundations for a capability-based access model
8. Allows for arbitrary layering / mix 'n match with different kinds of filesystems
9. Is type stable and efficient
6.1.2. Handles and paths
To start with, let us consider what the most critical detail of the separate
path and handle types required by criterion 1 is. A handle is a wrapper around
something and a path can be resolved to a handle for that same something. It
follows that the principal type parameter of an AbstractHandle is what the
something is, and that an AbstractPath should specify what kind of handle it
may be resolved to.
julia
abstract type AbstractHandle end
abstract type AbstractPath{H <: AbstractHandle} end
# A handle resolves to itself.
handle(h::AbstractHandle) = h
# Each path type must provide a method of the form: handle(::AbstractPath{H})::H
The handle interface.
This simple design has the potential to complicate calling and writing
filesystem code. We want to encourage packages to write handle-first code, since
it avoids more filesystem corner-cases (along with a few other benefits
described later), however often external programs and users will want to provide
a path. Asking package authors to write two versions of each function is
untenable, even with handle making it easy:
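A sketch of the duplication in question, building on the basic interface above. `DemoHandle`, `DemoPath`, and `wordcount` are hypothetical names, and resolution is faked with a string.

```julia
abstract type AbstractHandle end
abstract type AbstractPath{H <: AbstractHandle} end

struct DemoHandle <: AbstractHandle
    data::String
end
struct DemoPath <: AbstractPath{DemoHandle}
    name::String
end

handle(h::AbstractHandle) = h
handle(p::DemoPath) = DemoHandle("contents of " * p.name)  # stand-in for real resolution

# The handle-first method that does the actual work ...
wordcount(h::DemoHandle) = length(split(h.data))
# ... and the forwarding method every package would also have to remember to write.
wordcount(p::AbstractPath) = wordcount(handle(p))
```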
Although handle makes it easy, two methods are required to support paths and handles.
Realistically, package authors will forget to implement at least one "flavour" and so the ecosystem will end up incompatibly split between path-based and handle-based APIs. We need a single unified way of supporting paths and handles. This is what criterion 6 is about.
We can create a convenient Path type alias that paths and handles of the same
nature sit under. We could have const PathOrHandle{H} = Union{<:AbstractPath{H}, H} where {H <: AbstractHandle}, but I think it is
worth considering whether there could be a sane supertype that AbstractPath and
AbstractHandle could both sit under.
Focusing on the key practical consideration of a path (what kind of handle it
may produce) leads us to a simple but meaningful abstraction: a type that
can be resolved to a certain kind of handle. Let us call this AbstractResolvable
(instead of PathOrHandle). The interface is thus: an instance of
AbstractResolvable{T} must be able to produce an
AbstractHandle{T} when handle is called on it. To have AbstractHandle
be both a subtype as well as part of the AbstractResolvable type parameter like
this would be simpler if Julia supported a self-like keyword when defining type
relations, or supported F-bounded polymorphism. Alas, Julia does not, and so we
must be a little more verbose and repeat the handle type as a type parameter in
the AbstractHandle/AbstractPath subtype declaration.
julia
abstract type _AbstractResolvable{H} end # Internal implementation detail
abstract type AbstractHandle{H} <: _AbstractResolvable{AbstractHandle{H}} end
const AbstractResolvable{H <: AbstractHandle} = _AbstractResolvable{AbstractHandle{H}}
abstract type AbstractPath{H} <: AbstractResolvable{H} end
AbstractPath{H} forms a path subfamily.
All together, this allows us to have a handle type Foo that can be resolved from
a Bar path, where:
- Foo is a subtype of AbstractHandle
- Bar is a subtype of AbstractPath
- Both Foo and Bar are an AbstractResolvable{Foo}
julia
struct Foo <: AbstractHandle{Foo} #= ... =# end
struct Bar <: AbstractPath{Foo} #= ... =# end
Foo <: AbstractHandle{Foo} <: AbstractResolvable{Foo}
Bar <: AbstractPath{Foo} <: AbstractResolvable{Foo}
How a handle type (Foo) and related path type (Bar) go together under the revised handle/path model.
If we then formalise the filesystem interface (which we will do), and write
generic fallback methods for AbstractPath arguments, package authors can write a
single method with ::AbstractResolvable{<:AbstractFileHandle}
arguments and uniformly handle path and handle flavoured arguments.
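The single-method pattern can be sketched as follows. To keep the example short, this uses a simplified hierarchy in which AbstractResolvable is parameterised directly by the handle type, rather than the exact definitions given above. The `shout` function and the fields of `Foo` and `Bar` are hypothetical.

```julia
# Simplified stand-ins for the hierarchy described in this section.
abstract type AbstractResolvable{H} end
abstract type AbstractHandle{H} <: AbstractResolvable{H} end
abstract type AbstractPath{H} <: AbstractResolvable{H} end

struct Foo <: AbstractHandle{Foo}
    contents::String
end
struct Bar <: AbstractPath{Foo}
    name::String
end

handle(h::AbstractHandle) = h
handle(b::Bar) = Foo("contents of " * b.name)  # stand-in for real resolution

# One method accepts both flavours of argument.
shout(x::AbstractResolvable{Foo}) = uppercase(handle(x).contents)
```

Here `shout(Foo("hi"))` and `shout(Bar("x"))` both dispatch to the same method, with `handle` papering over the difference.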
To push package authors to use handles themselves, we can consider adding
Base.depwarn messages within the path fallback methods.
6.1.3. Incorporating the filesystem
We can easily satisfy 6a and 8 by making the filesystem an explicit part of the paths and handles. To implement methods specific to a particular filesystem, it follows that the kind of filesystem must be a type parameter of the path/handle.
We could directly add a filesystem type to the abstract path and handle types, but I appreciate the simplicity of the current forms, and doing so would (in my view) make them less elegant. There is another route, which I will take here: defining a handle subtype that opts in to the filesystem interface and holds the filesystem kind as a type parameter.
julia
abstract type AbstractFilesystem end
abstract type AbstractFileHandle{F<:AbstractFilesystem, H} <: AbstractHandle{H} end
It is expected that (in general) the filesystem instance will be carried around by its path/handle types as an explicit field. The local filesystem is one exception to this, as it can be simply expressed as a singleton type.
This allows us to refer to any type that may be treated as a file with
AbstractResolvable{<:AbstractFileHandle}, which we can alias with
Path to help direct package authors to use it as the go-to type annotation for a
type that can be used as a file.
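A simplified sketch of why the filesystem kind is useful as a type parameter. The names `LocalFilesystem`, `MemoryFilesystem`, and `ToyFileHandle` are hypothetical, and the handle here is a standalone struct rather than a member of the hierarchy above.

```julia
abstract type AbstractFilesystem end

# The local filesystem needs no state, so a singleton type suffices.
struct LocalFilesystem <: AbstractFilesystem end

# A virtual filesystem carries its own state as fields.
struct MemoryFilesystem <: AbstractFilesystem
    files::Dict{String, Vector{UInt8}}
end

# The handle carries its filesystem, and the kind is visible to dispatch.
struct ToyFileHandle{F <: AbstractFilesystem}
    fs::F
    name::String
end

Base.read(h::ToyFileHandle{LocalFilesystem}) = read(h.name)
Base.read(h::ToyFileHandle{MemoryFilesystem}) = copy(h.fs.files[h.name])
```

Code that only calls `read` on a handle works unchanged across both filesystems, while each filesystem supplies its own method.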
6.1.4. Per-system path types
To use a path type on multiple systems there are fundamentally three distinct approaches we can take:
a. Have a single concrete type, that behaves as appropriate for the current platform. Modelling other platforms is not possible.
b. Implement a type for each platform, and require users/package authors to explicitly reason about what the current platform is.
c. Implement types for each platform, and have a current platform type.
The first is convenient, the second flexible, and the third both. We'll take the third approach (the same as Python's Pathlib, Racket, and FilePathsBase.jl).
With our use of handles, and their inclusion as a path type parameter there's something else to consider: foreign system paths can only be reasoned about abstractly, not converted to a filesystem handle (leaving remote connections aside). As such, an alias for whatever path type is appropriate for the current system doesn't quite fit. Instead, we can define one extra path type that's a thin wrapper around the platform-appropriate type.
julia
# Marker handle type for paths that can only be reasoned about abstractly
struct Unhandleable <: AbstractHandle{Unhandleable} end
struct PosixPath <: AbstractPath{Unhandleable}
    # ...
end
struct WindowsPath <: AbstractPath{Unhandleable}
    # ...
end
struct LocalFileHandle <: AbstractHandle{LocalFileHandle}
    # ...
end
# Thin wrapper around the path type appropriate for the current platform
struct LocalFilepath <: AbstractPath{LocalFileHandle}
    path::(@static Sys.iswindows() ? WindowsPath : PosixPath)
end
6.1.5. Allowing for a capability model with relative types
Once you have a handle, it is easy to see how you can want to resolve paths relative to a handle, for instance moving a file between two directories (as handles). This naturally appears in a few contexts, such as when listing the contents of a directory (which are provided as file names relative to a directory handle).
It is also worth noting that this is a construction in which we are able to perform path-based operations not based on an implicit context (the current working directory) or an absolute path, but rather a "live" object that confers the ability to operate within it.
It should be no surprise that this model has been used as the basis for
replacing the ambient authority in filesystem operations with explicit
capabilities. We not only have the *at Posix operations, but also Capsicum
system calls like cap_rights_limit which is applied to a file descriptor to
restrict what can be done with it.
We can neatly fit this model under our existing paradigm as a new
AbstractResolvable subtype.
julia
struct RelativePath{H <: AbstractHandle, P <: AbstractPath{H}} <: AbstractResolvable{H}
parent::H
path::P
end
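To show the capability flavour of resolving relative to a handle, here is a toy in-memory sketch. `ToyDir` and `resolve` are hypothetical, and a nested dictionary stands in for a real directory handle.

```julia
struct ToyDir
    entries::Dict{String, Any}  # name => nested ToyDir, or file contents
end

# Resolve `segments` strictly beneath `dir`. There is no ambient working
# directory, and no way to name anything above `dir`.
function resolve(dir::ToyDir, segments::Vector{String})
    node = dir
    for seg in segments
        seg in (".", "..") && throw(ArgumentError("pseudopath segment not allowed: $seg"))
        node isa ToyDir || throw(ArgumentError("not a directory before: $seg"))
        haskey(node.entries, seg) || throw(ArgumentError("no such entry: $seg"))
        node = node.entries[seg]
    end
    return node
end
```

Code that is handed only the inner directory can reach what lies beneath it and nothing else, which is the essence of replacing ambient authority with an explicit capability.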
6.1.6. The final hierarchy
6.2. Code
https://code.tecosaur.net/tec/julia-basic-paths
If you'd like to make a PR etc. this is also now mirrored to GitHub: https://github.com/tecosaur/julia-basic-paths
I'm happy to take feedback in any form you're willing to give it. If easy/possible I like receiving a .patch with inline comments 🙂
6.3. Non-breaking changes
Ideally we'd use a time-travel machine to shoehorn this into Julia 1.0, but the second best time to add a path type to Julia is now.
Avoiding breaking changes means we can't remove paper cuts like
eachline(::String), but we can provide a better alternative, gradually adopt it,
and push for it to become the status quo in the long term.
6.4. Unresolved questions
- Is treating the drive + / as the path root on Windows good enough?
- Should we take this opportunity to copy FilePathsBase.jl / FilePaths2.jl and provide more structured outputs to functions like uperm?
- Can we get away with eagerly normalising .. and requiring realpath when you need to guard against symlink shenanigans?
- Do we want to under-the-hood transform absolute Windows paths to verbatim-prefixed paths (\\?\), for long file name support?
6.4.1. Now resolved through community discussion
- Should joining two absolute paths return the latter absolute path, or raise a
  runtime warning/error?
  - An error should be thrown
6.5. Mitigating Pain Points
While I like this set of design goals, they're ultimately a compromise between various concerns, and so produce some potential pain points. These should be mitigated as much as possible.
6.5.1. Symlinks and pseudoparents
With the separation of concerns of the design, path operations are very
predictable. However, if pseudopaths are present and/or symlinks need to be
accounted for, realpath will need to be called.
This is something that people using Path objects will simply need to remember to
do, and so we will sprinkle mention of this liberally into the documentation.
6.6. Comments
6.6.1. Windows UNC paths
Do we want to under-the-hood transform absolute Windows paths to verbatim-prefixed paths (\\?\, see https://learn.microsoft.com/en-us/windows/win32/fileio/naming-a-file#win32-file-namespaces), for long file name support?
- cyhan
- NO, This problem should be solved when installing Julia, by setting a registry entry. https://github.com/JuliaLang/julia/issues/46450
- Timothy
- Sounds reasonable, I wonder how this might work with Julia static binaries though?
- Jānis Erdmanis
- Perhaps an alternative is to prepend `\\?\` if the path exceeds 260 characters until Windows slowly makes them default.
6.6.2. Windows path lengths
Windows: This is a superset of the restrictions for Path
- Jānis Erdmanis
- There is also notorious 260 character limit which could be also validated along those restrictions
- Timothy
- Ah yes, I like this idea. It does depend on a registry setting too though, depending on how the Julia runtime is installed…
- Jānis Erdmanis
- On the host system one can set a global variable CHECK_LONG_PATHS in the init block that checks the registry entry to decide upon whether path shall or shall not be enforced. I asked Claude and it gave reg query "HKLM\SYSTEM\CurrentControlSet\Control\FileSystem" /v LongPathsEnabled with which I checked on my VM to work. On the POSIX systems warning could be thrown nevertheless which could be disabled by explicitly setting CHECK_LONG_PATHS to false and the check removed in the next decade.
- Timothy
- Maybe the way to handle this could be to have an on-startup check for Windows systems which then adds the verbatim prefix if needed? The thing I don't like about warnings/errors here is that when a long path is produced via interpolation it makes runtime errors unpredictable, and TBH I'm not a fan of runtime errors in the first place with stuff this basic/fundamental (outside of errors from the filesystem itself/invalid paths).
- Jānis Erdmanis
- I think you have a point of runtime errors being unpredictable. When developing AppBundler I did encounter cases where my Julia application would run just fine in one and fail in another directory just because of long path limit. Adding a prefix automatically for long paths seems a much better idea.
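To make the "add a prefix automatically for long paths" idea concrete, here is a minimal string-based sketch. The names `maybe_verbatim`, `MAX_PATH` and `VERBATIM_PREFIX` are hypothetical and not part of the proposal's code, the threshold is counted in characters as a simplification, and only drive-absolute paths are handled.

```julia
# Hypothetical sketch: add the Windows verbatim prefix (\\?\) only when a
# drive-absolute path reaches the classic 260-character limit.
const MAX_PATH = 260
const VERBATIM_PREFIX = "\\\\?\\"   # the four characters \\?\

function maybe_verbatim(path::AbstractString; limit::Integer = MAX_PATH)
    # Leave short paths and already-prefixed paths untouched.
    (length(path) < limit || startswith(path, VERBATIM_PREFIX)) && return String(path)
    # Assumption: only drive-absolute paths such as C:\dir\file are handled here;
    # UNC and relative paths would need separate treatment.
    occursin(r"^[A-Za-z]:[\\/]", path) || return String(path)
    # Verbatim paths are passed through without separator normalisation,
    # so convert any forward slashes to backslashes first.
    return VERBATIM_PREFIX * replace(path, '/' => '\\')
end

short = "C:\\Users\\me\\file.txt"
long = "C:\\" * repeat("a", 300)
maybe_verbatim(short) == short                   # true: left untouched
startswith(maybe_verbatim(long), "\\\\?\\C:\\")  # true: prefix added
```

Because the function is idempotent, it could be applied at the last moment before a path is handed to the OS, which avoids the unpredictable runtime errors discussed above.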
6.6.3. Is / a drive root on Windows?
Is treating the drive + / as the path root on Windows good enough?
- cyhan
- This is called a mixed form in cygpath. I think it's good enough. Just need to provide some more conversion functions like cygpath
- Timothy
- Ta. Thanks for the second opinion.
6.6.4. Better to allow joining two absolute paths, or error?
Should joining two absolute paths return the latter absolute path, or raise a runtime warning/error?
- cyhan
- I prefer throwing errors. We can always loosen restrictions later, but it's hard to tighten them.
- Timothy
- Initially I had mixed feelings, but I'm increasingly coming around to the idea of making combining multiple absolute paths disallowed without explicit handling.
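As a rough illustration of the "throw an error" option, here is a sketch on plain strings using Base's isabspath and joinpath rather than the proposed path types. The name `strictjoin` is hypothetical.

```julia
# Hypothetical helper: like joinpath, but refuses to silently discard earlier
# components when a later component is absolute.
function strictjoin(base::AbstractString, rest::AbstractString...)
    for part in rest
        if isabspath(part)
            throw(ArgumentError("cannot join absolute path $(repr(part)) onto $(repr(base))"))
        end
    end
    return joinpath(base, rest...)
end

strictjoin("a", "b", "c")    # same result as joinpath("a", "b", "c")
# strictjoin("a", "/etc")    # throws ArgumentError
# joinpath("a", "/etc")      # Base currently returns "/etc" on POSIX systems
```

Callers who really do want "the latter path wins" would then have to ask for it explicitly, which matches the "loosen later" argument above.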
6.6.5. Special handling of . and ..
Can we get away with eagerly normalising .. and requiring realpath when you need to guard against symlink shenanigans?
- Neven Sajko
- As I said on Discourse, special handling for .. (or .) seems like a bad idea. Just treat it like any other file name. So no normalization, IMO.
- Timothy
- That's a nice and simple answer, but I remain tempted by this approach because of how it simplifies/eliminates headaches like what the parent directory of a/b/. or a/b/.. is along with similar path-related questions/operations. I'm not sure how best to establish how viable this approach is (or isn't).
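For reference, this is roughly what eager lexical normalisation would mean for a relative path held as a list of segments. It is a sketch only: `normalise_segments` is a hypothetical name, and the result can differ from what the filesystem would report whenever a cancelled segment is a symlink.

```julia
# Hypothetical sketch: eagerly normalise "." and ".." on a relative path
# represented as a vector of segments. Purely lexical; symlinks are ignored.
function normalise_segments(segments::Vector{String})
    out = String[]
    for seg in segments
        if seg == "." || isempty(seg)
            continue            # "." and empty segments carry no information
        elseif seg == ".." && !isempty(out) && last(out) != ".."
            pop!(out)           # ".." cancels the preceding named segment
        else
            push!(out, seg)     # keep names, and any ".." that cannot cancel
        end
    end
    return out
end

normalise_segments(["a", "b", "."])    # ["a", "b"]
normalise_segments(["a", "b", ".."])   # ["a"]
normalise_segments(["..", "a", ".."])  # [".."]
```

After this step the parent of a/b/. is simply the parent of a/b, which is the simplification being weighed against Neven's "treat it like any other file name" position.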
6.6.6. Joinpath as a basic path operation
From this, we can conceptualise a path as a list of direction segments, and arrive at a few fundamental operations
- Jānis Erdmanis
- Shouldn't joinpath also be part of this list?
- Timothy
- Path concatenation might be worth adding to the generic API, I've just tried to start with the absolute bare minimum. Miles recently made a strong case for adding children in Discourse.
- Jānis Erdmanis
- I also like the idea of adding children. Even the old tape filesystems could do such operation efficiently as the metadata that listed all files was written in one place.
7. Prior art
Because it's a lot of work to go through all of these implementations, review them, and then write about them, I've made use of some LLMs to help with the write-up. You may well pick this up in the writing style from here on (as well as a few stylistic inconsistencies that aren't hard to notice, but that I am far from sufficiently motivated to correct).
7.1. Summary
7.1.1. Paths
Across modern languages and libraries, filesystem paths are treated as structured names, not as generic text. The common design goal is to make path manipulation correct-by-construction and cross-platform, while keeping filesystem effects explicit and auditable.
7.1.1.1. Paths as distinct value types
Most ecosystems converge on a dedicated path type (or at least a dedicated
vocabulary surface) to prevent accidental mixing of "text" and "namespace
locations", and to concentrate path semantics in one place. Even where paths
remain strings (notably .NET and Node), there is a strong push toward
path-specific APIs (Path.Combine, path.join) and "path-like" protocols to reduce
ad-hoc string manipulation.
7.1.1.2. Lexical operations vs filesystem effects
A consistent boundary appears between lexical path operations (join, split,
parent, extension, normalization) and effectual filesystem interaction. Some
systems enforce this boundary by design (Rust keeps Path inert; filesystem
operations live in std::fs), while others expose it through naming/API placement
(C++17's lexical functions vs canonical, Python's "pure" vs "concrete" classes).
The important lesson is to avoid implying that lexical rewriting is equivalent
to semantic resolution, especially in the presence of symlinks.
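The gap between lexical rewriting and semantic resolution can be seen with today's Julia, using Base's normpath (lexical) and realpath (effectual). This sketch assumes a POSIX-like system where creating symlinks is permitted; the directory names are arbitrary.

```julia
# Build a scratch tree:  base/real/sub  plus a symlink  base/link -> base/real/sub
base = realpath(mktempdir())
mkpath(joinpath(base, "real", "sub"))
symlink(joinpath(base, "real", "sub"), joinpath(base, "link"))

p = joinpath(base, "link", "..")

lexical  = normpath(p)   # string rewriting only: cancels "link" against ".."
resolved = realpath(p)   # asks the filesystem: follows the symlink, then goes up

println("lexical:  ", lexical)    # names `base` itself
println("resolved: ", resolved)   # names `base/real`
```

The two answers name different directories, which is why a lexical operation should never be presented as a resolution step.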
7.1.1.4. Platform fidelity and encoding
Path models differ most in how faithfully they represent platform reality. Rust and Racket emphasise round-tripping arbitrary OS paths (including non-UTF8) via OS-string/byte representations, making string conversion fallible. Python and many high-level models assume Unicode strings but still acknowledge platform-specific semantics (POSIX vs Windows path rules). A robust design must be explicit about what is representable, and what conversions are lossy or fail.
7.1.1.5. A small core plus high-leverage conveniences
Successful path APIs provide a compact, predictable core
(join/parent/basename/extension/absolute/relative, segment iteration) plus a
small set of high-leverage conveniences (with_extension, with_name, suffixes,
strip_prefix, ancestors). This reduces bespoke string parsing and yields
consistent behaviour across the ecosystem.
7.1.1.6. Ambient context is the default, but explicitly isolating context is valuable
Most mainstream standard libraries interpret paths in an ambient OS namespace
(current working directory, mounted filesystems, process view). Where
alternative contexts are needed, ecosystems either (a) provide explicit context
objects (Java NIO.2 FileSystem, Go's fs.FS), or (b) introduce
capability-flavoured directory handles as the root of resolution (WASI, cap-std,
Zig's Dir). Even if not adopted everywhere, making "context" representable as a
value enables sandboxing, testability, and predictable semantics for non-local
backends.
7.1.1.7. Design takeaways
- Treat paths as structured names with a coherent API; avoid "strings plus conventions" wherever possible.
- Keep lexical manipulation separate from effectual filesystem interaction; do not conflate canonicalisation with handle acquisition.
- Make handles the unit of authority and the primary return type of "open"-style operations.
- Be explicit about platform semantics and encoding/round-tripping guarantees.
- Provide a small core of primitives and a curated set of composable convenience transforms.
- Keep ambient OS behaviour as the ergonomic default, while ensuring the design leaves room for explicit context/capability patterns where needed.
7.1.2. Virtual Filesystems
Across systems, a "virtual filesystem" abstraction is primarily about making filesystem context explicit and substitutable, so code can operate over different backends (OS, in-memory, archives, remote stores) and different views (rooted, overlaid, redirected) with minimal change. Designs vary mainly in what is made first-class (filesystem object vs path value vs handles), and in how strongly they treat authority/sandboxing as a core concern.
7.1.2.1. Filesystem objects as the unit of substitution
Many successful designs make the filesystem itself the injectable boundary: you pass an FS/FileSystem/afero.Fs/llvm::vfs::FileSystem value into code, and all operations go through that object. This centralises backend choice, credentials/configuration, and view-construction, and avoids hard-wiring the ambient OS namespace into libraries.
7.1.2.2. Minimal interfaces vs OS-shaped completeness
There's a recurring split between:
- Minimal, capability-friendly cores
- for example, Go io/fs's Open + optional extensions, which maximise implementability and composability, and
- Broad, os-shaped interfaces
- (Afero) that better support real applications (writes, renames, permissions) but are harder to implement consistently across heterogeneous backends.
This suggests a "small core + layered extensions" approach scales best.
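In Julia terms, a "small core + layered extensions" shape might look like the following sketch. All names (`AbstractFS`, `openfile`, `listdir`, `readtext`, `MemFS`, `ConstFS`) are hypothetical and are not the interface proposed in this document.

```julia
abstract type AbstractFS end

# Core: the one function every backend must implement.
# Returns an IO handle for the named file, or throws.
function openfile end

# Optional extension: backends that can list directories add their own method.
# The fallback turns the missing capability into an explicit, catchable error.
listdir(fs::AbstractFS, dir::AbstractString) =
    throw(ArgumentError("$(typeof(fs)) does not support directory listing"))

# Generic helper built only on the core, so it works for every backend.
readtext(fs::AbstractFS, name::AbstractString) = read(openfile(fs, name), String)

# An in-memory backend implementing the core and the listing extension.
struct MemFS <: AbstractFS
    files::Dict{String, String}
end
function openfile(fs::MemFS, name::AbstractString)
    haskey(fs.files, name) || throw(ArgumentError("no such file: $name"))
    return IOBuffer(fs.files[name])
end
listdir(fs::MemFS, dir::AbstractString) =
    sort!([k for k in keys(fs.files) if dirname(k) == dir])

# A backend that implements only the core.
struct ConstFS <: AbstractFS end
openfile(::ConstFS, ::AbstractString) = IOBuffer("constant")

mem = MemFS(Dict("docs/a.txt" => "hello", "docs/b.txt" => "world"))
readtext(mem, "docs/a.txt")       # "hello"
readtext(ConstFS(), "anything")   # "constant"
```

Multiple dispatch makes the layering cheap: helpers written against the core work everywhere, and richer operations exist only where a backend opts in.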
7.1.2.3. Names are often stringly, even when context is explicit
Even sophisticated VFS systems frequently keep pathnames as strings (Go io/fs,
Afero, LLVM VFS, Linux syscalls, WASI path parameters). The safety and clarity
then comes from (a) a strict path-name contract (Go), (b) scheme-in-string
dispatch (fsspec), or (c) forcing resolution to be relative to explicit
context/handles (WASI, openat-style APIs). A first-class Path value type is
comparatively rare in VFS layers; it usually lives at the language level, not
the VFS layer.
7.1.2.4. Layering and "view construction" are high-leverage primitives
Powerful VFS designs tend to support composition:
- overlays/stacks (LLVM's OverlayFileSystem, Afero's CopyOnWriteFs)
- rooted/sub views (Go's fs.Sub, os.DirFS, Afero's BasePathFs)
- redirection/mapping (LLVM redirecting FS; fsspec protocol routing)
These primitives solve many practical problems (testing, reproducible builds, generated files, sandboxed plugins) without bespoke ad-hoc hooks.
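A toy version of two of these primitives, overlay and rooted sub view, can be sketched in a few lines of Julia over a lookup-only filesystem. The types `SketchFS`, `DictFS`, `OverlayFS`, `SubFS` and the function `lookup` are hypothetical, invented for this illustration.

```julia
abstract type SketchFS end

# Leaf backend: file contents stored in a dictionary keyed by name.
struct DictFS <: SketchFS
    files::Dict{String, String}
end
lookup(fs::DictFS, name::AbstractString) = get(fs.files, name, nothing)

# Overlay: consult the layers in order; the first hit wins.
struct OverlayFS <: SketchFS
    layers::Vector{SketchFS}
end
function lookup(fs::OverlayFS, name::AbstractString)
    for layer in fs.layers
        found = lookup(layer, name)
        found === nothing || return found
    end
    return nothing
end

# Rooted sub view: every name is interpreted relative to a fixed prefix.
# A real implementation would also need to reject names that escape the
# prefix (for example via ".."); that check is omitted here.
struct SubFS{F <: SketchFS} <: SketchFS
    inner::F
    prefix::String
end
lookup(fs::SubFS, name::AbstractString) = lookup(fs.inner, fs.prefix * "/" * name)

lower = DictFS(Dict("etc/app.toml" => "debug = false", "etc/base.toml" => "x = 1"))
upper = DictFS(Dict("etc/app.toml" => "debug = true"))
overlay = OverlayFS(SketchFS[upper, lower])
etc = SubFS(overlay, "etc")

lookup(overlay, "etc/app.toml")   # "debug = true"  (upper layer shadows lower)
lookup(etc, "base.toml")          # "x = 1"         (found via the prefix)
```

Since each view is itself a filesystem value, views compose freely, as the sub view over the overlay shows.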
7.1.2.5. Two-phase model: resolve names, then operate on handles
Across OS-level and capability-flavoured designs, the most stable conceptual split is:
- name lookup/resolution, which is effectual/stateful and may depend on mounts/symlinks/caches, producing
- handles, in the form of file descriptors, fs.File, vfs::File, or file-like objects, as the authority-bearing resource reference.
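The split is observable with Base Julia today: once a name has been resolved to a handle, later changes to the name do not affect the handle. This sketch assumes POSIX rename semantics; on Windows, renaming a file that is open may fail instead.

```julia
dir = mktempdir()
original = joinpath(dir, "old.txt")
renamed = joinpath(dir, "new.txt")
write(original, "contents")

io = open(original, "r")   # phase 1: name resolution yields a handle
mv(original, renamed)      # the name changes after resolution...
text = read(io, String)    # phase 2: ...but the handle still reaches the file
close(io)

println(isfile(original))  # false: the original name no longer resolves
println(text)              # the data read through the handle
```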
7.1.2.6. Capability models are a distinct axis
Some VFS approaches treat sandboxing as an emergent pattern (pass a rooted FS object; don't call the ambient one), while others make it the default (WASI: only pre-opened directory capabilities; descriptor-relative operations; absolute paths disallowed). The lesson is that capability-oriented designs work best when the API shape forces context/authority to be explicit (directory-handle-centric *at operations), rather than relying on convention.
7.1.2.8. Design takeaways
- Make filesystem context substitutable and injectable; avoid baking "the OS filesystem" into library code.
- Prefer a small core interface with optional extensions; keep richer surfaces layered.
- Treat resolution as effectual lookup that yields handles/metadata, not "better paths".
- Provide composition primitives (rooting/sub, overlay, redirection) as first-class building blocks.
- If capability/sandboxing matters, encode it in the API shape (handle-relative operations), not just documentation.
- Be explicit about semantic guarantees; don't pretend disparate backends share POSIX behaviour.
7.2. Paths
7.2.1. Python's Pathlib
https://peps.python.org/pep-0428 https://peps.python.org/pep-0519
https://docs.python.org/3/library/pathlib.html
Python's pathlib is generally praised for offering an ergonomic way of handling filesystem paths. It makes paths first-class types, with a deliberate split between pure (lexical) path manipulation and concrete (I/O) paths that interact with the filesystem. Each kind of path is further divided into POSIX and Windows flavours, with a mostly-uniform interface.
Sample usage
Python
from pathlib import Path, PureWindowsPath
import os
# Pure, lexical construction (no I/O)
win = PureWindowsPath(r"C:\Users\TEC") / "project" / "data.csv"
# Concrete, effectful operations on the local OS filesystem
p = Path("data") / "results.csv"
text = p.read_text(encoding="utf-8") # opens/reads the file (I/O)
# Explicit demotion for interop with APIs expecting a filesystem path representation
os_path = os.fspath(p)
7.2.1.1. Pure and Concrete paths
Pathlib separates out purely conceptual and filesystem-grounded paths as pure and concrete paths.
Operating on pure paths does not involve any interaction with the filesystem, while concrete paths check for symlinks, resolve symlinks, and verify various path operations using the filesystem. This also makes the transition from operating on the path to interacting with the filesystem explicit. Note that this is not true path "resolution" in the sense that concrete paths are still string based, instead of obtaining a resource handle.
7.2.1.2. Posix and Windows paths
Pathlib provides per-platform path classes, and aliases Path/PurePath based on
the current platform. This preserves OS-specific semantics (drives/UNC,
separators, etc.), and allows for working with non-native paths when needed,
without forcing all callers to write per-platform code.
7.2.1.3. Not a string subclass
PEP 428 explicitly decides against deriving paths from str to avoid silent misuse.
7.2.1.4. Interoperability via path protocol
PEP 519 creates a path protocol (os.PathLike / __fspath__) so that "path
objects" can be accepted across the stdlib. Path objects can be demoted to
str/bytes for legacy APIs, while constructors can promote strings into path
objects.
7.2.1.5. Solid path API
The Pathlib API provides the basics (parent, parts, joinpath, home), and also a
decent collection of utilities on top:
- suffix
- suffixes
- stem
- with_name
- with_stem
- with_suffix
- with_segments
- from_uri
- as_uri
Currently Julia covers the basics, but could probably do with some more convenience functions.
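As an indication of how small such conveniences are, here are string-based analogues of a few of them, built from Base's splitext, basename, dirname and joinpath. The function names mirror pathlib and are hypothetical in Julia; they are not existing Base functions.

```julia
# Hypothetical pathlib-style helpers on plain strings.
stem(path::AbstractString) = first(splitext(basename(path)))
suffix(path::AbstractString) = last(splitext(basename(path)))
with_suffix(path::AbstractString, ext::AbstractString) = first(splitext(path)) * ext
with_name(path::AbstractString, name::AbstractString) = joinpath(dirname(path), name)

stem("data/results.csv")                   # "results"
suffix("data/results.csv")                 # ".csv"
with_suffix("data/results.csv", ".tsv")    # "data/results.tsv"
with_name("data/results.csv", "raw.csv")   # "data/raw.csv" on POSIX systems
```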
7.2.1.6. Tradeoffs
- Convenience and separation
- while having path manipulation and filesystem interaction methods all on the same class would be convenient, it is seen as more important to split the two, and provide Path/PurePath aliases to make the split more manageable.
7.2.1.7. Limitations
- Single ambient filesystem context
- the host OS is always implicit
- No capability model
- there's no concept of authority, or restricted namespaces; access control is left to each calling context
7.2.2. C++17 <filesystem> library
https://en.cppreference.com/w/cpp/header/filesystem.html
https://en.cppreference.com/w/cpp/filesystem/path.html
https://en.cppreference.com/w/cpp/filesystem/path/lexically_normal
https://en.cppreference.com/w/cpp/filesystem/canonical.html
https://learn.microsoft.com/en-us/cpp/standard-library/path-class?view=msvc-170
C++17's filesystem library, based on the Boost library of the same name, introduces a dedicated std::filesystem::path value type for representing names in a filesystem namespace, plus a family of functions for effectual filesystem operations. The design intentionally keeps path manipulation largely lexical, while making "touch the filesystem" operations explicit via separate APIs.
Sample usage
C++
#include <filesystem>
#include <fstream>
namespace fs = std::filesystem;
fs::path p = fs::path{"config"} / "app.toml"; // lexical: just builds a name
fs::path normalized = p.lexically_normal(); // lexical: no filesystem access
fs::path resolved = fs::weakly_canonical(p); // effectful: resolves existing prefix + symlinks
std::ifstream in(resolved); // the handle is the stream/FD, not fs::path
for (const fs::directory_entry& e : fs::directory_iterator(resolved.parent_path())) {
if (e.is_regular_file() && e.path().extension() == ".toml") {
// ...
}
}
7.2.2.1. Natively stored paths with a generic view
Paths store the "pathname" in the native format, but allow viewing the path in a generic (POSIX) format too. There are explicit functions to go between the two forms.
7.2.2.2. Separate lexical and filesystem queries
Lexical operations (e.g. normalisation, relative path construction) are
supported with a separate API from filesystem interaction (canonical /
weakly_canonical).
7.2.2.3. Exceptions and error-valued returns
Most effectual operations come in two forms: an exception-throwing form, and an
overload taking std::error_code& that reports failures out-of-band.
7.2.2.4. Tradeoffs
- Lexical purity vs. meaningful normalisation
- having both lexical and filesystem operations operate on the same type is convenient, but introduces a softer separation of intent, and means that knowledge about whether a path refers to a real filesystem resource or not must be thought about and managed by the programmer.
- Interoperative convenience vs. sharp edges
- implicit convertibility to/from std::basic_string makes adoption easy and incremental, but also blurs intent and creates more opportunities for surprises in cross-platform code.
7.2.2.5. Limitations
- No capability model
- the approach is only concerned with modelling path names/transformations, and not the use of paths.
- Single ambient filesystem context
- there's no mechanism (or room to insert one) for different kinds of filesystems. Libraries must create parallel APIs.
7.2.3. Rust std::path (Path and PathBuf)
https://doc.rust-lang.org/std/path/index.html
https://doc.rust-lang.org/std/path/struct.Path.html
https://doc.rust-lang.org/std/path/struct.PathBuf.html
https://doc.rust-lang.org/std/fs/index.html
https://rust-lang.github.io/rfcs/0474-path-reform.html
Rust provides Path (borrowed, unsized) and PathBuf (owned) as first-class path
types, analogous to str/String, with representations designed to preserve
platform-native path encodings and semantics. Filesystem effects are performed
through std::fs, producing resource handles like
std::fs::File and metadata, rather than "rematerialising" paths as
handles.
Sample usage
rust
use std::ffi::OsStr;
use std::fs;
use std::path::{Path, PathBuf};
let base: &Path = Path::new("data");
let mut p: PathBuf = base.join("results.csv"); // lexical name construction
p.set_extension("tsv"); // mutate owned path buffer
let meta = fs::metadata(&p)?; // effectful query via std::fs
for entry in fs::read_dir(base)? { // std::fs takes P: AsRef<Path>
let entry = entry?;
if entry.path().extension() == Some(OsStr::new("tsv")) {
// ...
}
}
7.2.3.1. Borrowed/owned split
Rust models paths like strings: Path is a borrowed view used pervasively in
APIs, while PathBuf is an owned, mutable buffer for constructing and editing
paths efficiently (push, pop, set_extension, etc.). This makes path-heavy code
ergonomic without forcing allocations at every boundary.
7.2.3.2. Wrapping around the native OS representation
Path/PathBuf are thin wrappers over OsStr/OsString, so they can represent
platform-native paths that are not valid Unicode text. Conversions to &str are
therefore fallible/optional, which forces callers to confront encoding rather
than silently corrupting or rejecting valid OS paths.
7.2.3.3. Comprehensive API for structure and transformation
The standard API covers the common "shape of a pathname" operations: iteration
by components, file_name / file_stem / extension, parent, prefix/suffix tests,
strip_prefix, joining, and targeted edits (with_extension, with_file_name, plus
mutable PathBuf setters). It is deliberately narrower than pathlib's high-level
conveniences (no standard home()/tilde expansion, globbing, recursive walking,
etc.), which are left to ecosystem crates as needed.
7.2.3.4. Hard separation between paths and filesystem operations
Rust keeps path values largely inert: opening a file, reading metadata,
canonicalising, iterating directories, etc. is primarily done via std::fs free
functions or types. This makes "touches the filesystem" sites stand out in code,
and ensures handles are explicit (std::fs::File being the canonical OS-backed
handle wrapper).
7.2.3.5. Interoperability via AsRef<Path> (promotion without a dedicated protocol type)
Most filesystem APIs accept P: AsRef<Path>, allowing callers to pass
&Path, PathBuf, &str, String, and other path-like inputs without a bespoke
path-protocol mechanism. This makes adoption incremental while still
standardising the "real" path vocabulary type at API boundaries.
7.2.3.6. Tradeoffs
- Ergonomics vs purity
- keeping most effects in std::fs sharpens the "names vs resources" distinction, but also means fewer "one object does everything" conveniences compared with pathlib.
- Correctness vs string convenience
- non-Unicode-capable paths reduce footguns on real systems, but push more code toward OsStr/OsString-aware manipulation when doing text-like operations.
7.2.3.7. Limitations
- Single ambient filesystem context
- std assumes the host OS filesystem; multiple filesystem instances and VFS composition are not standardised in the core API.
- No capability model in std
- capability-oriented directory handles and sandbox-friendly "at-style" patterns are left to crates (notably outside std).
7.2.4. Racket paths
https://docs.racket-lang.org/reference/pathutils.html
https://docs.racket-lang.org/reference/Manipulating_Paths.html
https://docs.racket-lang.org/reference/file-ports.html
Racket treats paths as a distinct datatype, while allowing most filesystem APIs to accept either a path value or a string that is promoted to a path. Path values preserve an underlying byte-oriented representation (so not all paths are losslessly representable as strings) and path-manipulation utilities can operate under Unix/Windows conventions independent of the host platform.
Sample usage
Scheme
;; Path construction is lexical
(define p (build-path (current-directory) "data" "results.csv"))
;; "Pure-ish" normalization is available (no filesystem access)
(define p-lex (simplify-path p #f))
;; Filesystem-aware normalization can consult the filesystem (e.g. symlinks)
(define p-fs (simplify-path p))
;; Path -> handle (port); the port is the resource capability (backed by an OS fd)
(define in (open-input-file p))
(define txt (port->string in))
;; Interop / presentation: string conversion can be lossy
(define display (path->string p))
7.2.4.1. Dedicated path datatype with string interoperability
Filesystem procedures generally accept either a string or a path value,
promoting strings via string->path, while procedures that produce filesystem
paths return path values. This keeps "path-as-name" visible in values without
forcing an all-at-once ecosystem migration away from strings.
7.2.4.2. Byte-oriented representation and lossy string rendering
Paths round-trip through byte strings more faithfully than through Unicode
strings: path->string decodes using the platform's conventions and is explicitly
documented as unsuitable for lossless "convert to string, tweak, convert back"
workflows. This is a concrete example of separating display text from
filesystem name representation.
7.2.4.3. Convention-aware paths (Unix vs Windows) independent of host
Racket tracks a path's convention ("kind") and makes many non-effectual
procedures sensitive to that kind. Construction from bytes supports an explicit
'unix/'windows convention, enabling manipulation of (say) Windows paths on
Unix when no filesystem access is required.
7.2.4.4. Cleansing, lexical normalization, and filesystem-aware simplification
Racket distinguishes several tiers of "make this path nicer":
- Cleansing
- many primitives cleanse inputs (e.g. redundant separators) before use; cleanse-path is explicitly non-effectual.
- Filesystem-aware adjustment
- resolve-path can dereference a single soft link, and simplify-path defaults to use-filesystem? = #t, potentially consulting the filesystem and accounting for soft links when eliminating .. to preserve referential meaning.
- Pure mode
- simplify-path with #f performs syntactic simplification without filesystem access (and can operate on paths for any platform).
7.2.4.5. Separation of names from resource handles
Racket's I/O APIs return ports (e.g. open-input-file) which are the
capability-bearing handles used for reading/writing; paths remain names that are
interpreted during the open. Ports are backed by OS-level descriptors
underneath.
7.2.4.6. Efficiency-oriented decomposition utilities
The API includes primitives designed to avoid allocation patterns that
arise from repeated splitting. For example, explode-path is documented
to run in time proportional to the path length, unlike iterative
split-path usage that allocates intermediate paths.
7.2.4.7. Tradeoffs
- Interoperability convenience vs. strictness
- accepting both strings and path values keeps APIs ergonomic, but weakens the type-level separation between "text" and "path name" compared to designs that require explicit promotion everywhere.
- Convenient defaults vs. explicit context
- some "pure" utilities still consult ambient process state (e.g. current-directory as a default base), which is pragmatic but less explicit than a fully capability-oriented model.
7.2.4.8. Limitations
- Ambient filesystem context
- paths do not carry an explicit filesystem object/context; operations are fundamentally anchored to the host OS filesystem semantics.
- No first-class capability model for namespace restriction
- while ports are
handles, there is no standard "directory capability" pattern (à la
openat-style APIs) that makes authority over a subtree explicit in function signatures.
7.2.5. Common Lisp filepaths library
filepaths is a "modern and consistent filepath manipulation" library for Common Lisp which consolidates scattered pathname utilities, fills in commonly-missed operations, and renames them to be more predictable. It operates purely on names (CL pathnames / namestrings) and intentionally does not probe the filesystem.
Sample usage
Lisp
;; Construct and transform purely lexically (no I/O)
(let* ((p (p:join #p"/home/you/code" "common-lisp" "hello.lisp"))
(q (p:with-extension p "json")))
(values
p
q
(p:parent q)
(p:components q)
(p:to-string q)))
7.2.5.1. Standard pathname substrate
Common Lisp already has a first-class pathname object (with components like host/device/directory/name/type/version), and namestring is an implementation-defined textual rendering of a pathname. filepaths builds on this existing substrate rather than introducing a new path type.
7.2.5.2. Lexical-only API by design
filepaths explicitly focuses on structural/lexical operations (joins, component access, extension manipulation, structure tests) and avoids filesystem queries such as existence checks. This gives a sharp "names, not resources" boundary.
7.2.5.3. Predictable "modern" operations and naming
The library centers a set of operations that look very similar to what
developers expect from newer path APIs: join, parent, with-name, with-parent,
extension/with-extension/add-extension/drop-extension, and component
conversion (components, from-list).
7.2.5.4. Accepts either pathname or string
Nearly every function accepts either a pathname or a string, and provides
explicit "ensure"/conversion helpers (ensure-path, ensure-string, to-string,
from-string) to normalize inputs/outputs. This is a pragmatic interoperability story in a
Lisp ecosystem where APIs frequently accept "pathname designators".
7.2.5.5. Errors via conditions
For cases where nil is ambiguous, it signals dedicated conditions
(e.g. empty-path, no-filename, root-no-parent), which aligns with CL's condition
system for recoverable errors.
7.2.5.6. Tradeoffs
- Interoperative convenience vs. type clarity
- accepting both strings and path objects makes adoption easy, but allows "stringly-path" to remain pervasive, weakening the signaling value of path-typed APIs.
- Leverages CL pathnames vs. inherited complexity
- reusing pathname avoids ecosystem fragmentation, but also inherits long-standing complexity/quirks of CL pathname semantics and conversions.
7.2.5.7. Limitations
- Not a filesystem abstraction
- there is no VFS concept, no filesystem context object, and no capability/authority model—only lexical pathname manipulation.
- Portability of textual syntax is constrained by CL
- because namestring syntax is implementation-dependent (outside of logical pathnames), any string-based portability story depends on the underlying implementation's parsing/rendering choices.
7.2.6. Frames' algebraic path schema
https://www.oxinabox.net/2016/09/14/an-algebraic-structure-for-path-schema-take2.html
Frames proposes a minimal algebraic specification for "path schemas" (file paths, URLs, XPath, globs, etc.) by treating paths as structured names built from roots and relative components, and then separating that lexical structure from the effectual act of evaluating a path against a backing domain.
7.2.6.1. Minimal core: roots, a free monoid, and a faithful action
The model starts with a set of absolute roots \(A\), relative components \(R\), and a pathjoin operator ⋅ such that relative paths form a free monoid \(R^*\), and \(R^*\) acts faithfully on absolute roots, generating absolute paths \(A^*\). This aims to capture "paths as names" with as few primitives as possible, while still deriving most common operations.
I think that paths actually best fit an ordered monoid, since there's a sensible partial order.
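One way to read this structure in Julia is sketched below, with relative paths as sequences of segments and absolute paths as a root plus a relative path. The types `Rel` and `Abs`, the constant `EMPTY` and the function `pathjoin` are hypothetical, and this encoding is an interpretation of the post rather than something taken from it.

```julia
struct Rel
    segments::Vector{String}
end
struct Abs
    root::String
    rel::Rel
end
Base.:(==)(a::Rel, b::Rel) = a.segments == b.segments
Base.:(==)(a::Abs, b::Abs) = a.root == b.root && a.rel == b.rel

const EMPTY = Rel(String[])   # identity element of the monoid

# pathjoin on relative paths is concatenation: associative, with identity EMPTY.
pathjoin(a::Rel, b::Rel) = Rel(vcat(a.segments, b.segments))
# The action of relative paths on absolute paths extends the relative part.
pathjoin(a::Abs, b::Rel) = Abs(a.root, pathjoin(a.rel, b))
# Deliberately no pathjoin(::Rel, ::Abs) or pathjoin(::Abs, ::Abs) method.

x, y, z = Rel(["a"]), Rel(["b", "c"]), Rel(["d"])
root = Abs("/", EMPTY)

pathjoin(pathjoin(x, y), z) == pathjoin(x, pathjoin(y, z))          # true: associativity
pathjoin(x, EMPTY) == x                                              # true: identity
pathjoin(pathjoin(root, x), y) == pathjoin(root, pathjoin(x, y))    # true: action law
```

The missing methods are the type-level counterpart of "joining two absolute paths is an error".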
7.2.6.2. Multiple roots as a first-class concern
The framework explicitly allows multiple absolute roots (e.g. POSIX "/" vs Windows drive roots and other namespaces), and notes that "zero absolute roots" is theoretically possible but makes evaluation ill-defined for "absolute" lookup.
7.2.6.3. Derived operations: parentdir, basename, root, parts, within
Given the core structure, the post derives familiar operations: parentdir and basename (tail/head), and then root, depth, parts, and within (a restricted "relative" that only works when one path is nested within the other). The emphasis is that these are lexical and don't require touching the backing system.
7.2.6.4. Evaluation is intentionally effectual and many-to-one
Resolution is modeled as an evaluation function \(e: A^* \to \mathcal{P}(D)\), mapping an absolute path name to a set of domain objects (to accommodate aliases, links, globs, XPath, etc.). The post distinguishes "MonoPath" schemas (0/1 object, like typical filesystem paths) from "MultiPath" schemas (0/many, like globs/XPath).
7.2.6.5. .. as a "pseudoparent" element and why it's hard
A key design lesson is that treating .. as an ordinary relative component breaks the free-monoid structure; instead the post introduces a special element (ϕ) defined via evaluation (intuitively, "append ϕ then evaluate equals evaluating the parent"). It then argues that POSIX .. semantics are not purely lexical in the presence of symlinks, motivating designs that avoid collapsing .. without filesystem access (and noting that some systems ban it outright).
7.2.6.6. Normalization and relative_to depend on ϕ
A normalization function norm is defined to remove ϕ where possible without changing evaluation, and relative_to(x,y) is defined using a common-prefix computation plus the necessary number of ϕ "up-steps". The post explicitly notes that proofs of the normalization/equivalence properties are non-trivial and not completed there.
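The "common prefix plus up-steps" construction can be sketched over segment vectors, with ".." standing in for ϕ. The function name follows the post, but the argument order (the path leading from `y` to `x`) and the vector representation are assumptions made for this illustration.

```julia
# Hypothetical sketch: the relative path that leads from y to x.
# Purely lexical, so it is only sound when no segment of y is a symlink.
function relative_to(x::Vector{String}, y::Vector{String})
    # Length of the common prefix of x and y.
    n = 0
    while n < min(length(x), length(y)) && x[n + 1] == y[n + 1]
        n += 1
    end
    ups = fill("..", length(y) - n)   # one up-step per remaining segment of y
    return vcat(ups, x[n + 1:end])    # then descend into what remains of x
end

relative_to(["a", "b", "c"], ["a", "d"])   # ["..", "b", "c"]
relative_to(["a", "b"], ["a", "b"])        # String[]
```

For ordinary paths this agrees with Base's relpath, which performs the same lexical computation on strings.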
7.2.6.7. Optional extensions: canonical names and directory-vs-file paths
Two extensions are sketched: (1) "canonical name" schemas where evaluation is injective (one object ↔ one name), and (2) splitting file paths vs directory paths to restrict which joins are permitted—at the cost of losing a simple monoid over all relative paths (while preserving a free monoid for directory-relative paths).
7.2.6.8. Tradeoffs
- The high level of generality (covering URLs/XPath/globs/filesystems uniformly) clarifies what is fundamental versus conventional, but it can be too abstract to settle concrete API questions without additional constraints.
- Introducing ϕ enables familiar operations like relative_to, but it forces a careful split between lexical structure and effectual semantics—highlighting exactly where "stringy" normalization becomes unsound.
7.2.6.9. Limitations
- This is a theoretical specification rather than an implementation guide; many operational details (errors, permissions, encodings, etc.) are out of scope.
- ϕ/.. cannot be made fully POSIX-faithful without consulting the filesystem (symlink interaction), and some schemas (notably multipaths like globs) may not admit any ϕ-like element at all.
- The "evaluation" function models name → object(s), but does not model handles/capabilities; it stops at identifying resources rather than representing authority to operate on them.
7.2.7. Zig's path APIs
https://ziglang.org/documentation/master/std/#std.fs
https://ziglang.org/documentation/master/std/#std.fs.Dir
https://ziglang.org/documentation/master/std/#std.fs.path
Zig largely treats paths as byte slices ([]const u8) plus a set of
lexical utilities (std.fs.path). Effectual filesystem operations are separated
into std.fs, and are designed to be used primarily via open directory handles
(std.fs.Dir) and relative paths, rather than "stringly" absolute-path APIs.
Sample usage
zig
// Lexical composition (allocates for convenience)
const rel = try std.fs.path.join(allocator, &.{ "reports", "2026-01.csv" });
defer allocator.free(rel);
// Directory-handle-centric I/O (File/Dir are the actual resource handles)
var root = try std.fs.cwd().openDir("data", .{ .iterate = true });
defer root.close();
var f = try root.openFile(rel, .{ .mode = .read_only });
defer f.close();
7.2.7.1. Paths are functions over slices, not a first-class "Path" type
Rather than a dedicated path value type, Zig centralises path manipulation as
functions that operate on []const u8 and expose platform-specific
constants (e.g. directory separator and PATH-list delimiter).
7.2.7.2. Dir-centric I/O (directory handles as capability-like authority)
Filesystem operations are intended to be performed relative to a Dir
handle (an OS-backed resource), which can reduce TOCTOU hazards and
makes "what subtree do we have authority over?" more explicit in APIs
and call graphs. The presence of *Absolute convenience functions is
increasingly treated as legacy/avoidable, since many are thin wrappers
over cwd().* calls.
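The same directory-handle pattern exists outside Zig. As an analogue (not Zig's API), the sketch below uses Python's dir_fd parameter, which maps to openat on POSIX systems. The helper names and the demo file layout are invented for the example, and it assumes a platform where os.O_DIRECTORY and dir_fd are available.

```python
import os
import tempfile

def read_relative(dir_fd: int, name: str) -> bytes:
    """Open `name` relative to an already-open directory descriptor."""
    fd = os.open(name, os.O_RDONLY, dir_fd=dir_fd)
    with os.fdopen(fd, "rb") as f:  # fdopen takes ownership and closes fd
        return f.read()

def demo() -> bytes:
    with tempfile.TemporaryDirectory() as tmp:
        os.makedirs(os.path.join(tmp, "reports"))
        with open(os.path.join(tmp, "reports", "2026-01.csv"), "wb") as f:
            f.write(b"x,y\n")
        # The directory handle: later lookups start here, not at the CWD.
        root = os.open(tmp, os.O_RDONLY | os.O_DIRECTORY)
        try:
            return read_relative(root, "reports/2026-01.csv")
        finally:
            os.close(root)
```

Because the lookup is anchored at the descriptor, renaming or replacing the directory's path after it was opened does not redirect the read.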
7.2.7.3. Explicit Windows-wide variants alongside UTF-8/byte-slice APIs
Many APIs accept []const u8, but Windows-specific variants exist that
take wide strings (WTF-16), reflecting the platform's native calling conventions
and making encoding/interoperability concerns explicit at the API boundary.
7.2.7.4. Tradeoffs
- Simplicity and performance
- treating paths as slices avoids object overhead and keeps path manipulation lightweight, but provides less structural guidance than a dedicated Path type.
- Handle-first ergonomics
- Dir-relative operations sharpen authority and robustness, but can feel heavier for simple scripts and push more users toward "plumbing" directory handles through APIs.
- Platform realism
- exposing wide-string Windows variants improves correctness/interoperability, but increases surface area and multiplies "which encoding do I use?" decisions.
7.2.7.5. Limitations
- Single ambient filesystem context
- cwd() is the default anchor for many operations, and the model assumes the host OS filesystem semantics as the baseline context.
- No datatype distinction between paths and strings
- paths remain names (byte sequences); "handle-ness" only appears once a Dir/File is opened.
7.2.8. Node.js path and fs
https://nodejs.org/api/path.html
https://nodejs.org/api/fs.html
https://nodejs.org/api/url.html
Node splits lexical pathname manipulation (path) from effectual filesystem
interaction (fs), but paths themselves are primarily represented as plain
strings (with some fs APIs also accepting Buffer or file: URLs). The net effect
is a clear separation of "build a name" vs "touch the filesystem", without
introducing a dedicated path value type.
Sample usage
Javascript
import path from "node:path";
import fs from "node:fs";
import url from "node:url";
// Lexical path operations (no I/O)
const p = path.join("data", "results.csv");
const parts = path.parse(p); // { dir, base, name, ext, ... }
const abs = path.resolve(p); // still just a string
// Effectful operations (I/O) yield handles
const fh = await fs.promises.open(p, "r");
try {
const text = await fh.readFile({ encoding: "utf8" });
} finally {
await fh.close();
}
// URL interop when needed
const fileUrl = url.pathToFileURL(abs);
const p2 = url.fileURLToPath(fileUrl);
7.2.8.1. Stringly paths and "PathLike" inputs
path operates over strings and returns strings. In fs, string paths
are interpreted as UTF-8 sequences naming absolute/relative filenames,
and relative paths are resolved against process.cwd(). Many fs APIs
also accept Buffer and (for file:) WHATWG URL objects, which makes
interoperability convenient but keeps the "path as text" boundary relatively
soft.
7.2.8.2. Platform-specific semantics with explicit posix/win32 variants
The default path behavior follows the host platform, while
path.posix and path.win32 provide explicit access to the other
platform's parsing/joining rules. This allows cross-platform
manipulation without changing the underlying representation (still
strings).
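Python's standard library has the same arrangement, which makes for a compact illustration: posixpath and ntpath are the two rule sets, and either can be used on any host. The sketch below is an analogue of path.posix/path.win32, not Node's API, and the helper name is made up.

```python
import ntpath
import posixpath
from typing import Dict

def basename_by_flavour(p: str) -> Dict[str, str]:
    """Parse the same string under both rule sets, regardless of host OS."""
    return {"posix": posixpath.basename(p), "win32": ntpath.basename(p)}

def join_by_flavour(*parts: str) -> Dict[str, str]:
    """Join the same segments under both rule sets."""
    return {"posix": posixpath.join(*parts), "win32": ntpath.join(*parts)}
```

The input and output are plain strings in both cases, so nothing records which flavour produced a given value. That is the caller-responsibility cost of this design.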
7.2.8.3. Handles as file descriptors and FileHandle
Resource access is mediated by OS-backed handles: the promises API
exposes a FileHandle object with explicit close(), while other APIs
expose numeric file descriptors. This matches the "handle is authority"
story operationally, even though it is not reflected in a
capability-oriented path resolution model (paths remain globally
interpretable strings).
7.2.8.4. Lexical normalization vs filesystem-aware canonicalization
Node's path functions are purely lexical; canonicalization and symlink
resolution live in fs (e.g. realpath). This preserves a practical
"names vs effects" separation, but with no distinct type-level boundary
between lexical and resolved forms.
7.2.8.5. Tradeoffs
- Simplicity and interoperability
- strings keep the surface area small and integrate naturally with the JS ecosystem, but do not prevent accidental mixing of "text" and "path".
- Cross-platform control vs ergonomics
- explicit path.posix/path.win32 enables portable tooling, but correctness remains a caller responsibility because everything is still representationally a string.
- Flexible inputs vs clarity
- accepting Buffer/URL is pragmatic, but further blurs the conceptual model (multiple "path-like" carriers with different invariants).
7.2.8.6. Limitations
- Single ambient filesystem context
- fs operations implicitly target the process-visible OS filesystem, with relative paths interpreted via process.cwd().
- No capability model
- while handles exist (FileHandle/fd), the dominant API surface remains ambient (global path resolution rather than resolution relative to an explicit directory capability).
- No first-class path value
- the model cannot leverage types to encode invariants (absolute vs relative, platform flavour, lexical vs canonical) beyond convention and runtime checks.
7.2.9. .NET's System.IO
https://learn.microsoft.com/en-us/dotnet/api/system.io.path?view=net-10.0
https://learn.microsoft.com/en-us/dotnet/api/system.io.path.combine?view=net-10.0
https://learn.microsoft.com/en-us/dotnet/api/system.io.path.join?view=net-10.0
https://learn.microsoft.com/en-us/dotnet/api/system.io.path.getfullpath?view=net-10.0
https://learn.microsoft.com/en-us/dotnet/api/system.io.file.openhandle?view=net-10.0
https://learn.microsoft.com/en-us/aspnet/core/fundamentals/file-providers?view=aspnetcore-10.0
.NET's core System.IO APIs treat paths primarily as string values, with lexical
manipulation provided by the static Path utility class and effectual operations
provided by File/Directory and stream/handle types. Stable authority over a
resource is represented by streams/OS-handle wrappers created by opening a path,
rather than by the path value itself.
Sample usage
csharp
using System.IO;
var p = Path.Combine("data", "results.csv"); // lexical construction
var full = Path.GetFullPath(p); // resolves against current directory
var text = File.ReadAllText(full); // effectful: opens + reads
using var h = File.OpenHandle(full); // explicit OS-handle wrapper
using var s = new FileStream(h, FileAccess.Read);
7.2.9.1. Path as a purely-lexical utility surface
System.IO.Path is a "path algebra" of string-in/string-out helpers
(join, split, query), while filesystem effects live in other types. This
keeps naming operations distinct at the API level, but does not create a
first-class path value type.
7.2.9.2. Joining semantics: Combine vs Join and rooted segments
Path.Combine follows OS-like semantics: if any argument after the
first is rooted, earlier components are discarded. Path.Join
concatenates more mechanically (preserving duplicate separators), with
normalization typically deferred to GetFullPath.
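The two joining behaviours are easy to contrast in a few lines. Python's posixpath.join happens to share Combine's rooted-segment rule, so it stands in for Combine here. join_mechanical is a hypothetical helper that approximates Join's behaviour for "/"-separated paths and is not a port of the .NET implementation.

```python
import posixpath

def combine(*parts: str) -> str:
    """Combine-style: a rooted later segment discards everything before it."""
    return posixpath.join(*parts)

def join_mechanical(*parts: str) -> str:
    """Join-style: concatenate, adding '/' only where neither side supplies one."""
    out = ""
    for part in parts:
        if not part:
            continue
        if out and not out.endswith("/") and not part.startswith("/"):
            out += "/"
        out += part
    return out
```

With untrusted input such as ("data", "/etc/passwd"), the Combine-style result escapes the intended base directory, while the mechanical join stays underneath it.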
7.2.9.3. Ambient resolution via current directory (and drive rules on Windows)
Path.GetFullPath resolves relative inputs using ambient process state
(current directory; and, on Windows, drive-relative conventions). A
base-path overload exists to avoid depending on ambient state when
determinism matters.
7.2.9.4. Handles are explicit (and increasingly first-class)
The canonical "resource handle" is a stream (e.g. FileStream), and
modern .NET also exposes File.OpenHandle returning a SafeFileHandle
directly, making the "name → handle" boundary explicit when desired.
7.2.9.5. VFS-style abstractions exist, but outside System.IO
While System.IO itself is OS-filesystem-oriented, .NET includes other,
scoped abstractions for non-physical or restricted views—most notably
ASP.NET Core's IFileProvider (with physical, embedded, and composite
providers). This is not the general-purpose System.IO model, and is
intentionally narrower in scope.
7.2.9.6. Tradeoffs
- Stringly paths
- maximizes interoperability and keeps the core surface small, but conflates text with namespace locations and limits type-driven correctness and dispatch.
- Convenient OS-like joins
- Combine's rooted-path rule is pragmatic, but can produce surprising results if rootedness appears unexpectedly in inputs.
- Exceptions as the primary error channel
- simplifies common-case use, but makes "probe" patterns more awkward than explicit error-valued returns.
7.2.9.7. Limitations
- No first-class path value type in the BCL
- path semantics remain "strings with conventions", despite a rich helper API.
- No first-class filesystem-context object in System.IO
- there is no standard FileSystem value you pass around to select a backend or restrict authority; alternate filesystem views are provided via separate subsystems (e.g. IFileProvider) rather than integrated into the core path+fs model.
7.2.10. FilePathsBase.jl
https://github.com/rofinn/FilePathsBase.jl
https://rofinn.github.io/FilePathsBase.jl/stable
FilePathsBase.jl is a Julia ecosystem attempt at a first-class path vocabulary
type (AbstractPath) with platform-aware concrete path types (e.g. POSIX/Windows)
and broad integration with Julia's existing filesystem functions via method
extensions. It treats paths as structured values (not strings), while still
ultimately operating against the ambient host filesystem for the default "system
path" types. It gets a lot of things right, but suffers from a few fatal flaws
(such as type instability).
Sample usage
julia
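using FilePathsBase
using FilePathsBase: / # bring the / join operator into scope explicitly
p = p"data" / "results.csv" # path literal + /-join
write(p, "hello\n") # effectful filesystem use via methods/extensions
txt = read(p, String)
ext = extension(p) # "csv"
dir = parent(p) # not `parent = parent(p)`, which would clash with the function name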
here = @__PATH__ # path-valued analogue of @__FILE__
7.2.10.1. AbstractPath as a vocabulary type (not AbstractString)
A central design choice is making paths distinct from strings, rather
than a string subtype. This forces explicit conversions at boundaries
and reduces accidental misuse (e.g. treating text as a filesystem name
or vice-versa). The package also preserves * for string concatenation
and uses / for path joining.
7.2.10.2. Platform-specific path types with a "system path" default
FilePathsBase provides separate path types for differing platform semantics (POSIX vs Windows) and a "system" default intended to match the running platform, analogous to other ecosystems' platform-dispatched path aliases.
7.2.10.3. Structured representation and a "path interface"
Paths are modelled as structured values (conceptually segments plus
platform rules). The docs define an interface for implementing new
AbstractPath subtypes, including operations to access components and
to support common filesystem behaviours. This interface is intended to
allow alternate path kinds beyond the local OS, even if the default
types map to the ambient host filesystem.
7.2.10.4. Integration via method extension over Base.Filesystem
Rather than introducing a new filesystem context object, the package
primarily integrates by extending existing Base.Filesystem-style
operations to accept AbstractPath. This makes the API feel "native"
when it's in scope, but relies on broad method coverage and consistent
adoption.
7.2.10.5. Tradeoffs
- Ergonomics vs. "ambient by default"
- the default system path types implicitly target the process-visible host filesystem, which is convenient but keeps the core model tied to ambient context.
- Operator overloading vs. local surprises
- using / for joins is ergonomic, but it must coexist with division and with user expectations about what operators mean in Julia codebases.
- External package vs. ecosystem coherence
- because this lives outside Base, interoperability depends on adoption and on the completeness of method extensions across Base/stdlibs and third-party packages.
7.2.10.6. Limitations
- Not a Base-level abstraction
- it cannot enforce a ubiquitous "paths are paths" vocabulary across the ecosystem, so friction remains at API boundaries (conversion, missing overloads, mixed conventions).
- No explicit capability or filesystem context object
- authority and context remain largely implicit (host filesystem, process CWD), rather than being represented as explicit handles/capabilities.
- Type instability
- using Tuple{Vararg{String}} for segments makes access/modification O(1), but makes any operation that requires the full path O(depth), which isn't great.
7.2.11. FilePaths2.jl
An experimental rethink of typed paths in Julia from 2023, motivated by making path objects usable for non-local backends (notably S3) without implicitly triggering expensive "update" operations. It treats every path as a tree node (via the AbstractTrees interface) and imposes strict rules on what is inferable from path strings and when remote calls are allowed.
7.2.11.1. All paths are tree nodes
All path types are required to implement the AbstractTrees.jl
interface, so that generic traversal utilities (e.g. walkpath) can be
expressed in terms of standard tree algorithms rather than
re-implemented per path type. This frames "a path is a node in a
namespace tree" as a semantic guarantee, not just a metaphor.
This is complicated by symlinks, which are not addressed in the current design of FilePaths2.jl.
7.2.11.2. Strict semantics about what can be inferred from strings
FilePaths2 leans on the fact that "complete" paths can be described by
strings from a root, and uses a PathSpec wrapper to support
purely-parsed operations like determining parents and testing
ancestry/antecedence relationships. A key consequence is that path
strings used to construct path objects must be absolute, or must refer
to an existing resource so the absolute path can be inferred.
7.2.11.3. Disallowing relative paths as a semantic constraint
Relative paths are treated as a "pun": they depend on ambient global state (e.g. a current directory) which may be ill-defined or nonsensical for remote backends, and they weaken what can be inferred (e.g. parent relationships). Relative forms may still exist as constructors or views, but the path object model aims to be absolute/anchored.
7.2.11.4. Explicit control of remote calls
To avoid accidental network traffic and to make the "reasoning footprint" of
operations clear, only a small set of functions are permitted to perform remote
calls, and these must accept an update keyword (though a call with update=false
can still error when the object lacks enough information). AbstractTrees.children,
readdir, walkpath, ispath, isfile, and isdir are listed (with an explicit note
that the set may be incomplete).
7.2.11.5. Treating key-value stores as trees by construction
For S3-like systems (key-value stores that "masquerade" as filesystems),
the design asks: "what tree can be constructed from only what the API
provides?". The proof-of-concept infers tree structure solely from key
strings, which yields meaningful definitions for questions like isdir
(e.g. "is this a strict ancestor of some leaf?") while acknowledging
constraints (e.g. no truly empty directories).
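To make the "tree from keys alone" idea concrete, here is a small sketch that answers isdir and children questions from nothing but a set of key strings. The function names mirror the Julia ones for readability, but this is an illustration in Python under the assumption of "/"-delimited keys, not FilePaths2.jl's implementation.

```python
from typing import Iterable, List

def _prefix(path: str) -> str:
    """Key prefix that every strict descendant of `path` must start with."""
    return "" if path == "" else path.rstrip("/") + "/"

def isfile(keys: Iterable[str], path: str) -> bool:
    """A 'file' is exactly a stored key."""
    return path in set(keys)

def isdir(keys: Iterable[str], path: str) -> bool:
    """A 'directory' exists only as a strict ancestor of some key."""
    prefix = _prefix(path)
    return any(k.startswith(prefix) and k != prefix for k in keys)

def children(keys: Iterable[str], path: str) -> List[str]:
    """Immediate child names, inferred from the keys below `path`."""
    prefix = _prefix(path)
    names = {k[len(prefix):].split("/", 1)[0] for k in keys if k.startswith(prefix)}
    names.discard("")  # ignore "folder marker" keys equal to the prefix itself
    return sorted(names)
```

An empty directory has no key beneath it, so isdir reports False for it. This is the "no truly empty directories" constraint showing up directly in the definition.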
7.2.11.6. Tradeoffs
- Stronger invariants vs. friction
- requiring absolute/anchored paths and restricting inference rules improves semantic clarity (especially for remote backends), but rejects common local patterns (relative paths) and may require more explicit anchoring in user code.
- Cost transparency vs. ergonomic opacity
- concentrating remote calls in a small set of "update-permitted" functions makes costs auditable, but can surprise users when seemingly simple queries error unless update=true is provided.
- Tree-first abstraction vs. backend mismatch
- forcing a tree model onto S3 enables generic tooling, but necessarily bakes in approximations (e.g. "directories" inferred from key prefixes) that are not native to the store.
7.2.11.7. Limitations
- Incomplete: the project is a prototype, not a complete alternative to FilePathsBase.jl
7.3. Virtual Filesystems
7.3.1. Java's NIO.2
https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/nio/file/Path.html
https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/nio/file/FileSystem.html
https://docs.oracle.com/en/java/javase/22/docs/api/java.base/java/nio/file/spi/FileSystemProvider.html
https://jcp.org/en/jsr/detail?id=203
https://docs.oracle.com/javase/8/docs/technotes/guides/io/fsp/zipfilesystemprovider.html
Java NIO.2 makes filesystem context explicit: a Path is a name that belongs to a
particular FileSystem, and filesystem implementations are pluggable via a
provider SPI. Effectual operations are largely performed through
Files/channels/streams, so "resolution" typically yields a handle
(channel/stream), while "canonicalisation" yields another Path value.
Sample usage
Java
import java.nio.file.*;
import java.nio.channels.SeekableByteChannel;
import java.net.URI;
import java.util.Map;
Path p = Path.of("config", "app.toml"); // name on the default filesystem
// Opening yields a handle (channel/stream), not a "resolved path handle"
try (SeekableByteChannel ch = Files.newByteChannel(p, StandardOpenOption.READ)) {
// read from ch...
}
// Built-in virtual filesystem: treat a zip/jar as its own FileSystem
URI zipUri = URI.create("jar:file:/tmp/data.zip");
try (FileSystem zipfs = FileSystems.newFileSystem(zipUri, Map.of("create", "true"))) {
Path inZip = zipfs.getPath("/reports/2025.csv");
Files.copy(inZip, Path.of("/tmp/out.csv"), StandardCopyOption.REPLACE_EXISTING);
}
7.3.1.1. Path is bound to a filesystem instance
A Path is not just syntax; it is associated with a specific FileSystem (via
Path.getFileSystem()), and path operations are defined relative to that
filesystem's rules. This enables multiple coexisting filesystem namespaces in
one process without needing a "scheme-in-string" convention.
7.3.1.2. Filesystem as a pluggable service-provider interface
NIO.2 standardizes a provider model (FileSystemProvider) so new filesystems can
be registered and constructed (commonly via URIs and environment maps). This is
the key VFS "insertion point": alternate backends can provide their own Path and
FileSystem implementations while reusing the same surface API.
7.3.1.3. Separation of names from effectual operations (mostly)
The dominant pattern is: Path represents names; effectual operations live in
Files and in opened handles (streams/channels). Even "real path" operations like
toRealPath are effectual but still return a Path (a canonical name), not a
resource handle.
7.3.1.4. Built-in "zipfs" as concrete VFS prior art
The ZIP filesystem provider demonstrates the model end-to-end: each zip/JAR is a
distinct FileSystem, and you operate inside it with ordinary Path and Files calls
(copy, move, attributes), with provider-specific configuration carried via the
environment map.
7.3.1.5. Capability-flavoured hooks exist, but are not the default
Java includes SecureDirectoryStream, which supports directory-relative
operations designed to reduce TOCTOU races (an "openat-like" capability
pattern), but most everyday APIs still center the ambient default filesystem and
directory.
7.3.1.6. Limitations
- Ambient defaults remain central
- Path.of(...)/Paths.get(...) and many idioms assume the default filesystem and process-wide working directory, so the "explicit FS" model is often bypassed in typical code.
- Semantics leak across providers
- cross-filesystem operations exist, but metadata, atomicity, link handling, and permission models can vary by provider, so portability can require extra care beyond just using Path.
7.3.1.7. Tradeoffs
- Power vs. complexity
- the SPI enables serious extensibility (archives, custom backends), but introduces lifecycle concerns (FileSystem can be Closeable) and a larger conceptual surface area than "OS filesystem only".
- "Path carries context" vs. ergonomic ambient usage
- binding Path to FileSystem is a clean model for multiple backends, but the ecosystem still needs strong conventions to avoid silently falling back to the default filesystem when a capability/context-driven style is desired.
7.3.2. Python's fsspec
fsspec is a de-facto standard interface for "filesystem-like" backends in the
Python data ecosystem (local FS, S3/GCS/Azure, HTTP, archives, in-memory). It is
intentionally filesystem-centric: paths are usually plain strings (often
URI-like), and filesystem instances provide a uniform API (open, ls, rm, cp, …)
which returns Python file-like objects for actual I/O.
Sample usage
Python
import fsspec
# Scheme-in-string dispatch: build a filesystem instance from a protocol
fs = fsspec.filesystem("s3", anon=False) # requires s3fs installed
# Uniform filesystem operations
paths = fs.glob("s3://my-bucket/data/*.parquet")
# Open returns a file-like object (handle-ish in Python terms)
with fs.open(paths[0], "rb") as f:
chunk = f.read(1024)
# Convenience entrypoint that parses the URL and creates the FS implicitly
with fsspec.open("s3://my-bucket/data/file.csv", mode="rt", encoding="utf-8") as f:
text = f.readline()
7.3.2.1. Protocol-in-string dispatch and a central registry
The main ergonomic trick is that filesystem choice is often encoded in the string via a protocol (e.g. s3://…, gcs://…), and fsspec uses a registry to map protocol → filesystem implementation, so callers can stay backend-agnostic. This makes it easy for libraries to accept "a path" and let fsspec pick the right backend.
7.3.2.2. Filesystem instances are the context boundary
Rather than having a structured Path object carry context, fsspec typically
places context on the filesystem instance: credentials, configuration, caching
policy, async settings, etc. This makes the FS object the natural injection
point for testability and "restricted views" (by handing out a preconfigured FS
instance).
7.3.2.3. OpenFile delays engaging resources
High-level entrypoints like fsspec.open / open_files produce an OpenFile wrapper
where the actual low-level file object is created only when entering a context
manager. This supports serialization/distribution patterns (common in
distributed compute) without holding live connections in the object being passed
around.
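The deferred-open idea can be sketched without fsspec itself. The hypothetical LazyFile class below only records how to open a file, engages the resource when a with block is entered, and never includes a live handle in its serialised state. It is a stdlib-only illustration of the pattern, not fsspec's OpenFile implementation.

```python
import os
import tempfile

class LazyFile:
    """Hypothetical wrapper: a recipe for opening a file, not an open file."""

    def __init__(self, path: str, mode: str = "rb"):
        self.path, self.mode = path, mode
        self._handle = None  # nothing is opened at construction time

    def __enter__(self):
        self._handle = open(self.path, self.mode)  # resource engaged only here
        return self._handle

    def __exit__(self, *exc):
        self._handle.close()
        self._handle = None
        return False

    def __getstate__(self):
        # Serialise only the recipe, never a live handle.
        return {"path": self.path, "mode": self.mode}

    def __setstate__(self, state):
        self.path, self.mode = state["path"], state["mode"]
        self._handle = None

def demo() -> bytes:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "file.csv")
        with open(path, "wb") as f:
            f.write(b"a,b\n")
        lazy = LazyFile(path)  # cheap to create and pass around
        with lazy as f:
            return f.read()
```

Constructing a LazyFile for a path that does not exist succeeds. Any error surfaces only when the context manager is entered, which is where a worker process would do the actual I/O.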
7.3.2.4. A minimal common denominator with extensible "optional" features
At its core, fsspec defines an abstract filesystem base (AbstractFileSystem)
and expects implementations to provide a coherent set of file and directory
operations. Beyond the core, it adds higher-level conveniences (globbing,
mapping interfaces, caching layers, wrappers such as compression/text-mode
handling) that can be applied across many backends.
7.3.2.5. Limitations
- Stringly-typed names
- paths are usually plain strings/URIs rather than structured path values, so lexical path manipulation and cross-platform semantics largely remain conventions or backend-specific behaviour.
- Semantic variance across backends
- a uniform API can mask important differences (e.g. "directories" on object stores, eventual consistency, atomicity guarantees), so correctness sometimes requires backend knowledge.
- No built-in capability model
- while passing filesystem objects can approximate contextual restriction, the design does not make "authority" an explicit, typed part of the interface in the way WASI/cap-std style APIs do.
7.3.2.6. Tradeoffs
- Very low adoption barrier vs. weaker invariants
- scheme-based dispatch + duck-typed file-like objects makes integration easy for the ecosystem, but gives up the stronger guarantees a first-class Path value type and explicit resolution model can provide.
- One surface API vs. leaky abstraction
- fsspec succeeds by making diverse storage look "filesystem-ish", but the more "non-filesystem" a backend is, the more likely users encounter surprising edge cases that the API cannot fully abstract away.
7.3.3. Go's io/fs
https://pkg.go.dev/io/fs
https://pkg.go.dev/os#DirFS
https://pkg.go.dev/embed
Go's io/fs defines a minimal, read-only filesystem abstraction (fs.FS) plus a
small suite of generic consumers (fs.ReadFile, fs.WalkDir, templating ParseFS,
http.FS, etc.). It's designed for dependency injection and "VFS by interface",
while keeping effects explicit: Open(name) returns an fs.File (a closeable
resource/handle).
Sample usage
go
package main
import (
"fmt"
"io/fs"
"log"
"os"
"path/filepath"
)
// Root a view of the OS filesystem at a directory.
func main() {
fsys := os.DirFS(filepath.Join(".", "assets"))
// Generic, FS-agnostic consumption.
b, err := fs.ReadFile(fsys, "config/app.toml")
if err != nil {
log.Fatal(err)
}
fmt.Println(string(b))
// Walk the tree (portable slash-separated names).
err = fs.WalkDir(fsys, ".", func(p string, d fs.DirEntry, err error) error {
if err != nil {
return err
}
if !d.IsDir() {
fmt.Println(p)
}
return nil
})
if err != nil {
log.Fatal(err)
}
}
7.3.3.1. Minimal core interface, optional "capability" interfaces
fs.FS is intentionally just Open(name) (fs.File, error), with optional
extension interfaces (ReadFileFS, ReadDirFS, StatFS, etc.) that consumers can
detect for more efficient implementations. The result is a low barrier to
implementing a filesystem backend, while still allowing richer behaviour when
available.
7.3.3.2. A portable path-name contract
Unlike OS-native paths, io/fs path names are defined to be unrooted, UTF-8, slash-separated, and to forbid .., ., empty elements, and leading/trailing slashes (except "." as the root). This standardises traversal semantics across platforms and backends, and makes consumers simpler and more predictable.
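The contract is small enough to restate as a predicate. The sketch below follows the rules listed above and is meant as an illustration of the contract in Python rather than a port of Go's fs.ValidPath; the function name is chosen for the example.

```python
def valid_path(name: str) -> bool:
    """Check a name against the io/fs-style contract described above."""
    if name == ".":
        return True  # the one permitted spelling of the root
    try:
        name.encode("utf-8")  # rejects strings that are not valid UTF-8 (lone surrogates)
    except UnicodeEncodeError:
        return False
    # Splitting on "/" turns leading, trailing and doubled slashes into empty
    # elements, so one membership test covers all the forbidden forms.
    return all(elem not in ("", ".", "..") for elem in name.split("/"))
```

Backslashes are not separators under this contract, so a name such as "a\b" is a single, valid element.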
7.3.3.3. Context objects as an injection point
os.DirFS(dir) produces an fs.FS rooted at a directory, letting libraries accept
"a filesystem" rather than reaching for ambient OS state. This is a practical,
lightweight step toward capability-style design (authority comes from which FS
value you were given), even though it's not a full security boundary by default.
7.3.3.4. Sub-filesystems and compositional wrappers
fs.Sub(fsys, dir) derives a subtree view. If the backend supports SubFS, it can
implement this efficiently; otherwise fs.Sub provides a wrapper that rewrites
names by prefixing dir. This encourages layering (e.g., "rooted view", "overlay
view") without requiring a large inheritance hierarchy.
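The fallback wrapper amounts to name rewriting. In the sketch below a "filesystem" is reduced to a function from name to bytes (a hypothetical stand-in for fs.FS), and sub returns a new function that prefixes every lookup. Names are assumed to already satisfy the io/fs naming contract, so no ".." or rooted name can reach the prefixing step.

```python
from typing import Callable

# Hypothetical stand-in for fs.FS: look a name up, get its contents back.
Opener = Callable[[str], bytes]

def sub(fsys: Opener, directory: str) -> Opener:
    """Return a view of `fsys` rooted at `directory` by rewriting names."""
    if directory == ".":
        return fsys  # the root view is the filesystem itself

    def open_in_sub(name: str) -> bytes:
        full = directory if name == "." else directory + "/" + name
        return fsys(full)

    return open_in_sub
```

Because the result has the same shape as its input, views can be stacked: sub(sub(fsys, "assets"), "config") behaves like sub(fsys, "assets/config").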
7.3.3.5. "Producers" and "consumers" across the standard library
Go 1.16 explicitly positioned io/fs as a shared boundary: producers include
embed.FS, zip.Reader, and os.DirFS; consumers include http.FS and ParseFS for
templates. That standard-library buy-in is a large part of why the abstraction
is widely usable.
7.3.3.6. Limitations
- Read-only core
- the central abstraction does not model mutation (create/write/rename), so write-capable VFS designs need additional interfaces or a parallel API surface.
- Stringly path names
- the interface standardises name syntax, but doesn't introduce a first-class path value type; correctness relies on conventions plus fs.ValidPath.
- Not automatically a sandbox
- os.DirFS explicitly warns it is not a chroot-style security mechanism (symlinks can escape), and even the meaning of a relative DirFS("prefix") can be affected by later Chdir.
7.3.3.7. Tradeoffs
- Portability vs expressiveness
- forbidding rooted paths and .. makes traversal safer and backend-neutral, but diverges from "native path" expectations and pushes some concerns to adapters at the boundary.
- Minimalism vs completeness
- the tiny Open-only core made adoption easy (lots of implementers), but leaves substantial surface area (writing, atomic renames, permissions, links) outside the unifying abstraction.
7.3.4. Go's afero
https://github.com/spf13/afero
https://pkg.go.dev/github.com/spf13/afero
Afero is a filesystem abstraction library for Go that provides an afero.Fs
interface as a drop-in, os-shaped API, plus a collection of concrete backends
(OS, in-memory, archives, remote) and composable wrappers (copy-on-write
overlays, caching layers, base-path "jails", etc.).
Sample usage
go
package main
import (
"log"
"github.com/spf13/afero"
)
func LoadConfig(fs afero.Fs, path string) ([]byte, error) {
return afero.ReadFile(fs, path)
}
func main() {
// Production: OS-backed FS.
osfs := afero.NewOsFs()
// "Jail" the app to a subtree (library-level chroot).
appfs := afero.NewBasePathFs(osfs, "/var/myapp")
// Overlay writes into memory (reads fall through to base).
sandbox := afero.NewCopyOnWriteFs(afero.NewReadOnlyFs(appfs), afero.NewMemMapFs())
if err := afero.WriteFile(sandbox, "config.json", []byte(`{"feature":true}`), 0o644); err != nil {
log.Fatal(err)
}
b, _ := LoadConfig(sandbox, "config.json")
_ = b
}
7.3.4.1. Filesystem-as-parameter, not global
The central move is to pass an afero.Fs into application/library code instead of
calling os.* directly, so the caller controls the backing store (OS in
production; MemMapFs in tests; wrappers in sandboxes).
7.3.4.2. OS-shaped, read/write interface
Afero intentionally mirrors the standard os surface, aiming for incremental adoption and broad coverage of common filesystem mutations (create/write/rename/remove), not just read-only access.
7.3.4.3. Composition as a first-class design tool
Afero leans hard into wrapping and layering filesystems to construct behaviour:
CopyOnWriteFs for overlays, CacheOnReadFs for read caching, BasePathFs for
subtree restriction, plus ReadOnlyFs and other helpers.
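The layering style is easy to mimic with a two-method interface. The classes below are hypothetical Python stand-ins named after their Afero counterparts; they cover only read and write and are meant to show how wrappers compose, not to mirror Afero's actual semantics.

```python
from typing import Dict, Optional

class MemFs:
    """In-memory store: names map to bytes."""
    def __init__(self, files: Optional[Dict[str, bytes]] = None):
        self.files: Dict[str, bytes] = dict(files or {})
    def read(self, name: str) -> bytes:
        return self.files[name]  # KeyError when the name is missing
    def write(self, name: str, data: bytes) -> None:
        self.files[name] = data

class ReadOnlyFs:
    """Wrapper that forwards reads and refuses writes."""
    def __init__(self, base):
        self.base = base
    def read(self, name: str) -> bytes:
        return self.base.read(name)
    def write(self, name: str, data: bytes) -> None:
        raise PermissionError(name)

class CopyOnWriteFs:
    """Writes go to `layer`; reads try `layer` first, then fall through to `base`."""
    def __init__(self, base, layer):
        self.base, self.layer = base, layer
    def read(self, name: str) -> bytes:
        try:
            return self.layer.read(name)
        except KeyError:
            return self.base.read(name)
    def write(self, name: str, data: bytes) -> None:
        self.layer.write(name, data)
```

Each wrapper exposes the same two methods as the thing it wraps, which is what lets a sandbox be assembled as CopyOnWriteFs(ReadOnlyFs(base), MemFs()) without either layer knowing about the other.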
7.3.4.4. Interoperability with io/fs
Afero positions itself as complementary to Go's io/fs: you can wrap Afero
backends to satisfy fs.FS (e.g. via afero.NewIOFS), and generally use io/fs
where you only need read-only standard-library semantics.
7.3.4.5. Limitations
- Stringly "paths"
- call sites still pass string names; Afero doesn't introduce a structured path value type, so lexical/path semantics remain the caller's responsibility (and may differ across backends).
- Library-level confinement
- wrappers like BasePathFs provide a "jail" within the abstraction, but don't prevent code from bypassing the model by calling os.* directly (so this is not an enforcement mechanism by itself).
- Backend fidelity varies
- many "non-POSIX" backends exist (cloud/streaming/etc.) and may not support the full set of POSIX-ish expectations (permissions, seeking, listings, etc.) uniformly.
7.3.4.6. Tradeoffs
- Broad interface vs implementability
- compared to io/fs's deliberately minimal surface, Afero's richer, mutable API is more demanding for backend implementers, but it is the reason Afero supports write-heavy apps, richer tests, and layered behaviours.
- Powerful composition vs semantic sharp edges
- overlay/caching layers are extremely useful, but can blur or distort semantics (e.g., error modes, metadata, atomicity) relative to a single "real" filesystem, especially across heterogeneous backends.
7.3.5. LLVM VFS
https://llvm.org/doxygen/classllvm_1_1vfs_1_1FileSystem.html
https://llvm.org/doxygen/VirtualFileSystem_8h.html
https://llvm.org/doxygen/classllvm_1_1vfs_1_1OverlayFileSystem.html
https://llvm.org/doxygen/classllvm_1_1vfs_1_1RedirectingFileSystem.html
https://llvm.org/doxygen/classllvm_1_1vfs_1_1InMemoryFileSystem.html
https://llvm.org/doxygen/classllvm_1_1vfs_1_1File.html
LLVM's VFS is a filesystem-context abstraction used throughout the LLVM/Clang tooling stack to decouple clients from the host OS filesystem. Instead of introducing a dedicated path type, it keeps "paths" as strings and makes the filesystem itself an injectable object whose operations resolve names into metadata and file handles—enabling in-memory files, overlays, and redirecting "views" of a tree for toolchain use-cases.
7.3.5.1. Filesystem object as the primary abstraction
llvm::vfs::FileSystem is an abstract interface representing a namespace plus
resolution rules (status, directory iteration, opening files). This makes the
filesystem the unit of substitution (real FS vs overlays vs in-memory), rather
than trying to encode "which filesystem?" into the path value itself.
7.3.5.2. Clear separation of names, metadata, and handles
The VFS API treats "paths" as names (string parameters), with explicit,
effectual operations producing either metadata (Status) or resource handles
(vfs::File). The vfs::File object is the authoritative handle for subsequent
reads/buffering/close, not the input path.
7.3.5.3. Composable layering via OverlayFileSystem
OverlayFileSystem composes multiple FileSystem instances into a stacked view,
allowing higher layers to shadow files/directories from lower layers (a common
compiler/tooling need for "overlaying" headers, SDKs, generated files, etc.).
7.3.5.4. Declarative remapping via RedirectingFileSystem
RedirectingFileSystem supports building a VFS from a declarative description
(LLVM's VFS overlay YAML format), enabling external tools to describe a logical
tree that maps onto one or more physical locations—useful for build systems and
reproducible compilation environments.
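The remapping idea can be sketched without reference to LLVM's actual YAML schema or C++ API. In the Python toy below, a plain dictionary stands in for the declarative description, and `RedirectingFS` is a hypothetical class name; only exact file names are mapped, whereas the real thing also handles directories.

```python
import os
import tempfile


class RedirectingFS:
    """Toy model: maps virtual file names onto real ("external") paths."""

    def __init__(self, mapping):
        self.mapping = dict(mapping)

    def open(self, virtual_name):
        # Resolve the logical name first; only then touch the host filesystem.
        try:
            real = self.mapping[virtual_name]
        except KeyError:
            raise FileNotFoundError(virtual_name) from None
        return open(real, "rb")


with tempfile.TemporaryDirectory() as tmp:
    # A generated file living somewhere arbitrary on disk.
    real_header = os.path.join(tmp, "generated_config.h")
    with open(real_header, "wb") as f:
        f.write(b"#define FEATURE 1\n")

    # The client only ever sees the logical location.
    vfs = RedirectingFS({"/sdk/include/config.h": real_header})
    with vfs.open("/sdk/include/config.h") as f:
        header = f.read()
```

The client code asks for `/sdk/include/config.h` regardless of where the build system chose to place the file, which is what makes the description useful for reproducible builds.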
7.3.5.5. Testability and determinism via InMemoryFileSystem
InMemoryFileSystem offers a fully synthetic implementation for unit tests and
tooling pipelines; it can host files from buffers and participate in overlays,
letting clients run "as if" a filesystem existed without touching disk.
7.3.5.6. Error-first, non-exception API surface
Operations typically return ErrorOr<T> / std::error_code style results rather
than exceptions, matching LLVM's broader style and making failure paths explicit
at call sites—useful for tooling where "filesystem may be virtual / partial /
inconsistent" is expected.
7.3.5.7. Tradeoffs
- Excellent for toolchains, less "language-level"
- treating the filesystem as the substitution point (rather than paths) is extremely effective for compilers and build tooling, but it leaves path correctness/safety largely as a convention rather than something enforced by a first-class Path type.
7.3.5.8. Limitations
- Stringly-typed paths
- there is no dedicated path value with enforced invariants; callers can still accidentally mix "text" and "path name", and path parsing/normalisation lives outside the VFS abstraction.
- Not a capability model
- while passing a FileSystem object is context injection, it does not (by itself) guarantee authority restriction the way directory-handle capability systems do; whether an FS is "sandboxed" depends on the specific implementation provided.
7.3.6. Linux VFS
https://docs.kernel.org/filesystems/vfs.html
https://www.kernel.org/doc/html/latest/filesystems/path-lookup.html
https://www.infradead.org/~mchehab/kernel_docs/filesystems/path-walking.html
Linux's Virtual Filesystem (VFS) is the kernel's common interface layer that lets many different filesystem implementations share one API and one global namespace. Conceptually, it cleanly separates pathname lookup (turning a name into a located object in a mounted namespace) from operations on an opened object (reading/writing via a handle).
7.3.6.1. Pathname lookup yields kernel objects, not "a better path string"
Inside the kernel, walking a pathname produces a resolved location in the
mounted namespace, represented as a (dentry, vfsmount) pair ("a path"). The
lookup process may need to instantiate dentries along the walk and load the
corresponding inodes as needed.
7.3.6.2. Dentry/inode caches make "resolution" explicitly stateful
Dentries are RAM-only objects used to cache namespace structure; the dentry cache is intentionally an incomplete, evictable view of the full namespace. Consequently, name resolution is inherently effectual and state-dependent even before you "open" anything.
7.3.6.3. Handle-centric operations after open
Once a file is opened, subsequent operations are performed through the handle
(file descriptor → kernel struct file) and dispatch through per-object
operation tables. This is the VFS's core split: name lookup is one phase; I/O
and metadata operations are another.
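The split is observable from user space. The Python sketch below assumes a POSIX host (on Windows, renaming an open file may fail): it opens a file, renames it, and then continues to use the descriptor, showing that the handle rather than the original name is what subsequent operations go through. The file names are arbitrary.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    old_name = os.path.join(tmp, "a.txt")
    new_name = os.path.join(tmp, "b.txt")
    with open(old_name, "wb") as f:
        f.write(b"hello")

    # Phase 1: name lookup yields a handle.
    fd = os.open(old_name, os.O_RDONLY)
    try:
        # The name changes underneath us...
        os.rename(old_name, new_name)
        old_name_exists = os.path.exists(old_name)

        # Phase 2: metadata and I/O go through the handle, not the name.
        size = os.fstat(fd).st_size
        data = os.read(fd, size)
    finally:
        os.close(fd)
```

After the rename the old name no longer resolves, yet `os.fstat` and `os.read` on the descriptor still succeed, because they never consult the pathname again.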
7.3.6.4. Mounting defines the namespace context
The meaning of a pathname is defined relative to the process's view of the mount tree: lookup proceeds through mountpoints as it walks components, so "the same string" can denote different resources in different mount namespaces.
7.3.6.5. Tradeoffs
- Performance vs complexity
- aggressive caching (dcache/icache) makes lookups fast, but increases conceptual complexity and makes "resolution" visibly dependent on kernel state.
- Generic interface vs uniform semantics
- VFS unifies how operations are invoked, but filesystems can still differ in edge semantics (atomicity, metadata fidelity, special files), so higher layers must be cautious about assuming POSIX-uniform behaviour.
7.3.6.6. Limitations
- Not a user-facing VFS API
- this is primarily an internal kernel architecture; user code sees syscalls, not the VFS object model directly.
- Stringly pathnames at the boundary
- pathnames are still passed as strings to syscalls; the kernel's strong separation doesn't automatically give the language a first-class path value type.
7.3.7. Plan 9 and the 9P protocol
https://9p.io/sys/doc/9.html
https://9fans.github.io/plan9port/man/man9/intro.html
https://9p.io/plan9/about.html
https://9p.io/magic/man2html/1/bind
https://css.csail.mit.edu/6.824/2014/papers/plan9.pdf
Plan 9 pushes "filesystem as universal interface" to the extreme: most resources are presented as files, and each process has its own mutable namespace assembled by mounting and binding servers. 9P is the uniform protocol used to traverse names (paths) and obtain authoritative references (fids) to resources hosted by local or remote filesystem servers.
7.3.7.1. Per-process namespaces as an explicit "filesystem context"
A Plan 9 namespace is not a global OS singleton: processes inherit a namespace, can modify it (mount/bind), and those changes affect only that process (and its children, depending on how they're spawned). This makes the "context in which paths are interpreted" concrete and composable.
7.3.7.2. Resolution is a protocol-level operation producing a handle
In 9P, you don't "resolve a path into a better path string"; you walk from an
existing handle (fid) through a list of path segments, producing a new fid bound
to the target object (subject to permissions). That fid is the authoritative
reference you later open, read, write, and finally clunk (release).
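A toy model may help fix the vocabulary. The Python sketch below mimics the attach/walk/open/read/clunk sequence over an in-memory tree; it is not the 9P wire protocol, it ignores permissions and qids, and `ToyServer` and its method signatures are assumptions made for illustration.

```python
class ToyServer:
    """Toy model of 9P-style fids over an in-memory tree (not the wire protocol)."""

    def __init__(self, tree):
        self.tree = tree      # nested dicts are directories, bytes are files
        self.fids = {}        # fid number -> node in the tree
        self.opened = set()   # fids that have been opened for I/O

    def attach(self, fid):
        # Bind a client-chosen fid to the root of the served tree.
        if fid in self.fids:
            raise ValueError(f"fid {fid} already in use")
        self.fids[fid] = self.tree

    def walk(self, fid, newfid, names):
        # Walk relative to an existing fid; the result is bound to newfid.
        if newfid != fid and newfid in self.fids:
            raise ValueError(f"fid {newfid} already in use")
        node = self.fids[fid]
        for name in names:
            if not isinstance(node, dict) or name not in node:
                raise FileNotFoundError("/".join(names))
            node = node[name]
        self.fids[newfid] = node

    def open(self, fid):
        self.opened.add(fid)

    def read(self, fid):
        if fid not in self.opened:
            raise PermissionError(f"fid {fid} is not open")
        node = self.fids[fid]
        if isinstance(node, dict):
            raise IsADirectoryError(fid)
        return node

    def clunk(self, fid):
        # Forget the fid; the number may be reused afterwards.
        self.opened.discard(fid)
        del self.fids[fid]


srv = ToyServer({"usr": {"glenda": {"readme": b"hi"}}})
srv.attach(0)                                # fid 0 now names the root
srv.walk(0, 1, ["usr", "glenda", "readme"])  # fid 1 names the file
srv.open(1)
content = srv.read(1)
srv.clunk(1)
```

Note that no operation takes an absolute path string: every walk starts from a fid the client already holds.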
7.3.7.3. "Everything is a file" becomes "many services are fileservers"
Because access is standardized through 9P operations, diverse resources can be surfaced as file trees served by user-space components (editors, networking stacks, IPC services, synthetic /proc-like views, etc.), and then mounted into a process's namespace as needed.
7.3.7.4. Union directories and bind semantics make overlaying a first-class primitive
bind can replace, prepend, or append trees, creating union directories where lookups search a list of members and creation semantics are controlled by flags (notably -c). This provides a principled "overlay FS" story at the namespace layer, rather than as a special filesystem feature.
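As a rough illustration of those semantics, the Python toy below models a union directory as an ordered list of members, each carrying a "creation permitted" flag. `UnionDir`, its `order` strings, and the `create` keyword are hypothetical names for this sketch; directories are plain dictionaries, and the creation flag of the original directory is an explicit constructor argument rather than a claim about Plan 9's default.

```python
class UnionDir:
    """Toy model of a union directory: an ordered list of (directory, may_create)."""

    def __init__(self, original, create=False):
        self.members = [(original, create)]

    def bind(self, new, order="replace", create=False):
        entry = (new, create)
        if order == "replace":
            self.members = [entry]
        elif order == "before":      # analogue of bind -b
            self.members.insert(0, entry)
        elif order == "after":       # analogue of bind -a
            self.members.append(entry)
        else:
            raise ValueError(f"unknown order: {order}")

    def lookup(self, name):
        # Lookups search the members in order; earlier members shadow later ones.
        for directory, _ in self.members:
            if name in directory:
                return directory[name]
        raise FileNotFoundError(name)

    def create(self, name, data):
        # New files land in the first member that permits creation.
        for directory, may_create in self.members:
            if may_create:
                directory[name] = data
                return
        raise PermissionError(f"no member permits creation: {name}")


system_bin = {"ls": b"system ls", "cat": b"system cat"}
home_bin = {"ls": b"my ls"}

bin_dir = UnionDir(system_bin)
bin_dir.bind(home_bin, order="before", create=True)
bin_dir.create("hello", b"my hello")
```

Here `ls` resolves to the personal copy, `cat` falls through to the system one, and the newly created `hello` lands in the only member that permits creation.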
7.3.7.5. Capability flavour via "start handle + relative walk"
A fid acts like a capability: once you have it, operations are scoped to that
object; and crucially, walking is relative to a fid you already possess (e.g.,
the root fid established by attach). This lines up closely with modern
handle-relative APIs (openat, WASI) where authority is mediated by directory
handles rather than ambient absolute paths.
7.3.7.6. Tradeoffs
- Powerful composition vs mental overhead
- per-process, user-assembled namespaces enable elegant redirection/overlay/sandbox patterns, but they shift complexity from "global system configuration" into application/runtime composition.
- Uniform protocol vs performance/semantics variance
- a single file protocol makes extension easy, but remote/fileserver-backed resources can have very different latency and behaviour from local disk, and clients must expect resolution and metadata to be more "distributed-system-like."
7.3.7.7. Limitations
- Not a language-level path model
- Plan 9's core innovation is at the OS/protocol layer; it doesn't, by itself, provide a first-class path value type with lexical operations the way modern language stdlibs do.
- Portability/interoperability friction
- outside Plan 9 (or plan9port/9P clients), the model doesn't map 1:1 onto mainstream OS APIs without adapters, and semantics can surprise POSIX-native tooling.
7.3.8. WASI
https://wa.dev/wasi:filesystem
https://github.com/WebAssembly/wasi-filesystem
https://blog.sunfishcode.online/capabilities-and-filesystems
WASI's filesystem design is explicitly capability-oriented: a module is given pre-opened directory descriptors (capabilities), and all path-based operations are performed relative to one of those descriptors, rather than via an ambient global filesystem namespace. This makes sandboxing the default shape of the API, and pushes "what can this code access?" into explicit values rather than implicit process state.
7.3.8.1. Pre-opened directories as the root of authority
Rather than allowing a module to name arbitrary absolute paths, WASI starts execution with a set of directory handles supplied by the host/runtime ("pre-opens"). These handles define the only namespaces the module can traverse via path lookup, and they serve as the natural unit for granting or withholding filesystem authority.
7.3.8.2. Descriptor-oriented *at API surface
Filesystem operations are largely phrased as methods on a descriptor (file or directory), with "-at" operations accepting a relative path, mirroring the POSIX openat / renameat / unlinkat patterns. This keeps "resolution" (name lookup relative to a base) explicit in the call shape and makes it composable for sandboxing.
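The POSIX pattern being mirrored can be tried directly from Python, whose `os` functions accept a `dir_fd` argument on platforms that support it. The sketch below assumes a Linux-like host and uses a temporary directory as a stand-in for a pre-opened directory; it demonstrates only the call shape, not WASI itself.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "data"))
    with open(os.path.join(tmp, "data", "notes.txt"), "wb") as f:
        f.write(b"relative to a descriptor")

    # The directory descriptor plays the role of a pre-opened directory.
    dir_fd = os.open(tmp, os.O_RDONLY | os.O_DIRECTORY)
    try:
        # openat-style: the path is interpreted relative to dir_fd, not the cwd.
        fd = os.open("data/notes.txt", os.O_RDONLY, dir_fd=dir_fd)
        try:
            notes = os.read(fd, 1024)
        finally:
            os.close(fd)

        # fstatat-style metadata lookup, again relative to the descriptor.
        size = os.stat("data/notes.txt", dir_fd=dir_fd).st_size
    finally:
        os.close(dir_fd)
```

Plain `openat` is only half of the story: on POSIX an absolute path or a `..` component can still leave the directory, which is why WASI pairs this call shape with enforced resolution rules.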
7.3.8.3. Path resolution rules enforce sandbox boundaries
Path-taking operations are designed to prevent escaping the granted namespace: absolute/rooted paths are not permitted in key operations (and some operations explicitly reject symlink contents that are absolute/rooted). This shifts a large class of "path traversal" risks from application-level string handling into the runtime's enforced resolution logic.
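The lexical part of such a rule is easy to state in code. The Python sketch below is an approximation under stated assumptions: `check_sandboxed` is a hypothetical helper, it inspects only the path text, and it says nothing about symlinks, which a real runtime must also handle during resolution.

```python
def check_sandboxed(path):
    """Return the normalised components of `path`, or raise if it could escape.

    Purely lexical: symlinks are not considered.
    """
    if path.startswith("/"):
        raise PermissionError(f"absolute paths are not permitted: {path!r}")
    stack = []
    for part in path.split("/"):
        if part in ("", "."):
            continue
        if part == "..":
            if not stack:
                # Popping past the base would leave the granted directory.
                raise PermissionError(f"path escapes the base directory: {path!r}")
            stack.pop()
        else:
            stack.append(part)
    return stack


components = check_sandboxed("a/./b/../c")
```

A `..` that stays inside the base is tolerated here, while one that would climb above it is refused; stricter policies (rejecting every `..`) are equally easy to express.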
7.3.8.4. Rights/permissions model evolution
Earlier WASI iterations used a fine-grained "rights" bitmask model; the (archived) wasi-filesystem interface notes moving away from explicit rights and toward a simpler "mode/flags" approach, with access control largely enforced by the host and by the descriptor's mutability constraints. This is a notable trade: simpler, more portable APIs, but less standardized, inspectable authority at the interface level.
7.3.8.5. Tradeoffs
- Security-first ergonomics
- requiring a base directory descriptor (and rejecting absolute paths) improves auditability and sandboxing, but increases friction when porting "ambient POSIX" code that assumes a process-wide current directory and global namespace.
- Authority is clear, but coarse at the interface boundary
- shifting detailed permissioning into the host/runtime simplifies the spec and implementations, but reduces the degree to which a module can introspect or statically reason about precise granted capabilities beyond "which pre-opens do I have, and are they mutable?".
7.3.8.6. Limitations
- Host-dependent semantics
- the API is primarily intended for "host filesystem" access and does not try to fully normalize away OS differences; some behaviors are explicitly host/filesystem dependent.
- String paths rather than structured path values
- path parameters are WIT strings (not a first-class structured path type), so the interface doesn't itself provide a path-manipulation model, only resolution and operations.