@david_chisnall I think "if I run something with execve it is very difficult for an argument to be parsed incorrectly afterwards" is a highly desirable feature for a system centered around command line argument parsing in a highly heterogenous environment. yes, you *can* build a consistent system without this property. but a system with execve will trend towards not having quoting bugs, and a system with CreateProcess will trend towards having them, all else being equal
I think you’re conflating a couple of things here. The big problem with CreateProcess is that it takes a single string for all command-line arguments, whereas execve takes a vector of arguments. This is the root cause of a load of quoting errors because there is no way of passing multiple arguments to a child process on Windows that doesn’t involve quoting them. And, because this API is used directly, everyone rolls their own quoting code to go from their internal vector-of-arguments representation to the child process’s one.
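A minimal sketch of the contrast (each half only compiles on its own platform, and `grep` and the file names are just placeholders):

```c
/* POSIX: the argument vector survives intact into the child, so
 * "hello world" can never be re-split by a quoting bug. */
#ifndef _WIN32
#include <unistd.h>

void spawn_posix(void) {
    char *argv[] = { "grep", "hello world", "notes.txt", NULL };
    char *envp[] = { NULL };
    execve("/usr/bin/grep", argv, envp); /* no quoting layer anywhere */
}
#endif

/* Windows: everything is flattened into one string, so the caller must
 * quote and the child must re-parse, which gives two chances to disagree. */
#ifdef _WIN32
#include <windows.h>

void spawn_windows(void) {
    STARTUPINFOA si = { sizeof si };
    PROCESS_INFORMATION pi;
    char cmdline[] = "grep.exe \"hello world\" notes.txt"; /* hand-rolled quoting */
    CreateProcessA(NULL, cmdline, NULL, NULL, FALSE, 0, NULL, NULL, &si, &pi);
}
#endif
```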
This has nothing to do with glob expansion, which happens later, after you’ve split arguments. And on UNIX-like systems, doing it in the shell causes all sorts of weird behaviour. For example, on FreeBSD, I often do `pkg info foo*` to print info about packages that start with some string. If I forget to quote the last argument, this behaves differently depending on whether the current directory contains files with the prefix I used. If it does, the shell expands the pattern and pkg info returns nothing, because I don’t have any installed packages matching those file names. If it doesn’t, the shell passes the star to the program, which does glob expansion, but against a namespace that is not the filesystem namespace. The pkg tool knows that this argument is a set of names of installed packages, not files in the current directory, but it can’t communicate that to the shell, and so the shell does the wrong thing.
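A sketch of the in-program half of this, matching a pattern against the tool’s own namespace with fnmatch(3). The package names are made up and this is an illustration of the idea, not pkg’s actual code:

```c
/* A tool can glob against its own namespace (here: installed package
 * names) rather than the filesystem, but only if the shell hands the
 * pattern through unexpanded. */
#include <fnmatch.h>
#include <stdio.h>

int main(void) {
    const char *installed[] = { "foo-utils", "foobar", "libfoo", NULL };
    const char *pattern = "foo*"; /* would never arrive if the shell expanded it */

    for (const char **p = installed; *p; p++)
        if (fnmatch(pattern, *p, 0) == 0)
            printf("%s\n", *p);
    return 0;
}
```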
Similarly, on DOS the rename command took a load of source files and a destination file or pattern. You could do `rename *.c *.txt` and it would expand the first pattern, then do the replacement based on the two patterns. UNIX’s mv can’t do that: the shell expands both patterns before mv runs, so mv just sees a flat list of file names and treats the last one as the destination. I deleted a bunch of files by accident when I started using Linux because it’s not obvious to a user what actually happens when you write `mv *.c *.txt`. There is a GNU (I think?) rename command and its syntax is far more baroque than the DOS one because it is fighting against the shell doing expansion without any knowledge of the argument structure.
@david_chisnall @whitequark but I think the answer for globbing and adjacent matters would lie in programs declaring the "types" of their arguments and shells understanding them? This would also help with tab completion, etc.
But in the end, I guess lowest common denominator tends to win, because innovation is always pushing against limits, and inertia is a big force...
@coder @david_chisnall I was about to point this out as a natural evolution of my other suggestion earlier, yes.
It is not possible to move all processing outside of the tool unless your model is to allow the tool to provide complex arbitrary code to the shell. For example, consider gcc or clang. Some of their flags depend on the target, so you need to parse some of them and then do lookups against complex data structures that depend on the target. It’s sufficiently complex that writing it in declarative code is hard.
Some folks on the .NET team had a very nice solution to this for autocompletion, where a .EXE had a special section of .NET IL that PowerShell could load. If you used their declarative framework for argument parsing then it would generate this for you and it would make autocomplete work beautifully in PowerShell (not sure if this was ever released, I played with a prototype). I can imagine a lightweight Wasm interpreter being an interesting approach for doing this on *NIX.
The best argument for doing expansion in the shell is one that is sadly not realised in UNIX. If you do file expansion and opening in the shell, you can start processes with file descriptors instead of (or as well as) paths. Processes can then be started with access only to files that are either listed in a manifest or passed on the command line. Build a system like this, and you have a nice way of respecting the principle of least privilege.
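A minimal sketch of the mechanism, where the parent plays the shell’s role and the child is a hypothetical `fdcat` tool that reads only from an inherited descriptor (the file name and tool name are placeholders):

```c
/* The parent opens the file itself and passes only the descriptor
 * number on the command line; the child never needs the right to
 * open arbitrary paths at all. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int fd = open("notes.txt", O_RDONLY); /* placeholder file name */
    if (fd < 0) { perror("open"); return 1; }

    if (fork() == 0) {
        char fdstr[16];
        snprintf(fdstr, sizeof fdstr, "%d", fd);
        execlp("fdcat", "fdcat", fdstr, (char *)NULL); /* fd is inherited */
        _exit(127);
    }
    close(fd);
    wait(NULL);
    return 0;
}
```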
@david_chisnall @coder since at the beginning you stated that shared libraries are available: I am completely fine with applications being loadable shared libraries that the shell interoperates with via a function-call-based interface; once you have structured data at the shell/application interface, your application is already conceptually a function call, so you might as well implement it using a function-call interface (or multiple functions, as you suggest here)
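A sketch of what that could look like, assuming a hypothetical convention where each tool is a shared library exporting a `tool_main` entry point (none of these names come from a real system):

```c
/* A shell invoking a "tool" as a shared library through a function
 * call instead of fork/exec. */
#include <dlfcn.h>
#include <stdio.h>

typedef int tool_main_fn(int argc, char **argv);

int run_tool(const char *path, int argc, char **argv) {
    void *h = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (!h) { fprintf(stderr, "%s\n", dlerror()); return -1; }

    /* casting dlsym's result to a function pointer is the usual
     * POSIX idiom */
    tool_main_fn *entry = (tool_main_fn *)dlsym(h, "tool_main");
    int rc = entry ? entry(argc, argv) : -1;

    dlclose(h);
    return rc;
}
```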
@david_chisnall @coder I do also think that it is viable, though restrictive enough that it goes against UNIX sensibilities, to have a declarative interface that limits how shell/filesystem-specific argument types (filename, glob, hole, etc) are used; I think it is an interesting direction for design because it will make overly complex and difficult-to-use interfaces like that of `gcc` harder to build, and so will make the CLI more pleasant to use overall
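A sketch of what such a declaration might look like; every name here is hypothetical, not taken from any real system:

```c
/* A hypothetical declarative argument description that a tool could
 * export and a shell could read before deciding what to expand,
 * complete, or open on the tool's behalf. */
enum arg_type { ARG_FLAG, ARG_STRING, ARG_FILENAME, ARG_GLOB };

struct arg_spec {
    const char   *name;     /* e.g. "--output" or a positional name */
    enum arg_type type;     /* tells the shell how to treat the argument */
    int           required;
};

/* What a tool like cp might declare: the shell now knows globs belong
 * to the sources and that the last operand is a single filename. */
static const struct arg_spec cp_args[] = {
    { "sources", ARG_GLOB,     1 },
    { "dest",    ARG_FILENAME, 1 },
    { "-r",      ARG_FLAG,     0 },
};
```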
That is a lot closer to how MULTICS worked. MULTICS shared libraries were a lot richer than the ones that UNIX ever added and were security boundaries, much like UNIX processes are.
You might have noticed that I’ve copied MULTICS a lot in CHERIoT RTOS.
@david_chisnall @coder I am completely unfamiliar with MULTICS. do you have any good introduction to it?
@whitequark @coder Not really. They published a load of things, but hanging out with Peter G. Neumann and listening to everything he said was the most useful for me.