Separate (some) user applications from the operating system

Android, iOS, macOS, and Chrome OS all provide a level of separation between user-installed applications and the operating system. The operating system may provide certain applications to the user, but those applications are not installed by the user and generally cannot be uninstalled or replaced. User-controllable applications are separated from the rest of the system, and generally are sandboxed to prevent them from doing damage if they turn out to be malicious or become compromised.

Debian (and by extension, Kicksecure) doesn’t do this so well. All packages in the repositories are essentially operating system components, with full permissions to do anything the user account the application is launched as allows. Core components like udev and systemd are installed directly alongside productivity apps like image viewers and web browsers. This is a bit messy from the standpoint of system organization, but it also poses a significant security hazard for obvious reasons (XKCD 1200 sums up the issue nicely).

Users should be able to install arbitrary applications as OS components and give them elevated permissions, as they’ve always been able to do on Kicksecure and Whonix. But users should also be able to wall off applications into a sandbox of sorts if they don’t trust those applications or may have to process untrusted data with those applications.

Flatpak and Snap are relatively well-known attempts at bringing sandboxing and application isolation to Linux. Neither of these is an exactly awesome solution, because:

  • Sandboxing controls under Linux can be somewhat flimsy without putting a great deal of work into doing things like whitelisting syscalls and the like. In some instances, Linux’s sandboxing features actually worsen security (user namespaces in particular allow access to kernel facilities that aren’t usually available to unprivileged users, exposing additional attack surface; this has been the source of numerous CVEs, but user namespaces are also required by applications that do sandboxing internally, like Chromium).
  • Users can loosen sandboxing very easily with Flatpak, and without too much effort with Snap, defeating the purpose of the sandbox.
  • Flatpak and Snap sandboxing oftentimes causes application malfunctions due to restricting resources that applications legitimately need access to.
  • Applications from the Flathub repository and Snap Store can be uploaded by arbitrary users. This is in contrast to the Debian archive, which requires some level of skill and trust for a packager to gain upload access to it, and where policies help to ensure better packaging quality in general. As a result, Flatpaks and Snaps are oftentimes more broken than apt packages, and it is easy to upload malware to the repositories.
    • Malware actually has been uploaded to the Snap Store and has caused severe damage in the past, with at least one person losing the equivalent of $490,000 after entering cryptocurrency recovery data into a malicious Exodus Wallet snap. This particular incident underscores the fact that sandboxing apps isn’t really enough; apps can still be malicious without having to break out of a sandbox at all. The same kind of attack has happened on the Snap Store multiple times. This is why getting apps from a trusted repository is a good idea.
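To illustrate how little it takes to loosen a Flatpak sandbox: `flatpak override --user --filesystem=home <app-id>` drops a one-line keyfile into the user's overrides directory, after which the sandbox no longer protects the home folder at all. Here's a minimal Python sketch of roughly what that keyfile contains (the override location and exact key formatting are from my reading of Flatpak's behavior, so treat the details as approximate):

```python
# Sketch: the user override that `flatpak override --user --filesystem=home`
# would write (assumption: overrides live under
# ~/.local/share/flatpak/overrides/<app-id>). One key is enough to punch a
# hole through the sandbox.
import configparser
import io

def loosen_sandbox(app_id: str) -> str:
    """Build an illustrative override keyfile granting full home access."""
    override = configparser.ConfigParser()
    override.optionxform = str  # Flatpak keyfiles are case-sensitive
    override["Context"] = {"filesystems": "home;"}
    buf = io.StringIO()
    override.write(buf)
    return buf.getvalue()

# The app ID is just an example.
print(loosen_sandbox("org.mozilla.firefox"))
```

The point being: there's no friction here, no warning, and nothing stopping a tutorial from telling users to paste that command.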

What we really want is some sort of virtual machine that is integrated with the host enough to allow apps in the VM to feel native or close to native, but sufficiently walled off so that applications cannot do damage to the host system or escape the VM. This would allow running “user applications” in a very tight yet highly compatible sandbox, and would allow pulling those applications from trusted repositories like the Debian repos. It would also allow users to install applications without having to boot into a sysmaint session.

Chrome OS has something similar to this with Crostini, and Windows has something similar to this with WSLg. Something with a vaguely similar user experience would likely be acceptable, especially if an “app store” frontend of some sort was available so the user didn’t have to use a terminal to install sandboxed applications.

3 Likes

Created a wiki page with some ideas: vm-app-manager - Virtualization-powered application sandbox

1 Like

Flatpak is also messy for other reasons. Unfortunately, its packaging doesn’t have the same standards and quality control as Debian’s. See:
Flatpak


Some potential overlap:

Perhaps sandbox-app-launcher is promising?

It wouldn’t fully separate core system packages from user applications. Debian packaging maintainer scripts would still be trusted. During package installation, a malicious or vulnerable Debian packaging maintainer script could compromise the system. There are no historical precedents in packages.debian.org as far as I am aware, but it is still something worth mitigating.

Perhaps sandbox-app-launcher, if worthwhile, could install packages inside a per-application system chroot?
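One way that per-application chroot could look, as a rough sketch: bootstrap a minimal Debian rootfs per app with mmdebstrap. The directory layout below is purely my own assumption; only the mmdebstrap options (`--variant=minbase`, `--include=`) are real, and this only builds the command rather than running it:

```python
# Dry-run sketch: build the mmdebstrap invocation for a per-application
# system chroot. The /var/lib/app-chroots layout is an assumption.
def chroot_bootstrap_cmd(app: str, suite: str = "trixie") -> list[str]:
    target = f"/var/lib/app-chroots/{app}"
    return [
        "mmdebstrap",
        "--variant=minbase",   # smallest usable Debian base
        f"--include={app}",    # pull the app and its dependencies into the chroot
        suite,
        target,
    ]

print(" ".join(chroot_bootstrap_cmd("vlc")))
```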


The Limitations seem a deal-breaker.

Check the Spectrum design:

https://spectrum-os.org/design.html

1 Like

Yeah… from what I’ve seen, though, containerization is so flimsy that it seems like a dealbreaker to me. The limitations in nested virtualization scenarios aren’t really that bad, since the user is already using virtual machines; they can use multiple VMs for isolation.

Being unable to use virtualized apps and VirtualBox at the same time is a real issue, and one that is very hard to avoid. Maybe it would be possible to allow the application sandboxes to function as either containers or VMs, so that one could use containerization when necessary, and virtualization when possible.

Incus considers container escape to be a security vulnerability: Security Overview · lxc/incus · GitHub It’s an LXD fork (developed by the original LXD developers), and might be more workable (maybe even more secure?) than bwrap. I believe it’s present in Debian Trixie.

I don’t see the need to use VirtualBox if we have KVM support. VirtualBox on GNU/Linux is a bad idea, even more so on Debian.

So I see that this can be avoided by dropping VirtualBox from consideration.

I think it’s better to avoid the possibility of downgrading the model and allowing application(s) to use containers instead of being virtualized, because containers sharing the system kernel will lead to bad results IMO.

We can’t really not consider VirtualBox - it’s the primary supported platform for Kicksecure and Whonix, and the automated installer will install VirtualBox and a Kicksecure appliance by default. It’s the Kicksecure platform. This is probably a large part of why the nested virtualization restrictions are a dealbreaker (nested virtualization will technically work, but it will be painfully slow in many circumstances).

Containers using the same system kernel won’t be a problem if the same OS is being used both on the host and in the container, which we can do. The security concerns of containerization are still there, but if we use a mature containerization platform like Incus rather than trying to roll our own (which is what I think we were doing with sandbox-app-launcher in the past), that might not be so bad.

1 Like

Would “do it similar to how Android is doing it” be an option?

  • Separate Linux user account name.
  • Separate folder.
  • Mandatory access control.
  • Seccomp (optional, as feasible).

Though, Android might be starting to use Android Virtualization Framework (AVF) for some apps.

My first thought looking at this is that user account isolation sounds neat but will probably be difficult to implement. Lots of file ownership changes would be required in order to allow the user to get files into and out of applications. AppArmor doesn’t have a sandboxing feature like SELinux appears to, though it does allow enforcing MAC. Seccomp is unsafe to apply to arbitrary applications; Android can get away with it because Google can enforce rules about how Android apps are written, and we can’t do that with Linux apps. It also sounds like Android is using some sort of chroot variant possibly (or some fancy feature of SELinux) to limit host file visibility, which would further complicate things.
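For concreteness, here's a dry-run sketch of what the Android-style approach might translate to on a stock Debian system: one dedicated account per app, a private home directory, and launching the app under that account. Every name here (the `app-` prefix, `/var/lib/app-homes`, the sudo launch) is a made-up assumption, not an existing mechanism; the sketch only builds the commands rather than executing them:

```python
# Dry-run sketch of Android-style per-app user isolation. Builds the shell
# commands a privileged helper would run; nothing is executed.
import shlex

APP_HOME_ROOT = "/var/lib/app-homes"  # assumed location for per-app homes

def per_app_isolation_plan(app: str) -> list[str]:
    account = f"app-{app}"
    home = f"{APP_HOME_ROOT}/{app}"
    return [
        # dedicated system account with its own primary group, no login shell
        f"useradd --system --create-home --home-dir {shlex.quote(home)} "
        f"--shell /usr/sbin/nologin {shlex.quote(account)}",
        # the app owns only its own home; it can't read the user's real $HOME
        f"chown -R {shlex.quote(account)}:{shlex.quote(account)} {shlex.quote(home)}",
        f"chmod 700 {shlex.quote(home)}",
        # launch under the app account (a sudo rule is one assumed mechanism)
        f"sudo -u {shlex.quote(account)} -- {shlex.quote(app)}",
    ]

for cmd in per_app_isolation_plan("firefox"):
    print(cmd)
```

This also makes the file-ownership problem visible: every file the user wants to hand to the app has to be copied into the app's home and chowned, and vice versa.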

I guess it would be better than nothing though. I’m not sure if it would be easier to implement than using a containerization engine, maybe it would be.

(I don’t like sandboxing approaches in general especially without seccomp, since any LPE vuln in the kernel will allow a sandbox escape, and there have been many LPE vulns in the kernel as I understand it. Virtualization would make that much, much harder.)

2 Likes

These were choices of their time, meaning around 2013 there weren’t many options for a new, growing project; come 2025, these can be considered things of the past.

Switching to a VM per app, this idea could even merge Whonix into Kicksecure. How? Because what Whonix does is take an OS in a VM and connect it through another VM; with a VM per app, app X in a VM could use anonymization app(s) Y in a VM, or even the Gateway in a VM.

If bringing these futuristic ideas to life means not having VirtualBox, then so be it.

One possible argument in favor of the VM-based approach is that we could run “unsafe” apps in sysmaint sessions safely. We went to a great deal of effort to prevent people from launching Firefox in a sysmaint session because of the dangers it poses, but launching a virtual machine sandbox with Firefox in it would be much safer than standard or sandboxed Firefox. The inability to safely use a web browser in a sysmaint session is causing some frustration, and while we did see that coming and decided it was worth it, being able to remove that frustration would be nice.

1 Like

Since Android is going for a virtualization based approach for some apps and since that’s more secure we might need to do the same.

Perhaps if running on a host operating system, we’ll use per application virtualization.

In the case of nested virtualization (if we already run inside a VM), we’ll investigate more. Perhaps provide a user-configurable setting in that case, or fall back to another solution (running without virtualization), because we’ll suggest using multiple VMs anyway.

We’ll need a separate ticket to investigate the security impact of nested virtualization. We’ll weigh it against running applications inside a VM without nesting.

That is, if it is even possible; slow performance might make this infeasible.

1 Like

Could you elaborate a bit on that please?

I guess…

Google dictates how apps must be built, packaged, and behave if they want to be distributed through the Play Store. Google can reject apps that don’t comply with sandboxing or that violate permissions. Android developers have to use the SDK and libraries dictated by Google.

Why can’t we do that on general-purpose Linux distributions?

Right, and beyond that, the Android operating system is designed in such a way that you fundamentally can’t just “do whatever you want”. The way the typical Linux desktop works, if someone needs root access and doesn’t want to prompt for it, they can just install a polkit configuration snippet or similar as part of the installation process and get that access. With Android, not only can you not do that by policy, you can’t do that by design. Furthermore, we have to work with an already-existing app ecosystem with Kicksecure, whereas Android was free to make its own ecosystem from scratch. Android doesn’t need to retain compatibility with anything it doesn’t want to, whereas we do.

1 Like

Some thoughts I had on this recently:

  • KVM and VirtualBox both support nested virt, it runs acceptably well on more recent CPUs and can limp along on older ones, and more recent CPUs are necessary to mitigate some side-channel attacks anyway.
  • Xen will probably gain nested virt support soon, and it may already support it to some degree in the form of (less secure) PV virtualization, i.e. it might be possible to run PV guests within a PVH or HVM guest.
  • QEMU’s attack surface is probably rather high. Cloud Hypervisor’s is probably lower, but it isn’t in Debian. There’s also kvmtool, which I believed was already in Debian (edit: scratch that, I know it was in Ubuntu 16.04, but I guess it’s not in Debian any longer). It’s smaller than QEMU and somewhat similar to Cloud Hypervisor, though it has the disadvantage of being written in C. I’m unsure whether it’s intended for security-critical use cases, and it seems rather inactive upstream at the moment, but it might be safer than QEMU… not sure.
2 Likes

The only ones I see we can examine are KVM and Xen; the others are messy when used outside what they were meant for.

KVM: works smoothly with any Linux version, which resolves the VirtualBox headache.

Xen: less attack surface than KVM, but it’s an outsider package for Linux which needs its own changes and integration with the kernel.

The focus should be narrowed to these two hypervisors; whichever the implementation can be done with is a good choice.

I think there’s no question that if we ever implement a virtualization-based sandbox-app-launcher, we should use KVM or Xen as a backend. VirtualBox as a backend would probably be a bit of a mess. kvmtool is another KVM frontend, as is Cloud Hypervisor.

1 Like

Discussed some aspects of this with Patrick more, I’m not sure if we’re getting closer to a concrete implementation plan or not, but here’s some rough notes about what we were thinking:

Right now we have two privilege levels, user and sysmaint. sysmaint has the power most administrative users on other distros have; it’s “root with light safeguards to make it harder to do something foolish by accident”. user is more like what non-administrative users on Windows are; they can run arbitrary applications, work with files, use the network, etc., but they can’t mess with the guts of the system. This provides some additional security since it’s harder to get a foothold in the root account, but ultimately it doesn’t do much to fight malware that wants to steal or encrypt the user’s data, except for making it more likely that a user will be mindful about what they install.

Our ultimate goal, however we implement it, is to add a third privilege level to this model. Let’s call it untrusted-user (bad name, but it gets the point across for now).

  • The standard user privilege level will retain all of the power it currently has. It is also where all of the user’s trusted data will be stored.
  • The untrusted-user privilege level is where more dangerous applications (such as web browsers) should run. This privilege level should be resistant to attacker-incited persistent compromise, while simultaneously expecting temporary compromise. That is, if an attacker manages to compromise a legitimate application running as untrusted-user, that compromise must be able to “just go away” somehow. While the attacker has a foothold in untrusted-user, our implementation must contain that compromise if at all possible so that it cannot gain access to user data.
    • This implies that untrusted-user will be at least semi-ephemeral; changes made by malware need to be able to be erased reliably in some fashion.
  • It is likely that the untrusted-user environment will suffer from user-incited persistent compromise (i.e. someone intentionally installed malware into the environment). It therefore must be possible to get rid of malware of this kind in some manner.
  • Users may work with data they want to store long-term when using untrusted-user, so they must be able to save that data persistently. Some of that data may be trusted, in which case there needs to be a way to move it into the user privilege level (users will obviously have to be very careful about what data they move), while some of that data may be untrusted and needs to stay in the untrusted-user privilege level. Users will also need to be able to move files from the user privilege level to the untrusted-user level.
  • Users will be running resource-intensive apps (most notably web browsers) in the untrusted-user privilege level. They will also be running these apps within potentially resource-constrained virtual machines. Thus, performance concerns have to be taken into account. Any solution based on virtualization will have to take into account the fact that nested virtualization may be disabled or may not be available at all (such as on current versions of Qubes OS).
  • Users will be running apps that make use of Linux’s native sandboxing capabilities. The implementation of untrusted-user must not break or weaken these sandboxes if at all possible.
  • Applications running as untrusted-user are going to have to do IPC with things that run as user (most notably Wayland, Pipewire, and terminal emulator software, but probably also others). This provides substantial attack surface; Wayland is not a simple protocol, and labwc is not written in a memory-safe language, so I would not be surprised if a motivated attacker could attack the compositor to escape the sandbox.
  • We do not know what applications the user will run in the sandbox. The sandbox should work transparently with as many existing applications as possible.
  • Users will have legitimate reasons to run things in the user privilege level for compatibility or performance reasons. Not everything can be pushed into a container or VM and just work. However, a subset of users will only want to use the user privilege level for managing applications that run as untrusted-user, or for very basic tasks. For those users who can afford to lock down user, there should be an option that uses a project like apparmor.d to lock down the user privilege level. This might be worthwhile to enable by default and make opt-out.

There are two main sandboxing models that I can see, per-application sandboxing and per-usecase sandboxing. Firejail, Bubblewrap, Flatpak, and sandbox-app-launcher are examples of the former kind. I personally do not believe this is the best way of doing things, for a few reasons:

  • Per-application sandboxing generally involves figuring out everything a particular application needs, allowlisting it, and then denying everything else. This means that a special profile has to be written for every application, and that lots of fiddling is needed to ensure good application compatibility. sandbox-app-launcher doesn’t seem to have had this problem, but Firejail and Bubblejail certainly do.
  • Per-application sandboxing makes it confusing, difficult, or impossible to run the same app in two different “trust domains”. It poses similar problems when trying to run multiple apps in the same “trust domain”.
  • Because each application is isolated independently of others, one cannot easily sandbox utilities that are primarily used as parts of larger systems. For instance, it’s hard to imagine a useful generic sandbox for netcat. Per-usecase sandboxing allows one to sandbox these lower-level utilities meaningfully (for instance, maybe my “software-dev” environment should only allow netcat to reach out to termbin.com, while my “document-processing” environment should allow it to reach out to a local network printer for… some reason… this is a bad example but you get my point).
  • Because each application is isolated independently of others, IPC becomes a concern. Should a sandboxed Tor Browser be able to ask my file manager to open a directory and show me the contents? On the surface, the answer is “yes”, because I might want to use its “Open containing folder” feature after downloading something. But what if the browser is compromised, and it’s able to use the ability to open my file manager by itself as part of a social engineering attack, trying to convince me to upload sensitive data somewhere? It shouldn’t be able to do that. With per-usecase sandboxing, I can say that Tor Browser should be allowed to communicate with anything within its sandbox, for any reason, but nothing outside of its sandbox. If the browser then decides to open a file manager window out of nowhere, it will be recognizably not my normal file manager, and it won’t be able to show me any sensitive files, so the attack’s likelihood of success goes down.
  • Sandboxing of server-like “applications” becomes harder, especially if they depend on and integrate with things like systemd.

Per-usecase sandboxing is basically what containers and VMs give us. VMs are better from a security standpoint in a lot of ways, but due to resource constraints and compatibility issues, we can’t rely on them alone like I was suggesting previously. Containers are somewhat scary because they run directly on the host kernel, meaning that any host kernel bug in code accessible to the container can be exploited, potentially allowing an attacker to obtain kernel-level privileges. That being said, they’re better than nothing, and they allow working around many of the limitations of virtual machines (for instance, 3d acceleration can work inside containers).

There are a lot of existing containerization systems out there; Docker, LXC, LXD, Incus, systemd-nspawn, libvirt-lxc, etc. Which one of these would be good to build on top of, I’m not quite sure yet, but I initially think either systemd-nspawn or libvirt-lxc would be a good option. One nice thing about libvirt-lxc is that it allows defining a container in much the same way one would define a virtual machine, and libvirt provides both container and virtual machine functionality. This might permit implementing both VM-based sandboxing and container-based sandboxing in the same application. Something similar could be done with systemd-nspawn and systemd-vmspawn possibly, but unfortunately systemd-vmspawn is not present in Debian Trixie, so this may not be the best option.
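To show why libvirt-lxc is appealing here: the same domain XML shape describes either a container or a VM, so a dual container/VM backend mostly comes down to swapping the domain type. A sketch follows (the names and paths are illustrative; the `exe`/`hvm` os types match libvirt's documented formats, though a real KVM domain would need more devices than this):

```python
# Sketch: generate a minimal libvirt domain definition where the backend
# (container vs. VM) is a single parameter. Illustrative only.
import xml.etree.ElementTree as ET

def sandbox_domain(name: str, rootfs: str, backend: str = "lxc") -> str:
    dom = ET.Element("domain", type=backend)
    ET.SubElement(dom, "name").text = name
    ET.SubElement(dom, "memory", unit="MiB").text = "512"
    os_el = ET.SubElement(dom, "os")
    if backend == "lxc":
        # container: libvirt boots an init process inside the rootfs
        ET.SubElement(os_el, "type").text = "exe"
        ET.SubElement(os_el, "init").text = "/sbin/init"
    else:
        # VM: same document shape, different os type
        ET.SubElement(os_el, "type").text = "hvm"
    devices = ET.SubElement(dom, "devices")
    fs = ET.SubElement(devices, "filesystem", type="mount")
    ET.SubElement(fs, "source", dir=rootfs)
    ET.SubElement(fs, "target", dir="/")
    ET.SubElement(devices, "console", type="pty")
    return ET.tostring(dom, encoding="unicode")

print(sandbox_domain("untrusted-browsing", "/var/lib/sandboxes/untrusted-browsing"))
```

A management tool could keep one definition per sandbox and flip the backend depending on whether nested virtualization is available.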

In my mind, an ideal sandboxing solution would look something like this:

  • There’s a privileged daemon (let’s call it sandboxd), which listens for requests to create and manage sandboxes. sandboxd creates sandboxes by using mmdebstrap to create a Debian rootfs with proper UIDs and GIDs so that an unprivileged sandbox can be booted. This daemon runs as root so that it can make the needed ownership changes after bootstrapping a new sandbox.
  • There’s a user-accessible application (call it sandboxctl) that talks to the daemon in order to do things like create, delete, rename, query information about, start, and stop sandboxes. It can also request files to be moved between the user’s home folder and the sandbox’s home folder, or can request a directory to be transparently passed through.
  • A number of validating proxies for basic, unavoidable IPC are provided with the system. These proxies do things like virtualize Wayland, Pipewire, and console access, preventing malicious interactions with parts of the system that run under the user privilege level.
  • A number of basic permissions exist (“allow network”, “allow GUI”, “allow audio”, “allow mic”, “allow 3d accel”, etc.). These permissions can be toggled on or off at the trust domain level. Wherever possible, validating proxies are used to enable these permissions rather than just passing through bits of the host system to the container. In some situations though, host passthrough will likely be unavoidable, for instance for passing through something like /dev/dri/renderD128 for 3d acceleration.
  • A helper application using seccomp-bpf will be provided that will deny the ability to create user namespaces to most applications in some way (handwaving here, haven’t thought this through fully). Only specific applications that need user namespaces such as web browsers will be run without this application wrapping them, thus allowing us to have the lessened kernel attack surface of “no user namespaces”, and the better in-browser security that comes with user namespaces.
  • Sandboxes can be booted with an optional RAM-based overlay within the sandbox to make them ephemeral. Only specific folders within the sandbox’s home folder such as Documents would be exempted (bind mounting would be needed for this). This would allow a user to make a sandbox “forget” a compromise, even one that used files like ~/.bashrc for persistence, by simply rebooting the sandbox. For software updates and installation, the sandbox could be booted in a fully persistent mode.
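The ephemeral overlay from the last bullet could be assembled from a tmpfs upper layer plus bind mounts for the exempted folders. Here's a dry-run sketch that only builds the mount commands (all paths are assumptions, not an existing layout):

```python
# Dry-run sketch of the RAM-backed ephemeral overlay: the sandbox rootfs is
# the read-only lower layer, a tmpfs catches all writes (so a sandbox reboot
# erases any persistence attempt, including ~/.bashrc tricks), and folders
# like Documents are bind-mounted over the result to stay persistent.
def ephemeral_mount_plan(sandbox: str) -> list[str]:
    lower = f"/var/lib/sandboxes/{sandbox}/rootfs"
    ram = f"/run/sandboxes/{sandbox}"
    merged = f"{ram}/merged"
    return [
        # everything written inside the sandbox lands in this tmpfs
        f"mount -t tmpfs tmpfs {ram}",
        f"mkdir -p {ram}/upper {ram}/work {merged}",
        f"mount -t overlay overlay "
        f"-o lowerdir={lower},upperdir={ram}/upper,workdir={ram}/work {merged}",
        # exempt persistent data so the user keeps their documents
        f"mount --bind /var/lib/sandboxes/{sandbox}/Documents "
        f"{merged}/home/user/Documents",
    ]

for cmd in ephemeral_mount_plan("untrusted-browsing"):
    print(cmd)
```

For updates and software installation, the same sandbox would simply be booted without the tmpfs layer, mounting the rootfs read-write.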

This is the basic architecture in my mind. This would meet all of the criteria above AFAICT.
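As a concreteness check, the sandboxctl surface described above might look roughly like this. The subcommand names and permission list are hypothetical; this only parses requests and doesn't talk to any real daemon:

```python
# Sketch of a hypothetical sandboxctl command-line surface. Parsing only;
# a real tool would forward each request to the privileged sandboxd daemon.
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="sandboxctl")
    sub = parser.add_subparsers(dest="command", required=True)

    create = sub.add_parser("create", help="bootstrap a new sandbox rootfs")
    create.add_argument("name")
    create.add_argument("--ephemeral", action="store_true",
                        help="boot with a RAM-based overlay")

    for verb in ("start", "stop", "delete", "info"):
        sub.add_parser(verb).add_argument("name")

    copy = sub.add_parser("copy-in", help="move a file into the sandbox home")
    copy.add_argument("name")
    copy.add_argument("path")

    perm = sub.add_parser("permission", help="toggle a coarse permission")
    perm.add_argument("name")
    perm.add_argument("grant", choices=["network", "gui", "audio", "mic", "3d"])
    perm.add_argument("state", choices=["on", "off"])
    return parser

args = build_parser().parse_args(["create", "untrusted-browsing", "--ephemeral"])
print(args.command, args.name, args.ephemeral)
```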

The hardest part of this is probably going to be the validating proxies. What might be doable instead is to run a real Wayland compositor and audio server within the sandbox, then use simpler protocols like VNC and audio streaming to get video and audio out of the container. A terminal emulator could then be run within the sandbox rather than using a user-privileged terminal emulator to interface with the sandbox. The performance impact of this might be substantial, and it could make things like the clipboard harder to use, but perhaps not impossible. Research would have to be done into this.

Everything after this is research I did into a bunch of existing sandboxing mechanisms and why I think they probably aren’t suitable for what we want to do.

  • VirtualBox and KVM are both very resource intensive. They require lots of disk space, lots of memory, lots of CPU power, and destroy the ability to use graphics hardware acceleration (which is necessary to do things like watch videos at reasonable resolutions). They require hardware virtualization features, making them unsuitable for use within Qubes or environments where nested virtualization isn’t available or is painfully slow.
  • Flatpak uses a combination of namespaces, cgroups, and seccomp to sandbox applications. While good in theory, in practice this results in a lot of problems:
    • Many Flatpaks have very loose permissions, and those that don’t oftentimes don’t work right.
    • seccomp is used to deny access to features that may increase kernel attack surface, like user namespaces. This breaks Chrome’s sandbox among other things, making it more dangerous to use a Flatpak-packaged browser than an apt-packaged one.
    • Sandboxed applications connect directly to the Wayland compositor, which is unsafe as described above.
    • The way in which Flatpaks bundle dependencies means that oftentimes Flatpaks can contain insecure dependencies, resulting in security problems. https://flatkill.org/ goes into a lot of detail on that.
  • Snap suffers from similar issues as Flatpak, though it may not be as bad from a dependency standpoint if it uses packages from the Ubuntu archive. It may also handle nested sandboxing better. It’s a semi-closed-source system dependent upon Canonical though, so it may be better avoided.
  • Bubblewrap can be used on its own, without Flatpak. Depending on what exactly is being done, this can be a viable sandboxing mechanism; it might be usable instead of systemd-nspawn or libvirt-lxc. I’m not sure whether it has advantages over those mechanisms yet, and it might complicate attempts to make VM-based sandboxing work.
  • I took a look at Landlock. It looks like a rather interesting way to do some of the things AppArmor does, but without needing privileges. It can be used to do things like make an application that says “I need to be able to read from this dir, read and write this other dir, and read, write, and execute this third dir, and that’s it.” Then the kernel will keep it from doing anything other than the things it locked itself into doing.
    • This isn’t really what we’re looking for, I would argue. It might be interesting for some specific usecases, but… we already have AppArmor for this sort of thing. This is a variant on mandatory access control, which doesn’t provide the compatibility we’re looking for.
  • AppArmor is something we’re already familiar with, as we actively use it to do things like sandbox Tor Browser.
    • This requires a custom policy to be written for each sandboxed application, which is a pain, and it also doesn’t do much to work with applications that don’t work if you deny them resources that are potentially dangerous. It also doesn’t allow isolating data into separate untrusted-user environments.
  • Firejail is designed to sandbox individual applications using individually written profiles. It also allows the user to create custom profiles for applications that don’t have profiles written for them already. It sandboxes the applications that already exist on the root filesystem.
    • It’s SUID-root, which is problematic since we’re trying to get rid of such applications.
    • While simple to use in theory, it seems to require the user to know what they’re doing a bit more than I personally would prefer. Rather than saying “this set of apps has its own environment”, it says “here are the parts of the user environment this particular app can access.” This is possibly dangerous since it blurs the lines between the privilege levels we want to establish. For instance, a browser should naturally be allowed to save things in Downloads, but what if you have sensitive company data in your Downloads folder, and your browser gets compromised and uploads it?
    • There has been at least one TOCTOU vuln in Firejail which allowed an attacker to escalate their privileges to root; “Rigged Race Against Firejail for Local Root” is a good example. This is arguably proof that Firejail as an SUID application is a less-than-great idea. There have been other vulnerabilities in Firejail as well that make it look dangerous.

This is a bit long, but hopefully it will be useful.

1 Like