Dagan Henderson

Longhorn, Rancher’s distributed block storage for Kubernetes, has a remote code execution vulnerability (CVE-2021-36779) in versions in the v1.2.x branch prior to v1.2.3 and in all versions prior to v1.1.3. An unauthenticated attacker with access to the Longhorn network services can:

execute, as root, any binary on the host machine or in the Longhorn container
copy arbitrary data, including executable scripts or binaries, to the host machine or Longhorn container
enumerate and read all files on the host machine, such as private keys and stored credentials

Because Longhorn is deployed as a DaemonSet, it is present on every node in the cluster, enabling an attacker to gain root access to every node.
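
To make the affected ranges concrete, here is a minimal sketch that applies the two ranges stated above to a version string. The function names (parseVersion, less, isAffected) are my own for illustration and are not part of Longhorn; pre-release suffixes are not handled.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion splits a version such as "v1.2.3" into its numeric parts.
// Pre-release suffixes (e.g. "-rc1") are not handled in this sketch.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("expected MAJOR.MINOR.PATCH, got %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("invalid component %q in %q: %w", p, v, err)
		}
		out[i] = n
	}
	return out, nil
}

// less reports whether version a sorts strictly before version b.
func less(a, b [3]int) bool {
	for i := range a {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return false
}

// isAffected applies the ranges from the advisory text:
// v1.2.x before v1.2.3, and every other version before v1.1.3.
func isAffected(v string) (bool, error) {
	ver, err := parseVersion(v)
	if err != nil {
		return false, err
	}
	if ver[0] == 1 && ver[1] == 2 {
		return less(ver, [3]int{1, 2, 3}), nil
	}
	return less(ver, [3]int{1, 1, 3}), nil
}

func main() {
	for _, v := range []string{"v1.1.2", "v1.1.3", "v1.2.2", "v1.2.3"} {
		affected, err := isAffected(v)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s affected: %t\n", v, affected)
	}
}
```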

The Discovery

A few months ago, I was working on getting Longhorn deployed in an Istio service mesh. I’ve done quite a bit of work adapting various application deployments to Istio, but I was a little surprised to see issues with Longhorn’s instance-manager pods. Istio’s strict mTLS is a frequent source of networking trouble, but in this case the pods were not associated with a Service and the PodSpecs did not include any container ports. The pod’s commands, however, included the flag --listen 0.0.0.0:8500. That explained the issue with Istio. Because the PodSpec did not specify the container port, the Istio proxy was not handling incoming traffic properly. The next step was to understand what protocol Longhorn was using so I could resolve the networking issue.
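
The mismatch I ran into, a process listening on a port the PodSpec never declares, can be checked mechanically. The following is a rough sketch, not anything from Longhorn or Istio: the helper names are hypothetical, and the command line in main is illustrative apart from the --listen 0.0.0.0:8500 flag mentioned above.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// listenPort extracts the port from a "--listen host:port" pair in a
// container's command-line arguments. ok is false if no usable flag exists.
func listenPort(args []string) (port int, ok bool) {
	for i := 0; i+1 < len(args); i++ {
		if args[i] != "--listen" {
			continue
		}
		_, p, err := net.SplitHostPort(args[i+1])
		if err != nil {
			return 0, false
		}
		n, err := strconv.Atoi(p)
		if err != nil {
			return 0, false
		}
		return n, true
	}
	return 0, false
}

// undeclaredListenPort returns the --listen port and true when that port
// is absent from the container ports declared in the PodSpec.
func undeclaredListenPort(args []string, containerPorts []int) (int, bool) {
	port, ok := listenPort(args)
	if !ok {
		return 0, false
	}
	for _, cp := range containerPorts {
		if cp == port {
			return port, false
		}
	}
	return port, true
}

func main() {
	// Illustrative command line; only the --listen flag comes from the post.
	args := []string{"instance-manager", "--listen", "0.0.0.0:8500"}
	var declared []int // the PodSpec declared no container ports

	if port, missing := undeclaredListenPort(args, declared); missing {
		fmt.Printf("port %d is listened on but not declared in the PodSpec\n", port)
	}
}
```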

Browsing the instance manager repository, I found where a gRPC server was started. I was excited to see gRPC because that meant the API would be documented in a .proto file (in this case, pkg/rpc/rpc.proto) that would be easy to read. Looking at the file, though, I was immediately very, very concerned. Take a look at the highlighted lines.


service ProcessManagerService {
    rpc ProcessCreate(ProcessCreateRequest) returns (ProcessResponse) {}
    rpc ProcessDelete(ProcessDeleteRequest) returns (ProcessResponse) {}
    rpc ProcessGet(ProcessGetRequest) returns (ProcessResponse) {}
    rpc ProcessList(ProcessListRequest) returns (ProcessListResponse) {}
    rpc ProcessLog(LogRequest) returns (stream LogResponse) {}
    rpc ProcessWatch(google.protobuf.Empty) returns (stream ProcessResponse) {}
    rpc ProcessReplace(ProcessReplaceRequest) returns (ProcessResponse) {}

    rpc VersionGet(google.protobuf.Empty) returns (VersionResponse);
}

message ProcessSpec {
    string name = 1;
    string binary = 2;
    repeated string args = 3;
    int32 port_count = 4;
    repeated string port_args = 5;
}

Looking at the .proto, it seemed that Longhorn was exposing an API to execute arbitrary binaries, and looking at the way the gRPC server was stood up, there was apparently no authentication mechanism. If true, that would be bad. Worse, though, was that the image ran as root and used Ubuntu as its base image, so there were plenty of tools for an attacker to work with.
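
For contrast, this is roughly the kind of check that was missing. It is a hypothetical sketch of a bearer-token test over request metadata shaped like gRPC's (a map from header name to values), of the sort a server interceptor could call before dispatching a request. It is not how Longhorn was fixed, and the authorized function and the token value are assumptions for illustration.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"strings"
)

// authorized reports whether the request metadata carries
// "authorization: Bearer <token>" with a token equal to want.
// The comparison is constant-time to avoid leaking the token via timing.
func authorized(md map[string][]string, want string) bool {
	if want == "" {
		return false // never accept requests when no token is configured
	}
	const prefix = "Bearer "
	for _, v := range md["authorization"] {
		if !strings.HasPrefix(v, prefix) {
			continue
		}
		got := strings.TrimPrefix(v, prefix)
		if subtle.ConstantTimeCompare([]byte(got), []byte(want)) == 1 {
			return true
		}
	}
	return false
}

func main() {
	want := "example-token" // placeholder; a real deployment would load a secret

	anonymous := map[string][]string{}
	withToken := map[string][]string{"authorization": {"Bearer example-token"}}

	fmt.Println("anonymous allowed:", authorized(anonymous, want))
	fmt.Println("with token allowed:", authorized(withToken, want))
}
```

Without a gate like this (or mutual TLS) in front of the handlers, every caller that can reach the port is treated as trusted.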

The Investigation

I wanted to verify my finding and decided that it would be good to bring in another engineer, so I reached out to my colleague Will Kline. After showing Will what I found, I suggested we could use gRPC to write a quick API client and test my theory. It didn’t take long to slap together a client that was hard-coded to execute a single command that echoed some text into a file. Will noted that the image was mounting the host’s /proc filesystem and suggested we target the host’s filesystem using /proc/1/root, which would be the root filesystem of the host’s PID 1 process. It worked. With Will’s help, it had taken less than an hour to go from investigating a networking issue with Istio to a functioning exploit that could run any binary on the target system. The gRPC server was not exposed outside the cluster, so an attacker would need to be able to either deploy a malicious workload to the cluster or arrange for traffic to be proxied, which is the only thing preventing this exploit from having a CVSS v3 score of 10.0.

The Disclosure

After taking steps internally to secure our own deployments of Longhorn, we reached out to the folks at Rancher to disclose the vulnerability. During one of our discussions with the lead developers, we discussed the other APIs used by Longhorn. At one point, we questioned whether the service used to synchronize data replicas could be used by an attacker to gain access to data stored in Longhorn volumes, and that’s how CVE-2021-36780 was found.

CVE-2021-36779: Remote Code Execution in Longhorn
