Case Study: Why You Shouldn’t Trust NTDLL from Kernel Image Load Callbacks

By Kasif Dekel and Roy Ronen

Introduction

In this post, we disclose several vulnerabilities in a security product called “CryptoPro CSP”, discovered during an interoperability investigation. These vulnerabilities were assigned the following CVEs: CVE-2020-9361/CVE-2020-9331.

CryptoPro is widely used in the industrial sector in Russia, as it is required by FAPSI, a federal agency responsible for communication and information in Russia. Most banks require it, as do many government services and other companies. CryptoPro also won a prize “For Strengthening Russia’s Security”.

Mishandling user input in a kernel-mode driver can lead to severe vulnerabilities and potentially allow attackers to compromise the whole machine. Kernel-mode driver writers are typically aware that user input must be handled carefully in places such as IOCTL handlers, and when parsing user-mode data from untrustworthy sources, for example image binary data or dynamically allocated data like the PEB or TEB. In this article, we’ll discuss a trickier scenario for this problem.

Exploiting kernel-mode vulnerabilities is not new, but in this case study the vulnerability is caused by an unexpected input vector. We’ve seen other products that suffer from similar vulnerabilities. These are often hard to expose, so we hope this writeup will be beneficial for developers and security practitioners.

Technical Details

CryptoPro CSP is a cryptographic software package that implements the Russian cryptographic algorithms and is developed in accordance with the Microsoft Cryptographic Service Provider (CSP) interface. To achieve near-complete integration with Windows applications, it injects itself into the appropriate processes to enforce the use of its algorithms.

In brief, the injection mechanism works as follows:

  1. The driver registers process creation and image load notification callbacks using PsSetCreateProcessNotifyRoutine and PsSetLoadImageNotifyRoutine. Once a process is created, the driver maps into it a section view containing a copy of its own DLL, which serves as a bootstrap that loads the remaining DLLs and enables the appropriate hooks.
  2. When an image load callback is fired, the driver checks whether the loaded image is NTDLL. If so, it resolves pointers to several functions inside NTDLL and writes them to the newly mapped section, thereby providing the function addresses the bootstrap DLL needs.
  3. The final step is to allocate and queue an APC (Asynchronous Procedure Call) via KeInitializeApc and KeInsertQueueApc respectively.
  4. This APC initiates the injection phase inside the process.

To learn more about APCs check out this article.

The mechanism is simple, but understanding it is essential for the rest of the article. Here’s a flow chart:

The issue resides in the way CryptoPro CSP reads from and writes to user-mode memory. The pointers are acquired like this:

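/* Slot inside the mapped bootstrap section that receives the resolved address. */
PUCHAR dest = proc_item->get_exported_function_ptr(proc_item->section_view_base_address, "DsprLdrLoadDll");
/* Resolved from export data of the NTDLL image mapped in the target process. */
PUCHAR src = proc_item->get_exported_function_ptr(ntdll_base, "LdrLoadDll");
/* Neither pointer is validated before the kernel-mode copy. */
memcpy(dest, src, proc_item->pointer_size);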

The get_exported_function_ptr function pointer in proc_item is selected according to whether the target process is 32-bit or 64-bit. It basically looks like this:

You can see where this is going: a specially crafted image could make this function return basically anything. Perhaps the driver developers assumed that at the point where NTDLL is being loaded, no one could have tampered with the loaded module data inside the process. It is understandable why they came to this conclusion.
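To make the problem concrete, here is a minimal, portable sketch of a resolver that trusts offsets stored inside the image it is parsing. It is not the driver's routine (which was shown as a screenshot) and it does not parse a real PE export directory: the `toy_` types, the header layout, and the helper names are all hypothetical, chosen only to show that whoever can write to the image controls the returned pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX_EXPORTS 4

/* Hypothetical, heavily simplified stand-in for an export table.
 * Everything here lives in memory the target process can write to. */
typedef struct {
    char     name[24];
    uint32_t rva;            /* offset of the symbol from the image base */
} toy_export;

typedef struct {
    uint32_t   image_size;   /* bytes occupied by the mapped image */
    uint32_t   export_count;
    toy_export exports[TOY_MAX_EXPORTS];
} toy_image_header;

/* Naive lookup: returns base + rva for the named export and trusts
 * whatever RVA is currently stored in the image. */
static uint8_t *toy_get_export_naive(uint8_t *base, const char *name)
{
    toy_image_header hdr;

    assert(base != NULL && name != NULL);
    memcpy(&hdr, base, sizeof hdr);              /* copy out; avoids alignment issues */

    for (uint32_t i = 0; i < hdr.export_count && i < TOY_MAX_EXPORTS; i++) {
        if (strncmp(hdr.exports[i].name, name, sizeof hdr.exports[i].name) == 0)
            return base + hdr.exports[i].rva;    /* no range check on rva */
    }
    return NULL;
}

/* Builds a toy image at the start of `arena` that exports "LdrLoadDll". */
static void toy_build_image(uint8_t *arena, uint32_t image_size, uint32_t rva)
{
    toy_image_header hdr;

    memset(&hdr, 0, sizeof hdr);
    hdr.image_size   = image_size;
    hdr.export_count = 1;
    strcpy(hdr.exports[0].name, "LdrLoadDll");
    hdr.exports[0].rva = rva;
    memcpy(arena, &hdr, sizeof hdr);
}

/* Overwrites the stored RVA of export 0, as a process with write access could. */
static void toy_tamper_rva(uint8_t *arena, uint32_t new_rva)
{
    memcpy(arena + offsetof(toy_image_header, exports) + offsetof(toy_export, rva),
           &new_rva, sizeof new_rva);
}
```

With a 256-byte toy image at the start of a larger buffer, calling `toy_tamper_rva(arena, 0x300)` makes the naive lookup return `arena + 0x300`, an address outside the image, without any complaint from the resolver.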

Even before nt!NtCreateUserProcess returns, we can see that the main binary and NTDLL images are already loaded:

This would mean the bug in get_exported_function_ptr is not actually exploitable. Or is it?

The description for nt!PsSetLoadImageNotifyRoutine mentions that:

It might come as a surprise, but in reality, the callback gets invoked only when a thread gets scheduled. What that means is that it is possible to create a process and not resume its threads, tamper with the mapped section view and NTDLL, and make get_exported_function_ptr return a kernel mode pointer for memcpy to write to.

So, to summarize, our exploit code should be quite simple:

    1. Create a suspended process:

    2. Go through its virtual memory mappings and find the corresponding section base that was mapped by CryptoPro’s driver:

    3. Write target_value minus ntdll_base to the correct RVA offset in NTDLL, write target_ptr minus section_base to the correct RVA offset in the newly mapped section view:

    4. At this point, the only thing left is to resume the thread and there we have a write-what-where vulnerability exploited:

Note: attentive readers might have noticed that the first write to the process (the write to the section base + offset of the RVA of DsprLdrLoadDll) is done with NtWriteVirtualMemory. This is because the whole mapped section is marked as PAGE_EXECUTE_WRITECOPY (also weakening DEP), and WriteProcessMemory first tries to change the protection of the page using NtProtectVirtualMemory, which fails with 0xC000004E (STATUS_SECTION_PROTECTION).
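The arithmetic behind step 3 can be sketched in a few lines of portable C. This is only an illustration of the subtraction described above, not the exploit code from the screenshots: the function names are hypothetical, the target address is a made-up example, and the width of the offset field the driver actually reads is not shown in the text, so the sketch simply uses 64-bit modular arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Offset to plant so that a resolver computing base + offset yields `wanted`.
 * Unsigned subtraction wraps modulo 2^64, so this also works when wanted < base. */
static uint64_t offset_to_plant(uint64_t base, uint64_t wanted)
{
    return wanted - base;
}

/* What a resolver that trusts the stored offset ends up with. */
static uint64_t resolved_address(uint64_t base, uint64_t stored_offset)
{
    return base + stored_offset;
}

/* Round trip with example numbers: the section base is the one quoted in the
 * article for Windows 10, the target is a hypothetical kernel-range address. */
static void check_example(void)
{
    const uint64_t section_base = 0x60000000ULL;
    const uint64_t target_ptr   = 0xFFFFF80000001000ULL;   /* hypothetical */

    assert(resolved_address(section_base,
                            offset_to_plant(section_base, target_ptr)) == target_ptr);
}
```

The same computation applies to both writes: once with ntdll_base and target_value, once with section_base and target_ptr.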

In Windows 10, things look a little bit different. During process creation notify callback you can find the following:

So, the flow chart in Windows 10 really looks like this:

This is due to a remark in the documentation of the PsSetLoadImageNotifyRoutine callback:

In reality, this is not true for the “fake” (as we mentioned earlier) notification for the EXE or NTDLL being loaded.

This makes things more complicated; let’s have a look at how the mapping is done:

Assuming that the base address isn’t predictable, this makes things tougher to exploit since it is mandatory to resume the thread to map the section to the target process. However, doing so also executes get_exported_function_ptr, which triggers the vulnerable code. Thus, predicting the address is required to exploit this bug on Windows 10.

Our first idea was to beat the PRNG (pseudo random number generator) of nt!RtlRandomEx, since the seed is zero, and to do some “VM grooming” on the target process. Leaving a hole within the 2^28 range that starts at 0x60000000 (the base address for the section) won’t be enough, because the mechanism for choosing the random address does not consider the current VM state of the process. Instead, we tried to reserve almost the entire address space of the target process, leaving a hole for the section view to be mapped, place the target process inside a job, throttle its CPU, and then race the notify callback from another thread. This approach had its own problems, although it might still be feasible.

Since the first image the kernel callback routine is notified about is the main executable, and only then NTDLL, we thought: what if we could suspend the process right after the first notification callback is executed, so that the section view is already mapped but the NTDLL load has not been notified yet? All we would have to do afterwards is write to the appropriate offsets and resume the thread.
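The ordering this idea relies on can be expressed as a tiny state machine. The sketch below is an abstract model, not Windows code: the `toy_` names are hypothetical, "delivering a notification" stands in for letting the thread run briefly, and the only point is that a write made between the two notifications is what the driver later reads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { TOY_NOTIFY_EXE, TOY_NOTIFY_NTDLL, TOY_NOTIFY_DONE } toy_stage;

typedef struct {
    toy_stage next;            /* next pending image load notification */
    int       section_mapped;  /* set by the "driver" on the first notification */
    uint64_t  ntdll_offset;    /* user-writable offset stored inside "NTDLL" */
    uint64_t  offset_seen;     /* value the "driver" read when it resolved */
} toy_process;

static void toy_create_suspended(toy_process *p, uint64_t legit_offset)
{
    assert(p != NULL);
    p->next           = TOY_NOTIFY_EXE;
    p->section_mapped = 0;
    p->ntdll_offset   = legit_offset;
    p->offset_seen    = 0;
}

/* Delivers exactly one pending notification, as if the thread ran briefly and
 * was suspended again. Returns 0 when nothing is pending. */
static int toy_run_one_notification(toy_process *p)
{
    assert(p != NULL);
    switch (p->next) {
    case TOY_NOTIFY_EXE:                 /* main executable: section gets mapped */
        p->section_mapped = 1;
        p->next = TOY_NOTIFY_NTDLL;
        return 1;
    case TOY_NOTIFY_NTDLL:               /* NTDLL: driver resolves from user data */
        p->offset_seen = p->ntdll_offset;
        p->next = TOY_NOTIFY_DONE;
        return 1;
    default:
        return 0;
    }
}

/* Tampering only helps inside the window: after the section is mapped and
 * before the NTDLL notification has been processed. */
static int toy_tamper(toy_process *p, uint64_t offset)
{
    assert(p != NULL);
    if (!p->section_mapped || p->next != TOY_NOTIFY_NTDLL)
        return 0;
    p->ntdll_offset = offset;
    return 1;
}
```

In this model, `toy_tamper` fails right after `toy_create_suspended`, succeeds after one call to `toy_run_one_notification`, and the second call then records the tampered offset in `offset_seen`.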

At first, we didn’t expect this would work since this would require stopping some kernel mode thread “in the middle” of its job. But to our surprise it did. Basically, all we needed to add to our exploit was this:

POC

In this example, we use the exploit to patch nt!_SEP_TOKEN_PRIVILEGES.Present/Enabled to elevate our privileges.

Summary

To summarize, since image load callbacks are only called once a thread has begun executing, the process memory, and NTDLL specifically, may already have been tampered with by the time the NTDLL load is reported. Here, the security issue resides in the driver’s lack of validation of user-mode input, probably because that memory is assumed to be in a clean state.

When a new process is created, the driver’s callback for image load notifications maps a section view to the process’ address space. The driver then calculates an address based on user-mode-controlled data and copies other user-mode-controlled data to it. An attacker who controls those pointers and data is able to have the driver write arbitrary data to an arbitrary location in the kernel’s address space.

This issue can be triggered by just spawning a new process, which does not require admin privileges.
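As an illustration of the kind of validation that was missing, the sketch below range-checks the stored offset against the size of the mapping before returning a pointer. It is a toy model under stated assumptions, not CryptoPro's fix: the `toy_` types and the header layout are hypothetical, and `mapped_size` is assumed to come from the driver's own bookkeeping rather than from anything stored in the image. A real driver would additionally need the usual user-mode access precautions, such as ProbeForRead/ProbeForWrite inside a __try/__except block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX_EXPORTS 4

/* Hypothetical simplified export table, writable by the target process. */
typedef struct {
    char     name[24];
    uint32_t rva;
} toy_export;

typedef struct {
    uint32_t   image_size;   /* attacker-controlled, so never trusted below */
    uint32_t   export_count;
    toy_export exports[TOY_MAX_EXPORTS];
} toy_image_header;

/* Returns a pointer to `access_len` bytes at the named export, or NULL if the
 * stored offset would leave the mapping or point back into the header. */
static uint8_t *toy_get_export_checked(uint8_t *base, size_t mapped_size,
                                       const char *name, size_t access_len)
{
    toy_image_header hdr;

    assert(base != NULL && name != NULL);
    if (mapped_size < sizeof hdr)
        return NULL;
    memcpy(&hdr, base, sizeof hdr);      /* single snapshot; all checks use the copy */
    if (hdr.export_count > TOY_MAX_EXPORTS)
        return NULL;

    for (uint32_t i = 0; i < hdr.export_count; i++) {
        uint32_t rva;

        if (strncmp(hdr.exports[i].name, name, sizeof hdr.exports[i].name) != 0)
            continue;
        rva = hdr.exports[i].rva;
        if (rva < sizeof hdr || rva > mapped_size || access_len > mapped_size - rva)
            return NULL;                 /* offset escapes the trusted range */
        return base + rva;
    }
    return NULL;
}
```

Taking a single snapshot of the header matters as much as the range check: validating one read and then using a second read of the same user-writable field would reopen the window.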

Disclosure

We’d like to thank Stanislav and Maksim from CryptoPro for their quick response and cooperation in fixing this bug.

  • 30.01.2020: Bug reported to CryptoPro.
  • 30.01.2020: CryptoPro acknowledged receipt of the bug report.
  • 06.02.2020: CryptoPro issued a fix; the first fix was validated.
  • 11.02.2020: CryptoPro issued a second fix; the second fix was validated.
  • These vulnerabilities were assigned the following CVEs: CVE-2020-9361/CVE-2020-9331.