Using WDF in an NDIS driver

July 20, 2014, 6:25 am

Can, Should, and How?

WDF is a framework that makes it easier to write Windows drivers. NDIS is a framework for writing low-level Windows network drivers. The purposes of these frameworks overlap a bit, and some people (okay, probably many people) are confused about the relationship between NDIS and WDF. Today we’ll set down a few guidelines. But first – let’s dispel one tenacious myth.

Myth: Some people think that NDIS drivers cannot use WDF.

In reality, you can use WDF in your NDIS driver. I know this works rather well, because I have personally written several WDF-based NDIS drivers.

So where do people get the idea that WDF is incompatible with NDIS? There are a few sources of this idea:

When writing an NDIS miniport driver, certain parts of WDF are not compatible with NDIS. You must put WDF into a mode sometimes referred to as “miniport mode”. Not all WDF APIs are available in miniport mode. See the step-by-step checklist here. Note that this restriction only applies to NDIS miniport (and IM) drivers; protocols and LWFs can use the full breadth of WDF functionality.
Miniport drivers must also put NDIS into a special mode, called NDIS-WDM mode. This is a poor name, because it seems to indicate that you must use WDM. The reality is that NDIS-WDM mode just means your driver can use any non-NDIS framework. (At the time that NDIS-WDM mode was invented, there were no other frameworks besides WDM, so the name didn’t seem to be too constraining. If it helps, you can think of it as NDIS-WD* mode.)
Most of the NDIS drivers that are included with Windows (like TCPIP) don’t use WDF. But this isn’t because Windows developers are avoiding WDF; it’s because most inbox drivers simply predate WDF. If we were writing the network stack from scratch, we’d use more WDF. New drivers like MSLLDP, an NDIS protocol driver included with Windows 8, are indeed based on WDF.

Now that we know you can combine WDF with NDIS, let’s talk about whether you should combine WDF with NDIS. In nearly all cases, an NDIS driver will work with or without WDF. So you rarely have the decision forced upon you by the technology. Ultimately, it will come down to what you decide, based (hopefully) on a good engineering judgment call. Let’s collect some evidence to help you make that decision.

Reasons you should use WDF in your NDIS driver

Your engineering team is already familiar with WDF.
You will be developing several drivers, including non-networking drivers. (Might as well learn WDF now, and maybe you can share some library code between your drivers.)
Your driver already uses WDF.
You are writing an NDIS miniport that uses IRPs on its lower edge (USB, SDIO, etc.)
You are writing a protocol or LWF that interacts with non-NDIS parts of the OS (usermode IOCTLs, WSK requests, etc.)
Your code would benefit from WDF’s clever object management system to avoid memory leaks.
You are new to Windows driver development, and have no idea where to start 😰

Generally speaking, it’s a good idea to consider WDF. But there are a few reasons why WDF might not be very useful to your NDIS driver:

Reasons that WDF won’t help in your NDIS driver

Your engineering team is already very familiar with NDIS, but has no experience with WDF.
You are maintaining a mature driver that does not use WDF.
You are writing a simple NDIS miniport on a directly-connected bus (like PCI).
You are writing a protocol or LWF that has minimal interaction with the rest of the OS. This driver mostly only calls NDIS APIs.
Your codebase must be compatible with platforms where WDF is not available (like Windows CE).

Mind you, it’s still quite possible to link against WDF in these situations. But you’ll probably find that there aren’t a lot of opportunities to actually use WDF APIs. Integrating with WDF doesn’t give a lot of value if you don’t call its APIs. In those cases, the pragmatic engineering decision may be to just not use WDF.

Okay, so let’s suppose you’ve decided to give WDF a spin. You’ll eventually notice that WDF overlaps somewhat with NDIS. For example, both frameworks have APIs for workitems (NdisQueueIoWorkItem versus WdfWorkItemEnqueue). Which API should you use? In many cases, either framework’s APIs will work, so this too is an engineering decision that ought to weigh several factors, such as consistency with your other code. But if you are new to NDIS and WDF, you can use this quick-reference table as a starting place for your decision-making process.

API family	Use NDIS APIs?	Use WDF APIs?	Use WDM APIs?
Work items	Avoid	Preferred	Do not use
Timers	Avoid	Preferred	Do not use
Memory allocation	Avoid	Preferred	Okay
Locks & interlocks	Avoid (but RW locks are okay)	Preferred	Preferred
Events	Avoid	Preferred	Preferred
String handling	Avoid	Preferred	Preferred
DMA	Preferred	Preferred	Avoid
Interrupts	Preferred	Not permitted	Not permitted
DPCs (for miniports)	Preferred for interrupts	Okay for non-interrupts	Avoid
DPCs (for non-miniports)	Avoid	Preferred	Avoid
Processor information	Avoid (except RSS APIs)	(no equivalent)	Preferred
IRPs and IOCTLs (for miniports)	Required	Not permitted	Not permitted
IRPs and IOCTLs (for non-miniports)	Avoid	Preferred	Avoid
Direct bus/port access	Okay	Preferred	Preferred
Reading configuration	Preferred for standard keywords	Preferred for other registry values	Okay for other registry values
File I/O	Avoid	(no equivalent)	Preferred

Remember, the above table only contains guidelines. It is still acceptable to ship a driver that uses an API marked “Avoid”. You should use the table to help nudge your decision-making when you have no other compelling reasons to use a particular API family.

↧

Using C++ in an NDIS driver

July 27, 2014, 7:19 am

Are NDIS drivers allowed to use C++?

The first question is easy: can NDIS drivers be written in C++? The answer: yes. In this case, NDIS doesn’t have any official stance on C++, so we just fall back on the WDK’s general rules. As of Windows Driver Kit 8, Microsoft officially supports using a subset of the C++ language in drivers. (“Subset? What subset?” There’s more precise information here.)

The inevitable follow-up question is more nuanced: should NDIS drivers be written in C++? The answer is: it depends. Here are some facts that will help you derive a more specific answer:

The NDIS API is a C API. There is no NDIS API that magically gets better or worse when you’re coming from C++ versus C.
The NDIS team has no future plans to make a feature that requires C++. We are well aware that many of our developers are dedicated fans of C, and have strong opinions on C++. Don’t worry: C isn’t going anywhere.
The NDIS team may, in the future, add minor conveniences that only light up in C++. For example, the WDK macro ARRAYSIZE is defined differently for a C++ driver, which gives it better abilities to detect misuse with pointers. NDIS.H may start adding macros that offer minor improvements for C++ code, just like WDM.H already has today.
Several major IHVs build their production NDIS miniport drivers using C++.
Several major IHVs build their production NDIS miniport drivers using C.
Microsoft builds some drivers in C and some drivers in C++.
Our NDIS sample drivers are all in C. (This is largely for historical reasons, as these drivers were created before C++ was officially supported. If we were creating a new sample today, we’d consider writing it in C++.)

In summary, then, either language works fine, and it all comes down to a matter of your preference.

↧

Mapping from NDIS OIDs to WMI classes

March 20, 2015, 5:55 pm

In which we write a PowerShell script, install the WDK, attach a kernel debugger, reverse-engineer the OS, and prove Goldbach’s conjecture

We’ve previously talked about how to rummage through all the NDIS WMI classes, but there’s one topic we haven’t fully covered. Suppose you’re looking for the WMI class that maps to a specific OID — how do you find the right class?

There are a few ways you can do this. The first is just to take a guess based on the name. Suppose we want to find the WMI class that corresponds to OID_GEN_VLAN_ID. Let’s search for any WMI class that has “VLAN” in the name:

Get-WmiObject -Namespace root\wmi -List | Where-Object {$_.name -Match "VLAN" }

My machine has only one matching class, MSNdis_VlanIdentifier, and indeed that’s the right one.

But this technique relies somewhat on luck. What if you don’t find any matches — should you keep searching, or does that mean there really is no WMI class for that OID? So that takes us to the more methodical approach.

If you install the Windows Driver Kit (WDK), then it will give you several rather helpful files:

wmicore.mof	Defines each of the built-in NDIS WMI classes
ndisguid.h	Defines the names of the GUIDs that underlie built-in NDIS WMI classes
ndis.h and ntddndis.h	Defines various OIDs, structures, and flags that will be useful in decoding some WMI classes

Let’s start in wmicore.mof. Open it in a text editor to find a comment indicating which WMI class implements OID_GEN_VLAN_ID. Again, we find MSNdis_VlanIdentifier.

///     OID_GEN_VLAN_ID:
[WMI, Dynamic, Provider("WMIProv"), guid("{765dc702-c5e8-4b67-843b-3f5a4ff2648b}"),
 Description("NDIS VLAN Identifier") : amended]
class  MSNdis_VlanIdentifier : MSNdis
{
    [ read, write, Description("The IEEE 802.1Q VLAN ID assigned to this NIC.") : amended,
        WmiDataId(1)]    uint32    NdisVlanId;
};

By searching through this file, you can find all the built-in WMI classes that NDIS provides.

Not so fast, you say. You want to be methodical, and relying on code comments is not exactly bulletproof. What about MSNdis_VendorID, which is in wmicore.mof, but is missing a comment mentioning which OID it is tied to?

Here’s where ndisguid.h comes in handy. Note that each WMI class in wmicore.mof has a GUID. For example, MSNdis_VendorID has GUID {5ec1035e-a61a-11d0-8dd4-00c04fc3358c}. You can find that same GUID in ndisguid.h (although the numbers are presented a little differently):

DEFINE_GUID(GUID_NDIS_GEN_VENDOR_ID,
    0x5ec1035e, 0xa61a, 0x11d0, 0x8d, 0xd4, 0x00, 0xc0, 0x4f, 0xc3, 0x35, 0x8c);

Unlike the WMI class names, the GUID names in ndisguid.h have the same naming scheme as OID names. So GUID_NDIS_GEN_VENDOR_ID corresponds to OID_GEN_VENDOR_ID. You can do a similar transformation for each GUID that is related to an OID.
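
If you'd rather not compare those two GUID spellings by eye, here is a minimal sketch that formats the DEFINE_GUID fields into the braced form that wmicore.mof uses. The ExampleGuid type and both function names are hypothetical stand-ins of mine (not WDK APIs), so that the snippet builds without any Windows headers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the Windows GUID structure, so this sketch
 * compiles without the WDK headers. The field order mirrors the argument
 * order of DEFINE_GUID. */
typedef struct {
    uint32_t data1;
    uint16_t data2;
    uint16_t data3;
    uint8_t  data4[8];
} ExampleGuid;

/* Writes the GUID in the lowercase, braced form used by wmicore.mof.
 * Returns 0 on success, or -1 if the buffer is too small (39 bytes needed). */
int example_guid_format(const ExampleGuid *g, char *out, size_t out_len)
{
    assert(g != NULL && out != NULL);
    int n = snprintf(out, out_len,
        "{%08x-%04x-%04x-%02x%02x-%02x%02x%02x%02x%02x%02x}",
        (unsigned)g->data1, (unsigned)g->data2, (unsigned)g->data3,
        (unsigned)g->data4[0], (unsigned)g->data4[1],
        (unsigned)g->data4[2], (unsigned)g->data4[3],
        (unsigned)g->data4[4], (unsigned)g->data4[5],
        (unsigned)g->data4[6], (unsigned)g->data4[7]);
    return (n > 0 && (size_t)n < out_len) ? 0 : -1;
}

/* Returns 1 if the GUID matches a braced, lowercase string copied from
 * wmicore.mof, and 0 otherwise. */
int example_guid_matches(const ExampleGuid *g, const char *mof_guid)
{
    char buf[39];
    assert(mof_guid != NULL);
    if (example_guid_format(g, buf, sizeof buf) != 0) {
        return 0;
    }
    return strcmp(buf, mof_guid) == 0;
}
```

Feeding it the eleven values from the GUID_NDIS_GEN_VENDOR_ID definition and the GUID string from MSNdis_VendorID should report a match. The comparison is case-sensitive, so lowercase the string first if your copy of the .mof file uses uppercase hex digits.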

Let’s summarize what we have gotten so far. NDIS provides WMI classes on every miniport. NDIS translates an OID into a GUID, and WMI translates that GUID into a WMI class. You can download the WDK to see OIDs in ntddndis.h, the GUIDs in ndisguid.h, and the WMI classes themselves in wmicore.mof.

Is that all? Well… not quite. These are valid techniques to explore the NDIS-provided WMI classes… but what about miniport- or LWF-provided classes? Some miniport drivers implement their own private WMI classes. Is there a way to peek at those?

Yup. But to do this, we’ll need to trot out a kernel debugger. Run !ndiskd.miniport -wmi <miniporthandle> to see all the WMI classes the miniport provides.

0: kd> !ndiskd.miniport ffffe000be3761a0 -wmi

WMI

    f4a8027a-23b7-11d1-9ed9-00a0c9010057   OID 0xffa0c90a
                       OID, ARRAY, CUSTOM
    GUID_NDIS_ENUMERATE_ADAPTER            [N/A]
                       READ, NOT_SETTABLE, NDIS_ONLY
    GUID_NDIS_NOTIFY_ADAPTER_REMOVAL       [N/A]
                       STATUS, EVENT_ENABLED, NOT_SETTABLE, NDIS_ONLY
    GUID_NDIS_GEN_LINK_SPEED               OID_GEN_LINK_SPEED
                       OID, READ, NOT_SETTABLE
    GUID_NDIS_GEN_VENDOR_ID                OID_GEN_VENDOR_ID
                       OID, READ, NOT_SETTABLE

In practice, a miniport will have hundreds of GUIDs; the excerpt above highlights just a few of the types of WMI classes you might find. A class marked with the OID flag is (unsurprisingly) translated to an OID. In the excerpt above, you can see that my miniport supports OID_GEN_VENDOR_ID, as well as a vendor-custom OID 0xffa0c90a. (The miniport also supports the WMI event GUID_NDIS_NOTIFY_ADAPTER_REMOVAL.)

Let’s suppose we want to find the corresponding WMI class for that vendor-private GUID/OID. It’s little surprise that PowerShell can do it in a hurry. Just drop in the GUID that you got from !ndiskd. (If !ndiskd hid the GUID behind its friendly name, as it did for GUID_NDIS_GEN_VENDOR_ID, you can unmask it by running !ndiskd.help GUID_NDIS_GEN_VENDOR_ID.)

Get-WmiObject -Namespace root\wmi -List |
    Where-Object { $_.Qualifiers['guid'].Value -eq '{f4a8027a-23b7-11d1-9ed9-00a0c9010057}' }

If you happen to have that vendor’s NIC driver installed, you’ll see their WMI class pop up. If not, well, try again with one of the system-provided GUIDs.

Now we’ve seen three ways to map WMI classes to OIDs:

Guess a likely class name and search for it with PowerShell
Search for the OID in ndisguid.h, then find the matching GUID in wmicore.mof
Use !ndiskd.miniport -wmi to find all the GUIDs that are available on a particular miniport

I’m out of space for today, so I’ll have to save my proof of Goldbach’s conjecture for next week.

↧

Eliminating empty handlers

February 12, 2017, 4:01 pm

Don’t come back empty-handlered

NDIS drivers have several opportunities to supply advanced functionality through optional handlers. But if you don’t want the advanced functionality, you don’t need to bother implementing an empty handler.

Why does it matter to you? It matters because it makes your code (slightly) more difficult to maintain. “Dummy” code is more lines of worthless boilerplate code that distracts you from your “real” code. There’s also a tiny performance cost, as NDIS has to set up the function call and jump into your code.

Let’s look at an example. Filter drivers are a good model, because the majority of their handlers are optional. From MSDN, we see that the FilterSendNetBufferLists and FilterSendNetBufferListsComplete handlers are optional.

Suppose your filter looks like this:

NTSTATUS
DriverEntry(
    IN PDRIVER_OBJECT DriverObject,
    IN PUNICODE_STRING RegistryPath
    )
{
    NDIS_FILTER_DRIVER_CHARACTERISTICS FChars;
    // ...
    FChars.SendNetBufferListsHandler = FilterSendNetBufferLists;
    FChars.SendNetBufferListsCompleteHandler = FilterSendNetBufferListsComplete;
    // ...

    Status = NdisFRegisterFilterDriver(
        DriverObject,
        (NDIS_HANDLE)FilterDriverObject,
        &FChars,
        &FilterDriverHandle);
    // ...
}

VOID
FilterSendNetBufferLists(
    IN NDIS_HANDLE FilterModuleContext,
    IN PNET_BUFFER_LIST NetBufferLists,
    IN NDIS_PORT_NUMBER PortNumber,
    IN ULONG SendFlags
)
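{
    // Recover this filter module's context. PMS_FILTER is assumed here to be
    // the context type (as in the WDK sample filter); use your driver's own.
    PMS_FILTER pFilter = (PMS_FILTER)FilterModuleContext;

    // Dummy function; just pass the NBLs through
    NdisFSendNetBufferLists(
        pFilter->FilterHandle,
        NetBufferLists,
        PortNumber,
        SendFlags);
}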

VOID
FilterSendNetBufferListsComplete(
    IN NDIS_HANDLE FilterModuleContext,
    IN PNET_BUFFER_LIST NetBufferLists,
    IN ULONG SendCompleteFlags
)
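{
    // Recover this filter module's context. PMS_FILTER is assumed here to be
    // the context type (as in the WDK sample filter); use your driver's own.
    PMS_FILTER pFilter = (PMS_FILTER)FilterModuleContext;

    // Dummy function; just pass the NBLs through
    NdisFSendNetBufferListsComplete(
        pFilter->FilterHandle,
        NetBufferLists,
        SendCompleteFlags);
}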

As you can see, the FilterSendNetBufferLists and FilterSendNetBufferListsComplete functions don’t do anything useful. How could we improve the code? Simple: just delete the dummy functions and register NULL for the function handlers.

NTSTATUS
DriverEntry(
    IN PDRIVER_OBJECT DriverObject,
    IN PUNICODE_STRING RegistryPath
)
{
    NDIS_FILTER_DRIVER_CHARACTERISTICS FChars;
    // ...
    FChars.SendNetBufferListsHandler = NULL;
    FChars.SendNetBufferListsCompleteHandler = NULL;
    // ...

    Status = NdisFRegisterFilterDriver(
        DriverObject,
        (NDIS_HANDLE)FilterDriverObject,
        &FChars,
        &FilterDriverHandle);
    // ...
}

Much better!
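
To see why a NULL handler is cheaper than a dummy one, here is a toy model of a filter stack in plain C. It is only a sketch under my own assumptions: the ExampleFilter type and the example_send function are hypothetical, and this is not how NDIS is implemented internally. It just illustrates the idea that a filter with no handler is skipped, so no call is made into it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical handler type: a filter that wants to see sends gets called
 * with its own running total and the size of the current batch. */
typedef void (*ExampleSendHandler)(unsigned *packets_seen, unsigned packet_count);

typedef struct {
    ExampleSendHandler send_handler; /* NULL: filter opted out of the send path */
    unsigned packets_seen;
} ExampleFilter;

/* A handler that does real work (here, it just counts packets). */
void example_counting_handler(unsigned *packets_seen, unsigned packet_count)
{
    *packets_seen += packet_count;
}

/* Passes one batch of packets down the toy filter stack.
 * Returns the number of handler calls that were actually made. */
unsigned example_send(ExampleFilter *stack, size_t filter_count, unsigned packet_count)
{
    assert(stack != NULL || filter_count == 0);
    unsigned calls = 0;
    for (size_t i = 0; i < filter_count; i++) {
        if (stack[i].send_handler == NULL) {
            continue; /* skipped entirely: no function call into this filter */
        }
        stack[i].send_handler(&stack[i].packets_seen, packet_count);
        calls++;
    }
    return calls;
}
```

In a three-filter stack where the middle filter registers NULL, a send makes two handler calls instead of three, and the middle filter never runs at all.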

↧

A new video on NDIS debugging

March 21, 2017, 8:28 am

The Defrag Tools show was kind enough to host me for a quick chat about debugging NDIS drivers. Check it out!

↧

It’s perfcounter week on the NDIS blog!

June 5, 2017, 11:04 am

Actually every week is perfcounter week.

Performance counters are an essential tool for devs, ops, … and marketing. Yet they’re often not well understood. Fortunately, under the hood, performance counters are very simple: a perfcounter is just a number that counts things.

Before we get too far along, let’s agree on a bit of terminology:

A counter is a single measurement, like the number of interrupts per second.
A counter set is a group of related counters, like a group of 9 counters relating to TCP/IP. The CLR calls this a category name, and some Win32 APIs call this a performance object. They all refer to the same thing. I’ll stick with the counter set term on this blog.
A counter instance is a specific thing that can be measured, like a network adapter.

Conventionally, you can specify a counter using the counter path notation. The formal grammar is documented here, but most commonly in Networking, you’ll use the subset that looks like this:

Counter Set(Counter Instance)\Counter

So for example, Network Interface(BitBlaster 2000)\Bytes Sent/sec belongs to the Network Interface counter set, targets the BitBlaster 2000 instance, and measures the Bytes Sent/sec counter.
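
If you end up handling counter paths in your own tooling, splitting that subset into its three parts is straightforward. The following is a minimal sketch, not the full formal grammar: it handles only the Counter Set(Counter Instance)\Counter form (with an optional single leading backslash), and it does not handle a \\machine prefix or paths without an instance. The ExampleCounterPath type and the function names are hypothetical names of mine.

```c
#include <assert.h>
#include <string.h>

#define EXAMPLE_PART_MAX 128

/* Hypothetical container for the three parts of a counter path. */
typedef struct {
    char counter_set[EXAMPLE_PART_MAX];
    char instance[EXAMPLE_PART_MAX];
    char counter[EXAMPLE_PART_MAX];
} ExampleCounterPath;

/* Copies len bytes into dst and NUL-terminates. Rejects empty or oversized parts. */
static int example_copy_part(char *dst, const char *src, size_t len)
{
    if (len == 0 || len >= EXAMPLE_PART_MAX) {
        return -1;
    }
    memcpy(dst, src, len);
    dst[len] = '\0';
    return 0;
}

/* Splits "Counter Set(Counter Instance)\Counter" into its parts.
 * Returns 0 on success, -1 if the path does not fit this simple shape. */
int example_parse_counter_path(const char *path, ExampleCounterPath *out)
{
    assert(path != NULL && out != NULL);

    if (*path == '\\') {
        path++; /* tolerate one leading backslash, as in '\Processor(0)\...' */
    }

    const char *open = strchr(path, '(');
    if (open == NULL) {
        return -1;
    }
    /* The instance ends at the first ")\" after the opening parenthesis. */
    const char *close = strstr(open, ")\\");
    if (close == NULL) {
        return -1;
    }
    const char *counter = close + 2;

    if (example_copy_part(out->counter_set, path, (size_t)(open - path)) != 0) {
        return -1;
    }
    if (example_copy_part(out->instance, open + 1, (size_t)(close - open - 1)) != 0) {
        return -1;
    }
    return example_copy_part(out->counter, counter, strlen(counter));
}
```

Given the BitBlaster path above, this fills in Network Interface, BitBlaster 2000, and Bytes Sent/sec respectively.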

Counters in a GUI

Enough terminology; let’s do things. If you’ve used perfcounters before, you’ve probably come across Performance Monitor, aka perfmon. You can use this built-in tool to rummage around and visualize the various perfcounters. It’s great if you don’t quite know which counter you’re looking for, or if you need a quick overview of what some counter is doing. Here’s how I use it:

Launch perfmon.exe.
In the tree on the left, select Performance Monitor to get to the graph.
In the graph’s toolbar, click the Delete button (it looks like a red X) to remove any pre-added counters. We’ll add our own.
Then click the Add button (it looks like a green +) to add more interesting counters.
In the dialog that appears, the upper-left quadrant shows all counter sets. If you expand one of the dropdowns inside a counter set, you can see an individual counter. Once you select an individual counter, you can see all applicable counter instances in the lower-left quadrant.
Select a counter instance, then click the Add button to add the counter to the list of counters to monitor.
Once you’ve added all the counters you like, click OK.

That will look like this:

To make the graph look right, you may have to fiddle with the scale a bit. Right-click on the counter at the bottom of the main window, select Properties, and change the Scale dropdown on its properties page.

Counters in PowerShell

Perfmon is great for high-level explorations, but you’ll eventually run up against its limitations. For example, it won’t send you an email if a counter goes out of tolerance. For the most flexibility, we need to go beyond a GUI.

Performance counters have been around for a while, so there are plenty of APIs, libraries, and tools you can choose from. I will use PowerShell for illustration, but you can use whatever API you prefer. Generally the concepts exposed in PowerShell will map onto any API.

Let’s poll a counter from PowerShell:

PS C:\> Get-Counter '\Processor(0)\% DPC Time'

Timestamp                 CounterSamples
---------                 --------------
6/4/2017 10:14:10         \\jtippet-d\processor(0)\% dpc time :
                          1.56111154738975

The counter values come back as real numbers, so you can perform all the usual arithmetic on them. For example, you can check if the counter value exceeds 10%:

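Get-Counter '\Processor(0)\% DPC Time' -Continuous | % {
    $percentDpcTime = $_.CounterSamples[0].CookedValue
    if ($percentDpcTime -gt 10) {
        # The -From address and -SmtpServer name are placeholders; use your own
        Send-MailMessage -To 'me@example.com' -From 'alerts@example.com' -SmtpServer 'smtp.example.com' -Subject 'Alert' -Body "% time at DPC: $percentDpcTime"
    }
}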

(Don’t actually build a production monitoring system like this – there’s off-the-shelf software that has way more features.)

Now that we have some of the basics down, next time we’ll take a look at the goodies you can use to measure the heartbeat of the Windows network stack.

↧

My favorite perfcounters

June 6, 2017, 11:27 am

… are the NDIS counters. Like you even had to ask.

We love performance counters on the Windows Networking team. We routinely use performance counters to monitor & diagnose issues. Nearly every networking feature has its own counter set, and Windows ships with far too many counter sets to document here. Fortunately, I don’t have to document them all – this is the NDIS blog, so I’ll lean on my NDIS bias.

These are the Windows Networking counter sets of particular interest to NDIS developers:

Network Adapter and Network Interface
Per Processor Network Interface Card Activity and Per Processor Network Activity Cycles
Physical Network Interface Card Activity
RDMA Activity
Processor Information

Let’s chat a bit about each of these.

The Network Adapter and Network Interface counter sets.

These are very similar. In fact, the only difference is that Network Adapter shows more types of NICs than Network Interface does. Since Network Adapter has more stuff, I generally prefer it over Network Interface. The only reason to use Network Interface is if you need to write code that works on very old OS versions – Network Adapter was “only” added back in 2006.

The data in these counter sets is pulled together from a few sources:

Many counters are populated from GetIfTable2, which itself ultimately boils down to querying OID_GEN_STATISTICS from the NIC driver.
RSC counters: These are calculated internally by TCPIP.
Current Bandwidth: This is the arithmetic mean of the NIC driver’s NDIS_LINK_STATE XmitLinkSpeed and RcvLinkSpeed.
Output Queue Length: This is not currently implemented, and should be ignored.

The Per Processor Network Interface Card Activity and Per Processor Network Activity Cycles counter sets.

What a mouthful. These two counter sets are more advanced than Network Adapter, and you can use these to dig deeper into a performance issue. Your frontline troops should be your Network Adapter counters. But if that’s not enough, it’s time to call in the Per Processor NDIS counters.

As their names suggest, these counters track data per processor. That’s super helpful when diagnosing RSS or interrupt delivery problems. The Activity counter tracks how many times a particular event occurs, like how many packets are received. The Cycles counter tracks how many CPU cycles are spent on a particular task, like how many CPU cycles are spent in a call to the NIC driver’s interrupt handler. These two counter sets work great in tandem – you can typically divide the Cycles data by the Activity data to get cycles-per-operation.
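
The division is trivial, but there is one edge case worth handling: an idle processor reports zero activity. Here is a minimal sketch, assuming both values were sampled over the same interval and for the same counter instance. The function name is hypothetical.

```c
#include <assert.h>

/* Divides a Cycles counter sample by the matching Activity counter sample.
 * Returns cycles per operation, or 0.0 when no operations were counted
 * (for example, on a processor that received no interrupts). */
double example_cycles_per_operation(double cycles_per_sec, double operations_per_sec)
{
    assert(cycles_per_sec >= 0.0 && operations_per_sec >= 0.0);
    if (operations_per_sec == 0.0) {
        return 0.0;
    }
    return cycles_per_sec / operations_per_sec;
}
```

For instance, 3,000,000 Interrupt Cycles/sec against 1,500 Interrupts/sec works out to 2,000 cycles per interrupt.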

One caveat about interrupts and DPCs: Not all hardware has interrupts (in the traditional sense, at least). Typically only a PCI bus based NIC will use interrupts or DPCs. For all other types of NICs, these counters will report zero. Furthermore, NDIS drivers can choose to either use NDIS APIs (like NdisMRegisterInterruptEx) or to use WDM APIs to handle interrupts and DPCs. If the driver doesn’t use NDIS APIs, then NDIS won’t be able to track the interrupts and DPCs, and these counters will be zero. So these counters may not report values for some vendors’ drivers.

One caveat about the cycle counters: The cycle counters are measured in terms of the nominal clock tick of your processor. On x86 and x64, this is the units of the rdtsc instruction. On arm32, this is the units of the pmccntr register. Cycle counters are not directly comparable between machines with different processors. If your processor throttles itself down to a lower effective clock in order to save power (e.g., Intel’s P-states), then the cycle counter might not be meaningful. You should only make comparisons between cycle counters under carefully-controlled lab conditions.

One final caveat: These counters were added in Windows 7, but their implementation was changed a bit in Windows 8. Although you’ll find counters of the same name on Windows 7, the definitions below apply only to Windows 8 or Windows Server 2012 and later. I do not suggest you use these counters on Windows 7, unless you’ve first carefully verified that the counter seems to give reliable results.

Because these counters are so low-level, it’s worth spending a bit of time explaining what each of them means. Here’s the rundown.

Interrupts/sec: The number of times the MiniportMessageInterrupt or MiniportInterrupt handler was invoked and returned TRUE to indicate that the interrupt was recognized.

Interrupt Cycles/sec: The number of CPU cycles spent in MiniportMessageInterrupt or MiniportInterrupt.

Interrupt DPC Latency Cycles/sec: The number of CPU cycles between the end of an ISR and the start of a DPC. If this value is unexpectedly high, it may indicate that other DPCs or ISRs are starving the NIC’s DPC. Use WPA to see a timeline of all ISRs and DPCs on the CPU.

DPCs Queued/sec: The number of times that a DPC was executed. (The name is a tiny bit misleading; it’s not the number of times a DPC was queued.)

DPCs Queued on Other CPUs/sec: The number of processors that are targeted by NdisMQueueDpcEx, excluding the current processor. For example, suppose a NIC driver is running on processor 5 and calls NdisMQueueDpcEx with a mask that targets processors 4, 5, and 7. Then this counter will increment by 2, for processors 4 and 7. This will often be zero. If it is a very high number, it may indicate that RSS is running in a less-than-optimal mode. Ideally, a NIC that supports RSS will interrupt the correct processor directly, without needing to schedule a cross-processor DPC.
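
The arithmetic in that example can be sketched in a few lines. This is a simplification of mine: it models the target processors as a single 64-bit mask and ignores processor groups, and the function name is hypothetical. It illustrates how the increment is described above, not how NDIS computes it internally.

```c
#include <assert.h>
#include <stdint.h>

/* Counts the processors in target_mask other than current_cpu.
 * Bit N of target_mask set means "processor N is targeted". */
unsigned example_other_cpus_targeted(uint64_t target_mask, unsigned current_cpu)
{
    assert(current_cpu < 64);

    /* Drop the current processor from the mask, then count what is left. */
    uint64_t others = target_mask & ~(UINT64_C(1) << current_cpu);
    unsigned count = 0;
    while (others != 0) {
        others &= others - 1; /* clear the lowest set bit */
        count++;
    }
    return count;
}
```

With a mask for processors 4, 5, and 7 and the driver running on processor 5, this returns 2, matching the example above.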

DPCs Deferred/sec: The number of times that NDIS decided to execute a DPC at passive level, due to Receive Side Throttling (RST). Generally you should expect this to be at zero for Server-class machines.

Interrupt DPC Cycles/sec: The number of CPU cycles spent in MiniportMessageInterruptDpc or MiniportInterruptDpc. Typically this includes both send-complete and receive-indicate activity.

Received Packets/sec: Watch out: this one is subtle! It’s the total number of packets that NDIS indicated to each protocol driver. For example, if the NIC indicates up an IPv4 frame, but there is no IPv4 protocol driver bound to the NIC, then that frame is not counted; the counter will be zero. More interestingly, if there are two protocols bound to the NIC that both want IPv4 frames (e.g., the TCPIP driver and Wireshark’s NPF driver), then each packet gets counted twice. The gotcha is that if you launch Wireshark, this counter will double. Not because you’re receiving more traffic at the physical layer, but because NDIS is indicating the same traffic to twice as many protocol drivers.

Receive Indications/sec: The total number of times that NDIS indicated packets to a protocol driver. You can divide Received Packets/sec by Receive Indications/sec to get the average batch size. Low-performance client NICs might not batch packets at all, in which case the average batch size will be 1.0. Increasing the Interrupt Moderation setting should increase the batch size.

Low Resource Received Packets/sec: This is the total number of packets indicated to a protocol driver while the NDIS_RECEIVE_FLAGS_RESOURCES flag is set. Some NIC drivers never use this flag, and a few NIC drivers will use this flag always. In some cases, NDIS always sets the flag internally. Despite the name, the flag doesn’t necessarily mean that the NIC is low on resources, although some drivers use it to signal that. In general, you’ll need to know exactly how the NIC driver and NDIS use the flag to interpret this counter. Note that the OS may spend more cycles processing a received packet if the NDIS_RECEIVE_FLAGS_RESOURCES flag is set, so you should generally prefer that this counter is at or near zero. If your servers typically have this counter at zero, but suddenly it spikes to a high value, that is a red flag that some network driver is spending too long processing received packets, or has leaked received packets.

Low Resource Receive Indications/sec: The number of times that NDIS indicates packets to a protocol driver with the NDIS_RECEIVE_FLAGS_RESOURCES flag.

Stack Receive Indication Cycles/sec: The number of CPU cycles spent in any ProtocolIndicateReceiveNetBufferLists handler.

NDIS Receive Indication Cycles/sec: The number of CPU cycles spent in NdisMIndicateReceiveNetBufferLists. Typically this would include each filter driver, as well as any protocol drivers. However, if a filter driver defers a receive indication, the time spent in protocol drivers will not be included.

Returned Packets/sec: The number of receive packets that were processed by the OS, and returned to the NIC driver stack. Specifically, the number of packets that were passed to a call to NdisReturnNetBufferLists. This should almost exactly correlate with Received Packets/sec minus Low Resource Received Packets/sec.

Return Packet Calls/sec: The number of times a protocol driver called NdisReturnNetBufferLists.

NDIS Return Packet Cycles/sec: The number of CPU cycles spent in NdisReturnNetBufferLists. Typically this includes all time spent in filter drivers, the NIC driver, and NDIS itself. However, if a filter driver defers the return operation, or the network adapter is in a low power state due to NDIS Selective Suspend, then this counter will not include all CPU cycles.

Miniport Return Packet Cycles/sec: Despite the name, this counter is implemented to be largely similar to NDIS Return Packet Cycles/sec. This counter includes filter drivers, the NIC driver, and NDIS itself; and the same caveats apply.

Sent Packets/sec: The number of packets that were sent via NdisSendNetBufferLists. Note that, in rare cases, an NDIS lightweight filter driver may itself drop or originate new send packets. Those will not be accounted for in this counter; this counter measures the top of the filter stack.

Send Request Calls/sec: The total number of calls to NdisSendNetBufferLists.

Miniport Send Cycles/sec: The total number of CPU cycles spent in MiniportSendNetBufferLists. This includes much of the time sending packets, but will not account for CPU cycles spent in any workitem, timer, interrupt, or DMA callback. Therefore, (like any cycle counter) it is not comparable across different NIC drivers.

NDIS Send Cycles/sec: The total number of CPU cycles spent in NdisSendNetBufferLists. Typically, this includes all time spent in filters, the miniport driver, and NDIS itself. However, if any filter defers a packet and reinjects it later, then those CPU cycles won’t be included here. For example, the built-in QoS Pacer filter defers some NBLs and reinjects them from a timer.

Sent Complete Packets/sec: The total number of packets that were returned to a protocol driver, via ProtocolSendNetBufferListsComplete.

Send Complete Calls/sec: The total number of calls to any ProtocolSendNetBufferListsComplete handler.

Stack Send Complete Cycles/sec: The total number of CPU cycles spent in ProtocolSendNetBufferListsComplete handlers.

NDIS Send Complete Cycles/sec: The total number of CPU cycles spent in NdisMSendNetBufferListsComplete. This may include time spent in filter drivers and protocols, if the filter drivers handle send completion synchronously. In that case, this counter’s value will be slightly larger than the Stack Send Complete Cycles/sec counter’s value, and their difference tells you how many cycles were spent in the filter stack and NDIS itself. However, if a filter driver defers send completion, this counter will not include time spent in protocols.
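
That subtraction can be sketched as follows. It is only meaningful under the assumption stated above, namely that every filter completes sends synchronously, and it clamps at zero because a deferred completion can make the NDIS counter come out smaller than the Stack counter. The function name is hypothetical.

```c
#include <assert.h>

/* Estimates the cycles spent in the filter stack and NDIS itself during
 * send completion, given samples of NDIS Send Complete Cycles/sec and
 * Stack Send Complete Cycles/sec taken over the same interval.
 * Returns 0.0 if the NDIS value is not larger than the Stack value,
 * which suggests a filter deferred completion and the estimate is invalid. */
double example_filter_and_ndis_cycles(double ndis_cycles_per_sec, double stack_cycles_per_sec)
{
    assert(ndis_cycles_per_sec >= 0.0 && stack_cycles_per_sec >= 0.0);
    double difference = ndis_cycles_per_sec - stack_cycles_per_sec;
    return difference > 0.0 ? difference : 0.0;
}
```

For example, 5,000 NDIS cycles against 3,500 Stack cycles leaves roughly 1,500 cycles per second attributable to filters and NDIS.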

Build Scatter Gather List Calls/sec: The number of times the NIC driver calls NdisMAllocateNetBufferSGList. Typically this would be comparable to the number of sent packets per second.

Build Scatter Gather Cycles/sec: The number of cycles spent inside NdisMAllocateNetBufferSGList. Note that this may also include the time spent in MiniportProcessSGList, if the map registers are immediately available and the MiniportProcessSGList handler is called inline. However, if MiniportProcessSGList is deferred and called later, this counter does not then include cycles spent waiting or the cycles spent in MiniportProcessSGList. This counter is difficult to interpret, and I generally do not suggest you use it.

RSS Indirection Table Change Calls/sec: The number of times that a protocol driver updates the RSS indirection table on a NIC. Specifically, the number of times that OID_GEN_RECEIVE_SCALE_PARAMETERS is sent and the NDIS_RSS_PARAM_FLAG_ITABLE_UNCHANGED flag is clear. This should typically be a low number – RSS table updates are pure overhead, and ideally they don’t happen often. If you have many short-lived TCP connections and high CPU usage, the OS may need to update the table more often.

Miniport RSS Indirection Table Change Cycles: The total number of cycles spent in a MiniportOidRequest handler, for an OID_GEN_RECEIVE_SCALE_PARAMETERS request that updates the RSS indirection table. Note that this counter can be unreliable if the CPU cycle counter is not synchronized across all CPUs on the system, since indirection table changes are run at PASSIVE_LEVEL and are not affinitized to a particular processor.

Packets Coalesced/sec: The result of OID_PACKET_COALESCING_FILTER_MATCH_COUNT.

Phew! We made it… now on to the next counter set!

The Physical Network Interface Card Activity counter set.

At present, this counter set is mostly useful for NIC drivers that implement NDIS Selective Suspend. In the future, we may add other counters to it, but right now it’s all power management.

Low Power Transitions (Lifetime): The number of times that a network adapter has entered a low power state (D2 or D3). This can be a result of NDIS Selective Suspend, connected standby, system S3 or S4, or D3-on-disconnect. This counter is reset each time the device is halted (i.e., receives IRP_MN_REMOVE_DEVICE or the system powers off).

% Time Suspended (Lifetime): The percent of time since the NIC was started that the NIC is in any low power state. For devices that implement NDIS Selective Suspend, this ought to be a highish percentage when the device isn’t under heavy use.

% Time Suspended (Instantaneous): Similar to the above, except the time interval is equal to your sample interval. E.g., if you use the -SampleInterval parameter to the Get-Counter PowerShell cmdlet to set the sample interval to 5 seconds, then this counter returns what percent of the last 5 seconds that the device was in low power.
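
To make the difference between the Lifetime and Instantaneous flavors concrete, here is a small model in Python. This is not how NDIS computes the counters; it is only an illustration that assumes you already have a list of non-overlapping (start, end) times, in seconds, during which the device was in low power. The function name and the timeline are made up.

```python
def percent_suspended(low_power_periods, window_start, window_end):
    """Percent of the window [window_start, window_end) spent in low power.

    low_power_periods is a list of non-overlapping (start, end) pairs,
    in seconds, during which the device was in D2 or D3.
    """
    window = window_end - window_start
    if window <= 0:
        raise ValueError("window must have a positive length")
    suspended = 0.0
    for start, end in low_power_periods:
        # Count only the part of each period that falls inside the window.
        overlap = min(end, window_end) - max(start, window_start)
        if overlap > 0:
            suspended += overlap
    return 100.0 * suspended / window


# Hypothetical timeline: the NIC started at t=0 and it is now t=10.
periods = [(0.0, 2.0), (4.0, 4.5), (8.0, 10.0)]
lifetime = percent_suspended(periods, 0.0, 10.0)       # since the NIC started
instantaneous = percent_suspended(periods, 5.0, 10.0)  # last 5-second sample
print(lifetime, instantaneous)
```

The Lifetime value uses the whole time since the NIC started as its window, while the Instantaneous value only looks at the most recent sample interval, which is why the two can disagree.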

Device Power State: Returns the current D-state of the device. If the device is in D0, this counter returns 0. If the device is in D2, this counter returns 2. If the device is in D3, this counter returns… I bet you can guess. This one’s useful for keeping an eye on NDIS Selective Suspend. Toss it into a perfmon graph, and you can see exactly what the D-state is at each moment.
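
If you're post-processing samples of this counter in a script, a tiny decoder like the one below can make logs easier to read. It's a sketch with a hypothetical function name. The text above gives the values for D0, D2, and D3; treating 1 as D1 is my assumption, and anything else is reported as unknown.

```python
def describe_power_state(counter_value):
    """Turn a 'Device Power State' sample into (label, is_low_power).

    Low power here means D2 or D3, matching the definition used by the
    Low Power Transitions counter. The mapping of 1 to D1 is an assumption.
    """
    state = int(counter_value)
    if state not in (0, 1, 2, 3):
        return ("unknown", False)
    return (f"D{state}", state in (2, 3))


# Hypothetical series of samples, e.g. one per second.
for sample in [0, 0, 2, 3, 0]:
    label, low_power = describe_power_state(sample)
    print(label, "low power" if low_power else "not low power")
```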

The RDMA Activity counter set.

The individual counters in this one all come from OID_NDK_STATISTICS, which is supplied by the NIC driver.

The Processor Information counter set.

This is not a networking-specific counter set, but it’s invaluable when diagnosing network performance problems. In particular, the % Interrupt Time, % DPC Time, and Interrupts/sec counters are quite useful. Note that Interrupts/sec is a system-wide counter that includes IPIs, so it can be difficult to infer what exactly causes an unexpectedly-large number of interrupts. Fortunately WPA lets you drill into the finest minutiae of any interrupt storm.

Honorable mentions

We’ve covered the counters that are tied to NDIS, but that’s hardly the end of the story. The counter sets listed below are likely of interest to anyone who’s looking at the Windows network stack. Don’t be afraid to rummage through all the counters that come with the OS – there’s some good stuff in there! For example, if you’re investigating a problem with TCP checksum offload, you’ll want to keep an eye on the TCP checksum errors counter.

IPv4 and IPv6
TCPv4 and TCPv6
UDPv4 and UDPv6
WFPv4 and WFPv6
TCPIP Performance Diagnostics and TCPIP Performance Diagnostics (Per-CPU)
Hyper-V Virtual Network Adapter
Hyper-V Virtual Network Adapter Drop Reasons
Hyper-V Virtual Network Adapter VRSS
Hyper-V Virtual Switch
Hyper-V Virtual Switch Port
Hyper-V Virtual Switch Processor
Network Virtualization

Enough! No more lists of counters! Next time we’ll have fewer words and more pictures as we visualize the flow of statistics and counters in Windows.


Goodbye

January 7, 2019, 11:59 am

≪ Previous: My favorite perfcounters

Microsoft is retiring this blogging platform, so the NDIS blog will be removed soon. If you have any articles that you need to reference, please save them locally.
