Channel: The NDIS blog

!ndiskd.pendingnbls

I’ve got your NBLs right here

The most common issue we see in NDIS drivers is a “lost packet”.  You have lost a packet when NDIS gives your driver a NET_BUFFER_LIST (NBL) and your driver never returns the packet back to NDIS.  A lost packet will often show up as a hang during Pause or a 0x9F bugcheck.

These issues are also very difficult to debug.  NDIS says “hey, I sent you 1,044,949,195,033 NBLs, but you only returned 1,044,949,195,032.”  Now what?

Starting with Windows 7 SP1, NDIS can track every packet that goes through each NDIS driver.  The packet tracking works like this: just before NDIS gives an NBL to a driver, the NBL is stamped with the driver’s handle.  This means that if you can just search through all the NBLs on the system, you can identify the lost NBL by finding the one NBL that is still stamped with your driver’s handle.

Fortunately, you don’t have to do this manually.  The !ndiskd.pendingnbls debugger extension is clever enough to do the search for you.  !ndiskd.pendingnbls will identify every NBL that is not “at home”, i.e., is not currently held by the same driver that allocated the NBL.

Let’s look at a short example:

kd> !ndiskd.pendingnbls
PHASE 1/3: Found 19 NBL pool(s).
PHASE 2/3: Found 0 freed NBL(s).

    Pending Nbl        Currently held by                                       
    ffffcf800287cd20   ffffcf8002750c70 - NDIS Sample LightWeight Filter-0000  [Filter]                   

PHASE 3/3: Found 1 pending NBL(s) of 1885 total NBL(s).                     
Search complete.

What is this showing?  The debugger extension counted 1,885 total NBLs on the system.  Of those, most are currently held by whichever driver allocated them, so they're not considered “pending”.  There’s only one NBL, 0xffffcf800287cd20, that is still missing.  NDIS last gave that NBL to a Filter driver named “NDIS Sample LightWeight Filter”.  That filter driver rises to the top of the list of suspects.
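To make the stamping idea concrete, here is a toy model in Python.  It is only a sketch of the concept, not how NDIS implements tracking: the Nbl class, the hand_to and pending_nbls helpers, and the driver names are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Nbl:
    address: int
    allocator: str                  # the driver that allocated the NBL (its "home")
    held_by: Optional[str] = None   # the stamp: who NDIS last gave the NBL to

    def __post_init__(self):
        if self.held_by is None:
            self.held_by = self.allocator   # a fresh NBL starts out at home


def hand_to(nbl, driver):
    """Stamp the NBL with the receiving driver just before handing it over."""
    nbl.held_by = driver


def pending_nbls(all_nbls):
    """Return every NBL that is not currently held by its allocator."""
    return [n for n in all_nbls if n.held_by != n.allocator]


# Hypothetical pool of four NBLs, all allocated by one protocol.
pool = [Nbl(0x1000 + 0x100 * i, allocator="TCPIP") for i in range(4)]
hand_to(pool[2], "Sample LWF")   # sent to a filter, which never returns it

for n in pending_nbls(pool):
    print(f"{n.address:#x} currently held by {n.held_by}")
```

Only the NBL at 0x1200 is reported, because it is the only one whose stamp differs from its allocator.  Returning it (hand_to(pool[2], "TCPIP")) would empty the pending list again.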

Not all pending NBLs are bad.  Every time a packet is sent or received, an NBL goes pending.  If you want to see a pending NBL in action, just set a breakpoint on your datapath handler and run !ndiskd.pendingnbls — you should see the NBL that was just passed to your driver.

NBLs that are pending for “a long time” are bad — they’re likely leaks, and can cause bugchecks or app hangs.  If you’re debugging a 0x9F bugcheck or hang during Pause, the datapath has been stopped for some time, so any NBLs that are still pending are likely leaks.

One last note.  There’s a small (<1% path length) cost to NBL tracking, so NDIS does not enable it by default on Windows Server.  If you are doing NDIS development, you should enable NBL tracking.  There are two ways to enable NBL tracking on Windows Server:

    1. Windows Server 2012 R2 and later:  Just enable Driver Verifier on NDIS.SYS.  This is already a best practice for NDIS developers, so you should already be doing this anyway.
    2. Windows Server 2008 R2 SP1 and later: Set the below registry key to 1:

HKLM\SYSTEM\CurrentControlSet\Services\NDIS\Parameters ! TrackNblOwner [REG_DWORD]

Next time we’ll talk about another nifty way to keep an eye on your NBLs.


!ndiskd.nbl -log

All your NBL are belong to !ndiskd

Last time we talked about !ndiskd.pendingnbls.  This command shows you which component currently holds an NBL.  But what if you want to see how the NBL got there?  That sounds like a job for !ndiskd.nbl -log!

Starting with Windows 8 and Windows Server 2012, NDIS can be configured to record a log of all NBL activity.  NDIS holds a large circular buffer, and writes an event to the buffer each time an NBL changes hands or is cloned.  You can use !ndiskd.nbl -log to search through that giant ringbuffer and collect all the events mentioning this NBL.

Here’s some example output:

kd> !ndiskd.nbl @rdx -log
Allocated
ProtocolSent       ffffcf80012b2c70 - QoS Packet Scheduler-0000
FilterSent         ffffcf800121cc70 - NDIS Sample LightWeight Filter-0000
FilterSent         ffffcf800125ec70 - WFP Native MAC Layer LightWeight Filter-0000
FilterSent         ffffe000012261a0 - Microsoft Hyper-V Network Adapter
SentToMiniport     ffffe000012261a0 - Microsoft Hyper-V Network Adapter

What does this show?  Read the log from top to bottom, where each line is one event in the history of the NBL.  On the first line, we see the NBL was Allocated from NDIS.  The next line shows that a protocol (not named here) sent the NBL to QoS Packet Scheduler.  Next a filter (presumably QoS) sent the NBL to another filter, the NDIS Sample LightWeight Filter.  Skipping down to the last line, we see the last event was the NBL getting sent to a miniport.  So this NBL is a pretty classic case of an NBL being allocated by a protocol, then sent down to the NIC, where it’s currently pending transmission.

That was an easy one.  Let’s take a look at a more complex example:

kd> !ndiskd.nbl @rdx -log
Allocated
    Child-1: Cloned    Parent: fffff8021269ae00, new child: fffffa800ae31a80
    Child-1-1: Cloned  Parent: fffffa800ae31a80, new child: fffffa8007c625c0
    Child-1-2: Cloned  Parent: fffffa800ae31a80, new child: fffffa8008ff1600
    Child-1-2: FreedClone
    Child-1-1: FreedClone
    Child-1: FreedClone
    Child-2: Cloned    Parent: fffff8021269ae00, new child: fffffa800ae31a80
ProtocolSent       fffffa800a0cbc80 - QoS Packet Scheduler-0000
FilterSent         fffffa800af711a0 - Microsoft Hyper-V Network Adapter
SentToMiniport     fffffa800af711a0 - Microsoft Hyper-V Network Adapter
    Child-2-1: Cloned  Parent: fffffa800ae31a80, new child: fffffa8007c625c0
    Child-2-1: FreedClone
    Child-2: FreedClone
MiniportSendCompleted fffffa800a0cbc80 - QoS Packet Scheduler-0000
FilterSendCompleted send complete in NDIS, sorting to Opens
SendCompleted      fffffa8008260010 - TCPIP
Freed

What is going on here?  The first thing you notice is the “Child” NBLs.  The log output will show the activity of the NBL you specify, and also any derived NBLs.  So the output shows that a driver allocated a Clone NBL, which !ndiskd names “Child-1”.  (There’s no significance to the name, other than to make it unique in the !ndiskd output.)  Then the driver cloned the clone, so we have a grandchild NBL named “Child-1-1”.  (The naming scheme is to just append a unique number to the parent’s name, so the third grandchild of the second child would be named “Child-2-3”.)  Another grandchild named “Child-1-2” is allocated.  Then all three clones are freed (presumably the driver used them outside of NDIS, perhaps in WFP, so NDIS didn’t log any activity on these NBLs).

Next we see something interesting.  Another clone is created, “Child-2”.  Then the original NBL is sent down the stack.  This is illegal!  NDIS requires that you hold onto the parent NBL while there are outstanding child NBLs.  But we can see that the parent NBL was sent down the stack before the “Child-2” NBL was freed.
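If you want to spot this pattern mechanically, the rule can be sketched in a few lines of Python.  This is a hypothetical checker, not something !ndiskd provides: the event list is a hand transcription of the log above, "Parent" is my label for the un-indented lines, and the set of events that count as "sent down the stack" is an assumption.

```python
# Events that mean the parent NBL is moving down the stack (assumed set).
SEND_EVENTS = {"ProtocolSent", "FilterSent", "SentToMiniport"}


def find_illegal_sends(events):
    """Return (event, outstanding clones) for each send of the parent
    that happened while at least one clone was still alive."""
    outstanding = set()
    violations = []
    for who, event in events:
        if event == "Cloned":
            outstanding.add(who)
        elif event == "FreedClone":
            outstanding.discard(who)
        elif who == "Parent" and event in SEND_EVENTS and outstanding:
            violations.append((event, sorted(outstanding)))
    return violations


# Hand-transcribed from the example log above.
log = [
    ("Parent", "Allocated"),
    ("Child-1", "Cloned"), ("Child-1-1", "Cloned"), ("Child-1-2", "Cloned"),
    ("Child-1-2", "FreedClone"), ("Child-1-1", "FreedClone"), ("Child-1", "FreedClone"),
    ("Child-2", "Cloned"),
    ("Parent", "ProtocolSent"), ("Parent", "FilterSent"), ("Parent", "SentToMiniport"),
    ("Child-2-1", "Cloned"), ("Child-2-1", "FreedClone"), ("Child-2", "FreedClone"),
    ("Parent", "MiniportSendCompleted"), ("Parent", "FilterSendCompleted"),
    ("Parent", "SendCompleted"), ("Parent", "Freed"),
]

for event, clones in find_illegal_sends(log):
    print(f"{event} while clones outstanding: {clones}")
```

All three send events are flagged, each with Child-2 still outstanding, which matches what we read off the log by eye.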

Now let’s take a look at one final example:

kd> !ndiskd.nbl @rdx -log
AllocatedNb
ProtocolSent       fffffa800e3518a0 - QoS Packet Scheduler-0000
FilterSent         fffffa800d23a1a0 - Microsoft Hyper-V Network Adapter
SentToMiniport     fffffa800d23a1a0 - Microsoft Hyper-V Network Adapter
Smuggled to        fffffa800d385010 - BUGGY_PROTOCOL
ProtocolSent       fffffa800d2481a0 - Sample Network Adapter
SentToMiniport     fffffa800d2481a0 - Sample Network Adapter
MiniportSendCompleted NDIS
SendCompleted      fffffa800d385010 - BUGGY_PROTOCOL
Freed

This log was recorded just before a bugcheck.  The interesting line is “Smuggled to … BUGGY_PROTOCOL”.  What is “smuggling”?  !ndiskd defines smuggling as any time that an NBL seems to magically hop from one NDIS driver to another, without going through NDIS.  Sometimes, this is perfectly legal.  For example, the miniport and protocol edges of an IM driver exchange packets between them using internal bindings, without going through NDIS.  So IM drivers always smuggle packets between their miniport and protocol edges; this is not a bug.  Likewise, the “Microsoft Virtual WiFi Miniport Adapter” is in cahoots with the “Microsoft Virtual WiFi Filter Driver”; they smuggle packets around by design.

But in a case like this, you see a packet magically hopping from the Hyper-V network adapter to an unrelated protocol driver, the aptly-named BUGGY_PROTOCOL.  That is suspicious, since these two drivers are not in cahoots, and should have no secret backchannel through which to smuggle packets.  The reality is that BUGGY_PROTOCOL has a use-after-free bug, so that protocol tried to send the packet to another NIC while the packet was still in the network adapter.

Before we go, you need to know a few important details.  This NBL logging uses a global ringbuffer, which means that the history of an NBL might be incomplete if the ringbuffer has wrapped around.  The ringbuffer is sized based on how much physical RAM you have, but even on a system with terabytes of RAM, the ringbuffer can only hold a few milliseconds of events for full 10Gbps traffic.
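Here is a small Python sketch of why a wrapped ringbuffer gives you an incomplete history.  It is a toy, assuming a fixed-capacity buffer of (NBL, event, detail) records; the NblLog class and the tiny capacity are made up, and the real NDIS buffer layout is not shown here.

```python
from collections import deque


class NblLog:
    """Toy global event log: a fixed-size ring shared by every NBL."""

    def __init__(self, capacity):
        # When the deque is full, appending silently drops the oldest event.
        self.events = deque(maxlen=capacity)

    def record(self, nbl, event, detail=""):
        self.events.append((nbl, event, detail))

    def history(self, nbl):
        """Collect the surviving events that mention this NBL, oldest first."""
        return [(e, d) for (n, e, d) in self.events if n == nbl]


log = NblLog(capacity=4)
log.record(0xA, "Allocated")
log.record(0xA, "ProtocolSent", "QoS Packet Scheduler-0000")
log.record(0xB, "Allocated")
log.record(0xA, "SentToMiniport", "Microsoft Hyper-V Network Adapter")
log.record(0xB, "Freed")   # fifth event: pushes out the oldest one

for event, detail in log.history(0xA):
    print(event, detail)
```

The history for NBL 0xA no longer starts with Allocated, because traffic on an unrelated NBL pushed that event out of the ring.  That is the sort of gap to expect when you read a log from a busy machine.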

NBL logging is disabled by default.  You can enable it by setting this registry value to 4 and rebooting the computer:

HKLM\SYSTEM\CurrentControlSet\Services\NDIS\Parameters ! TrackNblOwner [REG_DWORD]

Some people ask about the performance implications of NBL history logging.  This depends highly on the number of CPUs you have, the amount of traffic you’re pushing, the number of NDIS drivers that are installed, etc.  As a rule-of-thumb, I measured the CPU path length to be about 3x that of the default configuration: large enough that you wouldn’t want to enable this during perf testing, but small enough that you probably won’t notice it if you enable it during functional testing.

Summary of packet-tracking techniques

Tracking the packet tracking

We just covered a couple ways to track packets in the kernel debugger.  Here’s a quick reference table to help you understand how these techniques fit into your toolbelt.

                                            !ndiskd.pendingnbls                           !ndiskd.nbl -log
Documentation                               Here                                          Here
Finds “lost packets”                        Yes                                           No
Finds “smuggled packets”                    No                                            Yes
Finds use-after-free                        No                                            Yes
Loses data if ringbuffer wraps around       No                                            Yes
Number of historical events recorded        1                                             Many (depends on size of ringbuffer)
Records NBL ownership                       Yes                                           Yes
Records NBL allocation/free                 No                                            Yes
Records NBL clone/fragment                  No                                            Yes
CPU performance impact                      Negligible                                    Approx 3x CPU usage
Memory footprint impact                     None                                          32 KB – 32 MB, depending on RAM size
Enabled by default on client SKU            Yes                                           No
Enabled by default on server SKU            No                                            No
Enabled when TrackNblOwner is at least...   1                                             3
Minimum operating system version            Windows 7 SP1 or Windows Server 2008 R2 SP1   Windows 8 or Windows Server 2012

Making minidumps more useful

Miniport: meet minidump

Minidumps are a small (~100 KB) record of a crash.  As their name suggests, they’re optimized for small size… at the expense of usefulness.  Minidumps include just enough information to see the stack of the faulting thread, but they don’t generally have other threads or most of kernel pool.  If someone brings me a minidump, the first thing I ask is “um, do you have anything better?”.

But that doesn’t mean that minidumps are completely useless.  And with a little care, minidumps with your NDIS miniport can be just a little more useful.  Here’s the trick.

When the system bugchecks, NDIS attempts to detect whether the bugcheck was caused by the network stack.  If so, NDIS tries to determine which network driver is at fault.  If NDIS determines that your miniport is at fault, NDIS will add some extra information to the minidump: enough data for !ndiskd.miniport to (mostly) work, and also a small chunk of your MiniportAdapterContext.  In Windows 7 through Windows 8.1, NDIS will save the first 1024 * sizeof(void*) bytes of your MiniportAdapterContext.  In other words,

    Architecture    Context bytes saved
    x64             8192
    x86             4096
    arm             4096

This is useful to know, because it helps you lay out your context block.  You’ll find that, if you put your most important state into the first few kilobytes of your context block, then you’ll have an easier time debugging minidumps.
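The arithmetic is simple enough to check by hand, but here it is as a short Python sketch.  The pointer sizes are the usual ones for these architectures; the helper names and the example field offsets are hypothetical, standing in for whatever your own context block looks like.

```python
# sizeof(void*) in bytes for each architecture in the table above.
POINTER_SIZE = {"x64": 8, "x86": 4, "arm": 4}


def context_bytes_saved(arch):
    """Bytes of MiniportAdapterContext saved: 1024 * sizeof(void*)."""
    return 1024 * POINTER_SIZE[arch]


def survives_in_minidump(offset, size, arch):
    """True if a field at [offset, offset + size) lies entirely in the saved prefix."""
    return offset + size <= context_bytes_saved(arch)


for arch in POINTER_SIZE:
    print(arch, context_bytes_saved(arch))

# Hypothetical fields in a context block on x64:
print(survives_in_minidump(offset=0x40, size=64, arch="x64"))      # near the front
print(survives_in_minidump(offset=0x4000, size=64, arch="x64"))    # 16 KB in
```

A field 64 bytes into the block is captured, while one that starts 16 KB in is not, so state you care about when debugging belongs near the front.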

Happy debugging!

Thanks for your help making Windows great!

Really, it’s all about self-interest

Remember way back when you first set up your new computer?  Windows probably prompted you to “join the Customer Experience Improvement Program”.  For those of you who elected to join this program: thanks!  Without any extra effort on your part, you are helping us design a better product.

But with an unwieldy name like “Customer Experience Improvement Program (CEIP)”, it’s probably hard to imagine what, exactly, goes on behind-the-scenes.  Let’s take a look at how your votes help shape a core Windows component like NDIS.

NDIS is the system component that manages your network hardware and low-level network drivers.  NDIS contains a number of APIs that network drivers can use to interact with the operating system.  But not all APIs are created equal: some APIs were added years ago, and don’t really make much sense today.  Yet these obsolete APIs still periodically require work from the NDIS team to maintain them.  Is all that work worth it?

Now we know.  Included with the CEIP in Windows 8.1, NDIS tracks usage of several of our most impressively-obsolete APIs.  And the poll numbers are in: there are several obsolete APIs that are not used by a single computer in the world.  For example, NdisMSetMiniportSecondary is no longer used by anybody.  You can be sure that we’ll remove these unused APIs, so that Windows is smaller and more efficient.  So we win (less maintenance work) and you win (smaller, leaner operating system).  Although joining the CEIP helps Microsoft, it also helps you, because your feedback ensures you’ll get better products in the future.  Really, it’s all about self-interest.  😀

When we talk about a topic like this, it’s always important to ask the question: if Windows sends information back to Microsoft, what about my privacy?  Here’s the official answer.  My unofficial summary: we won’t spam or call you — in fact, we can’t, since we proactively work to avoid accidentally collecting your email or phone number.  Every single type of data that is collected is reviewed by a team with privacy experts, to make sure we only collect boring engineering stuff (like whether drivers use obsolete NDIS APIs).  CEIP is about your computer, not you.

In closing, here’s one more thought to mull over.  Network adapters can calculate IPv4 checksums in hardware, which saves CPU when processing IPv4 packets.  This is a common feature: about 86% of Windows 8.1 users have Ethernet NICs that support TCPv4 checksum offload.  But IPv6 has been growing in popularity, and it has nearly closed the hardware gap: we’re up to 78% of customers who can calculate TCPv6 checksums in hardware.  Maybe 2014 will finally be the year of widespread IPv6 rollout.  😉

Why is there a redundant Restart-NetAdapter cmdlet?

Sometimes you can’t just Enable your way out of a Disable mess

Windows 8 and Windows Server 2012 include a whole set of new PowerShell cmdlets to manage the network stack.  These cmdlets include Enable-NetAdapter and Disable-NetAdapter.  Believe it or not, those two cmdlets let you enable and disable your network adapters, respectively.  Want to shut off networking in a hurry?

PS C:\> Disable-NetAdapter *

If that were all you needed to know, these cmdlets would be so obvious, they wouldn’t even be worth writing about.  As you might have guessed from the length of this page, there are actually a few surprises lurking in the void between Enable and Disable.

Surprise 1: You can’t call Enable immediately after Disable

Suppose you want to set a few advanced properties on the NIC named “Ethernet”.  You might write this script:

Set-NetAdapterAdvancedProperty Ethernet -DisplayName 'Flow Control' -DisplayValue Disabled
Set-NetAdapterAdvancedProperty Ethernet -DisplayName 'Jumbo Packet' -DisplayValue Disabled

That’ll work just fine, but it’s a little klunky.  By default, the Set-NetAdapterAdvancedProperty cmdlet restarts the NIC, so the new value takes effect immediately.  But when you set 2 or more properties in a row, the NIC goes through unnecessary restarts.  Recognizing this inefficiency, you might try a new version of the script that coalesces all the restarts to the end:

Set-NetAdapterAdvancedProperty Ethernet -NoRestart -DisplayName 'Flow Control' -DisplayValue Disabled
Set-NetAdapterAdvancedProperty Ethernet -NoRestart -DisplayName 'Jumbo Packet' -DisplayValue Disabled
Disable-NetAdapter Ethernet
Enable-NetAdapter Ethernet

But now you find that that script fails while trying to re-enable the adapter.  What gives?

It turns out that the Disable-NetAdapter cmdlet is asynchronous.  It initiates the disable operation, then returns immediately back to the script, before the adapter is fully disabled.  When the script then tries to call Enable-NetAdapter on the NIC, the Enable cmdlet fails, because the adapter isn’t fully disabled yet.

While you could work around the race by inserting Start-Sleep, there’s a better way: Restart-NetAdapter.  The Restart-NetAdapter cmdlet combines a Disable and an Enable into a single operation.  Restart-NetAdapter ensures that the Enable operation happens as soon as possible, but no sooner.

So Restart-NetAdapter is better than just a script that calls Disable-NetAdapter + Enable-NetAdapter.  Really, the whole is greater than the sum of its parts.

Surprise 2: Wildcard matching doesn’t always work

We saw earlier how a grumpy administrator might try to disable all the NICs on the system with Disable-NetAdapter *.  It seems logical that the proper way to re-enable your NICs is to run Enable-NetAdapter *.  But there’s a subtlety here: the * wildcard doesn't always match the same set of NICs in both commands.  Let’s see what happens if you use NIC Teaming to create a team, then try to disable and re-enable all the NICs.

Initially, both the physical NIC and the Team Interface are enabled:

[Diagram: the Team Interface and the Physical NIC, both enabled, joined by the Team]

Then we run Disable-NetAdapter *, and Windows evaluates the wildcard to both adapters.  Both get disabled:

[Diagram: the Team Interface and the Physical NIC, both disabled, joined by the Team]

But when Microsoft NIC Teaming detects that all member NICs have been disabled, NIC Teaming will remove the entire team:

[Diagram: the Team Interface is gone; the Physical NIC is disabled]

Now when you run Enable-NetAdapter *, the wildcard matches all adapters: but only one adapter exists!  The cmdlet only enables the physical NIC:

[Diagram: the Team Interface is gone; the Physical NIC is enabled]

Finally, NIC Teaming notices that one of its member NICs has returned, so NIC Teaming restores the Team Interface(s).  But remember, the last thing we did to the team interface was disable it, so the team interface comes up in a disabled state:

[Diagram: the Team Interface is back but disabled; the Physical NIC is enabled]

So as you can see, Enable-NetAdapter * does not completely undo the effects of Disable-NetAdapter *.  What, then, is a good way to do this?  Restart-NetAdapter to the rescue, again.  When you run Restart-NetAdapter *, the wildcard is only evaluated once, so it includes the Team Interface, before the Team Interface is removed.
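The difference between evaluating the wildcard twice and evaluating it once is easy to see in a toy simulation.  The Python sketch below is only a model of the sequence described above: the Host class is made up, and in particular the idea that a request aimed at a removed team interface is remembered until the interface returns is my assumption for the sake of the model, not a statement about how the cmdlets are implemented.

```python
class Host:
    """Toy model: one physical NIC with one team interface built on it."""

    def __init__(self):
        self.present = {"Physical NIC": True, "Team Interface": True}  # name -> enabled?
        self._parked = {}   # team interfaces removed by teaming: name -> last state

    def expand_wildcard(self):
        # '*' can only match adapters that exist at the moment it is evaluated.
        return list(self.present)

    def set_enabled(self, names, enabled):
        for name in names:
            if name in self.present:
                self.present[name] = enabled
            elif name in self._parked:
                # Assumption: the request is remembered for a removed interface.
                self._parked[name] = enabled
        self._teaming_reacts()

    def _teaming_reacts(self):
        if not self.present["Physical NIC"] and "Team Interface" in self.present:
            # All member NICs are down, so teaming removes the team interface.
            self._parked["Team Interface"] = self.present.pop("Team Interface")
        elif self.present["Physical NIC"] and "Team Interface" in self._parked:
            # A member NIC is back, so the team interface returns in its last state.
            self.present["Team Interface"] = self._parked.pop("Team Interface")


# Disable * then Enable *: the wildcard is evaluated twice.
a = Host()
a.set_enabled(a.expand_wildcard(), False)
a.set_enabled(a.expand_wildcard(), True)    # only matches the physical NIC
print(a.present)

# Restart *: the wildcard is evaluated once, up front.
b = Host()
names = b.expand_wildcard()
b.set_enabled(names, False)
b.set_enabled(names, True)
print(b.present)
```

In the first run the team interface comes back disabled; in the second run both adapters end up enabled, because the list of names was captured while the team interface still existed.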

Bonus surprise: Remote desktop facepalm

Disable-NetAdapter is dangerous when you’re logged in remotely, because if you disable the NIC you were using to connect, you won’t be able to log back in again to re-enable the NIC.  You know that, I know that, everybody knows that.  But we still all make this mistake sooner or later.  (My favorite variant of this story involves a remote kernel debugger and wayward flow control PAUSE frames taking the local network offline, killing the connection to the kernel debugger….)

Anyway, if you have to bounce the NICs, the safest solution is once again to use Restart-NetAdapter.  That’ll still kill your remote connection, but hopefully the NIC will come back up and the connection will be restored automatically.

Kernel debugging over the network

What just happened to my NIC?!

We’ve previously published some tips on how to use the debugger to fix your NDIS miniport driver.  But today we’re going to turn the tables and talk about how the debugger uses NDIS to break your miniport driver.

You can debug the Windows kernel through several transports.  One of the fastest transports is the NET transport, which works over common network cards connected by an ordinary Ethernet cable.  Most developers are happy to run through the setup checklist and just begin debugging.  But for those of you who work on network drivers, you might notice a few strange things happening to the network stack when NET debugging is enabled.  What’s going on?

First, let’s take a look at how the debugger works when it’s not using networking.  This diagram shows the network stack behaving normally.

[Diagram: the NT kernel, NDIS, and the miniport driver stack on top of the NIC hardware; the debug transport talks to separate 1394 hardware]

The miniport driver talks to its hardware, and the debug transport is preoccupied with some other piece of hardware.  Everything is normal.  But now when we enable kernel debugging over the NIC, let’s see how the diagram changes.

[Diagram: the debug transport now sits directly on the NIC hardware; the miniport driver, NDIS, and the NT kernel are cut off from it]

In the second diagram, the debug transport has taken over the NIC hardware completely!  The miniport driver, NDIS, and even the kernel itself are excluded from talking to the NIC hardware.  That’s why, when kernel debugging is enabled, you see the NIC device getting “banged out” in Device Manager.  At boot, NDIS tried to initialize the miniport, but NDIS saw that the kernel debugger had exclusive access to the hardware, so NDIS failed the miniport’s AddDevice.  In other words, the yellow exclamation mark is by design: the vendor NIC “failed” to load because kernel debugging is enabled.

If that were all there was to the story, then we’d have working kernel debugging, but completely broken networking.  Fortunately, there’s more.

When the debugger team at Microsoft planned this feature, they realized that network debugging wouldn’t be very useful if it broke your networking.  So they also planned a second piece to the architecture.  The debug transport exposes a virtual network adapter, based off of the underlying physical adapter.  This virtual adapter has a regular NDIS driver, so the rest of the OS can talk on the network.

[Diagram: the debug transport owns the NIC hardware and exposes a Virtual NIC; a miniport driver, NDIS, and the NT kernel stack on top of the Virtual NIC]

This virtual NIC is named the “Microsoft Kernel Debug Network Adapter”.  It is pre-installed on all Windows 8 machines, but it is only enabled once kernel debugging over networking is enabled.  When it’s enabled, it carries your normal traffic (TCPIP, etc.) out to the physical NIC that is now being used for kernel debugging.

So now you can explain why there’s that additional network device hidden in Device Manager: it’s waiting for the moment that you enable kernel debugging, so it can be your main network adapter.  You can also explain why the Network Connections folder shows a different adapter when kernel debugging is enabled: TCPIP really is bound to a different adapter (as far as the OS knows).

NdisFRegisterFilterDriver fails… now what?

Decoding the error codes

“I compiled my NDIS filter driver, but NdisFRegisterFilterDriver fails in my DriverEntry function.  Now what?”

Here’s a table listing common problems and fixes.  Rows are grouped by symptom.

Symptom: NDIS_STATUS_BAD_CHARACTERISTICS (0xc0010005)

    Problem: The Characteristics block has the wrong Header for the NDIS driver version.
    Resolution: If you are writing an NDIS 6.0 filter driver, set NDIS_FILTER_DRIVER_CHARACTERISTICS::Header.Revision to NDIS_FILTER_CHARACTERISTICS_REVISION_1.  Otherwise, for any version NDIS 6.1 or later, use NDIS_FILTER_CHARACTERISTICS_REVISION_2.

    Problem: The Characteristics block has an incorrect Size or Type.
    Resolution: Make sure that NDIS_FILTER_DRIVER_CHARACTERISTICS::Header.Size is set to sizeof(NDIS_FILTER_DRIVER_CHARACTERISTICS).  Make sure that NDIS_FILTER_DRIVER_CHARACTERISTICS::Header.Type is set to NDIS_OBJECT_TYPE_FILTER_DRIVER_CHARACTERISTICS.

    Problem: Your filter is missing the 4 mandatory handlers.
    Resolution: Make sure you provide FilterPause, FilterRestart, FilterAttach, and FilterDetach handlers in your characteristics block.

    Problem: Your filter has inconsistent OID handlers.
    Resolution: If you provide FilterOidRequest, you must provide FilterOidRequestComplete.  Likewise, if you provide FilterOidRequestComplete, you must provide FilterOidRequest.  Also, do not provide a FilterCancelOidRequest handler if you do not have a FilterOidRequest handler.  However, the inverse is not true: if you provide FilterOidRequest, you may optionally provide FilterCancelOidRequest.  I only recommend you provide it if you queue OIDs for some nontrivial amount of time.  Both rules above also apply to Direct OID handlers.

    Problem: Your filter has inconsistent NBL handlers.
    Resolution: If you do not provide a FilterSendNetBufferLists handler, then you must not provide a FilterCancelSendNetBufferLists handler either.

Symptom: NDIS_STATUS_FAILURE (0xc0000001)

    Problem: The filter doesn’t match the INF.
    Resolution: Check that NDIS_FILTER_DRIVER_CHARACTERISTICS::UniqueName:
      1. is a GUID,
      2. has curly braces, and
      3. precisely matches what’s in your INF’s NetCfgInstanceId directive.

    Problem: The filter was not correctly installed.
    Resolution: Verify that this key exists:

    HKLM\SYSTEM\CurrentControlSet\Control\Network\{4d36e974-e325-11ce-bfc1-08002be10318}\{your filter’s GUID}\Ndi

    If the key doesn’t exist, check whether your installer actually ran, and whether its call to INetCfgClassSetup::Install succeeded.

Symptom: NDIS_STATUS_BAD_VERSION (0xc0010004)

    Problem: The current implementation of NDIS doesn’t support the NDIS contract version that your filter driver is asking for.
    Resolution: Check the list of NDIS versions by OS version against the NDIS_FILTER_DRIVER_CHARACTERISTICS::MajorNdisVersion and ::MinorNdisVersion that your driver is requesting.  Make sure that you’re getting the right version.  In particular, note that there is no such thing as “NDIS 6.2”.  If you’re setting MajorNdisVersion=6 and MinorNdisVersion=2, you’ll get this error code.  Use MinorNdisVersion=20 instead.

Symptom: Some other error code

    Problem: Your filter driver’s FilterSetOptions handler was called and returned a failure code.
    Resolution: NDIS calls into your FilterSetOptions handler within the context of your filter’s call to NdisFRegisterFilterDriver.  If your FilterSetOptions handler returns an error code, then NdisFRegisterFilterDriver will fail with the same error code.  Set a breakpoint on your filter’s FilterSetOptions handler and verify that it succeeds.

Symptom: A hang, crash, or other strange behavior

    Problem: Your filter driver is using a duplicate UniqueName.
    Resolution: Make sure that no other filter uses the same UniqueName as your filter.  In particular, ensure that you are not using the same UniqueName as the WDK sample filter: {5cbf81bd-5055-47cd-9055-a76b2b4e3697}

    Problem: Another network driver hangs when trying to attach your filter.
    Resolution: Use a kernel debugger to run:

    !stacks 2 ndis!

    Check whether there are any tall callstacks in NDIS that appear to be pausing or unbinding any drivers.  It’s likely that some driver has hung.  This hang may not be caused directly by your filter driver; it may only be exposed while NDIS tried to attach your filter driver.

When all else fails, you can often get more detail by looking at NDIS trace messages.  If you spend a lot of time working on NDIS drivers, I encourage you to get comfortable with reading these traces, since that ability will save you time in the long run.  In this case, NdisFRegisterFilterDriver will often trace a message saying exactly what has gone wrong.
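The handler-consistency rules for NDIS_STATUS_BAD_CHARACTERISTICS are easy to get wrong, so here is a rough Python sketch that encodes them as a checklist.  This is not the validation NDIS itself runs, just my restatement of the rules in the table; the check_handlers function is made up, and the Direct OID handler names (FilterDirectOidRequest and friends) are my reading of “both rules also apply to Direct OID handlers”.

```python
MANDATORY = ("FilterAttach", "FilterDetach", "FilterRestart", "FilterPause")

# (request, complete, cancel) for the regular and Direct OID paths.
OID_FAMILIES = (
    ("FilterOidRequest", "FilterOidRequestComplete", "FilterCancelOidRequest"),
    ("FilterDirectOidRequest", "FilterDirectOidRequestComplete",
     "FilterCancelDirectOidRequest"),
)


def check_handlers(provided):
    """Return a list of rule violations for the given set of handler names."""
    provided = set(provided)
    problems = []
    for name in MANDATORY:
        if name not in provided:
            problems.append(f"missing mandatory handler {name}")
    for request, complete, cancel in OID_FAMILIES:
        # Request and complete handlers must be provided together or not at all.
        if (request in provided) != (complete in provided):
            problems.append(f"{request} and {complete} must be provided together")
        # Cancel is optional, but only makes sense alongside the request handler.
        if cancel in provided and request not in provided:
            problems.append(f"{cancel} requires {request}")
    if ("FilterCancelSendNetBufferLists" in provided
            and "FilterSendNetBufferLists" not in provided):
        problems.append("FilterCancelSendNetBufferLists requires FilterSendNetBufferLists")
    return problems


good = set(MANDATORY) | {"FilterOidRequest", "FilterOidRequestComplete"}
bad = set(MANDATORY) | {"FilterOidRequestComplete", "FilterCancelSendNetBufferLists"}

print(check_handlers(good))
for problem in check_handlers(bad):
    print(problem)
```

The first set passes cleanly; the second trips both the OID pairing rule and the NBL cancel rule.  A checklist like this could live in a unit test next to your characteristics block, so a missing handler is caught before you ever load the driver.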


Using the checked version of NDIS.SYS

I assert that this is a good way to find bugs

Installing the checked version of the operating system is an effective technique to quickly find bugs in your network driver.  If you’re not familiar with checked builds (and even if you are), you should read the excellent documentation here.  Seriously, read it; I won’t repeat it here.

What do you get with the checked build of NDIS?

The main difference is that NDIS’s implementation has (as of Windows 8.1) approximately 2200 extra asserts.  While some of these asserts verify NDIS’s internal bookkeeping is consistent, many of them verify that your driver uses NDIS’s APIs correctly.  For example, NDIS asserts the current IRQL is correct when each MiniportXxx callback returns, to help catch the class of bug where your miniport driver leaks an IRQL or spinlock.

Prior to Windows 7, using the checked build of NDIS was also the only way to see NDIS’s debug traces.  But as of Windows 7, these traces are now available from WPP, so there’s no longer a need to use the checked build solely for tracing.

What’s the downside of using a checked build?

There are two downsides to using checked builds: performance and false-positives.

Checked builds are noticeably slower.  But they aren’t as bad as you might think.  We still compile checked builds with most compiler optimizations enabled, so the only slowdowns are a few extra verifications here and there.  Still, the operating system has zillions of assertions, so those do add up.  You definitely don’t want to use a checked build for any performance-related work.  But they’re just fine for initial work on a new feature or functional testing.

False positives are also a problem.  Sometimes you’ll see assertions that fail for reasons that don’t seem to be related to your driver.  When you see an unfamiliar assertion, you’ll first want to spend a moment to convince yourself that the assertion failure is really caused by your driver.  For example, if there’s an assertion in win32k.sys about an invalid HRGN, that’s probably not caused by any network driver.  Prior to Windows 8, the operating system was kind of “noisy”; a nontrivial percentage of its assertions would fire for benign reasons.  We worked hard to clean that up in Windows 8, so the asserts have a better signal-to-noise ratio.  (Like many Windows engineers, I used a checked build of the OS as my primary workstation for some time during Windows 8 development.  That was fun.)

If you discover an assertion in NDIS.SYS that you believe is a false positive, please let me know here and I’ll try to clean that up.  (Unfortunately I’m not knowledgeable about non-networking drivers, so I can’t promise I can help you with any random assertion that you come across.)

How can you get the checked build of NDIS?

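MSDN has the story on how to download a copy of the checked build.  From there, you have two options:

    1. Install the entire checked build of the operating system.
    2. Keep your existing installation, and selectively replace a few drivers with their checked versions.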

Since MSDN already explains the first option, I won’t repeat those instructions here.  Let’s talk about the second option: how to selectively replace a few drivers.

First, identify the drivers that you want to replace.  Here’s a table of drivers you can consider replacing:

    Driver                                     When to replace
    The kernel & HAL                           Always
    NDIS.SYS                                   Always
    TCPIP.SYS                                  Miniport, LWF, and WFP callout drivers
    NETIO.SYS (Windows Vista and later)        Whenever TCPIP.SYS is replaced
    FWPKCLNT.SYS                               WFP callout drivers
    (your bus driver, e.g., PCI.SYS)           Miniport drivers
    NWIFI.SYS                                  Native 802.11 drivers
    VWIFIBUS.SYS, VWIFIFLT.SYS, VWIFIMP.SYS    Native 802.11 drivers that implement MAC virtualization (WFD or SoftAP)
    NDISUIO.SYS                                WWAN drivers
    WMBCLASS.SYS                               WWAN drivers that implement the class driver model
    VSWITCH.SYS                                Hyper-V extensible switch extension

Keep in mind — these are just guidelines.  You are not required to test with any particular set of drivers, and you might want to fine-tune the list depending on what subsystem you’re targeting.  If you are unsure about which binaries to replace, remember you can always just install the entire checked OS, which gives you the maximum checked build coverage.

Now that you know which drivers to replace, you can extract them from the checked build media.  If you obtained installable media, you can mount the included INSTALL.WIM with DISM.EXE to get at the individual drivers, or you can just install the OS into a throw-away VM to get convenient access to its drivers.

Finally, you'll need to actually replace these drivers on your target OS.  Don’t do this on a production OS machine; we can’t officially support this.  The easiest way to replace binaries is to hook up a kernel debugger and use the .kdfiles feature.  For example, here’s the mapfile that I use to replace NDIS.SYS on a test machine:

map
\Windows\system32\DRIVERS\NDIS.SYS
c:\path\to\ndis.sys

Note that the name of the driver will depend on how the driver is loaded.  Use CTRL+D or CTRL+ALT+D in the debugger and reboot the target machine to see the official name of each driver.

Note that the process for replacing the kernel & HAL is special.

Oh, and sorry for the awful pun in the subtitle.

The NDIS API naming convention

NdisFWhat?  Your secret decoder ring to NDIS functions

The first time you come across NDIS, you might find yourself lost in the enormous number of NDIS APIs, OIDs, status codes, and data structures.  What’s the difference between NdisMIndicateStatus and NdisFIndicateStatus?  Fortunately, NDIS has naming conventions that make it a little easier to organize the APIs.  Let’s take a look.

NDIS.SYS exports approximately 500 APIs on Windows 8.1.  (Yes, five hundred!)  Of those, all but three start with the name “Ndis”.  So that’s our first convention: NDIS APIs are almost always named NdisSomething.  That might seem obvious, but there’s a very subtle piece of information encoded in that name: NDIS’s internal routines are prefixed with “ndis”.  See the difference?  Upper-case N means it’s an exported API, and lower-case n means it’s an internal API.

Why should you care about NDIS’s internal routines?  Well, most of the time, you don’t.  But when you start debugging your driver, you’ll inevitably be looking at some callstacks with NDIS routines (both exported and internal).  If you don’t know what the callstack is doing, you might get some clues by looking at related MSDN documentation.  Try looking at this callstack and see if you can guess which API is most likely to be documented on MSDN:

00 tcpip!FlReceiveNetBufferListChain
01 ndis!ndisMIndicateNetBufferListsToOpen
02 ndis!ndisIndicateSortedNetBufferLists
03 ndis!ndisMDispatchReceiveNetBufferListsInternal
04 ndis!ndisMTopReceiveNetBufferLists
05 ndis!ndisCallReceiveHandler
06 ndis!ndisIterativeDPInvokeHandlerOnTracker
07 ndis!ndisInvokeNextReceiveHandler
08 ndis!ndisMIndicateReceiveNetBufferListsInternal
09 ndis!NdisMIndicateReceiveNetBufferLists
0a netvsc!ReceivePacketMessage

That’s not all; it takes more than an N to spell Naming convention.  Following the “Ndis” prefix there is often a code indicating which type of driver is associated with the API.  Here’s a reference table of the codes:

Code | Purpose
M | Miniport drivers
F | Filter drivers
IM | Intermediate drivers
If | Network interface providers
Co | CoNDIS drivers
Cl | CoNDIS call clients
Cm | CoNDIS call managers
MCo | CoNDIS miniport drivers
MCm | CoNDIS integrated call managers

Functions that don’t have a code are meant to be called from multiple types of drivers, or from protocol drivers.  (For some reason, protocol drivers never earned their own code.  I’m not sure why, but who am I to argue with two decades of tradition?)

So now without even consulting the documentation, you can guess that NdisMSetTimer is for miniport drivers, while NdisIMNotifyPnPEvent is for intermediate drivers.  And I bet you can answer the question at the start of this article: what is the difference between NdisMIndicateStatus and NdisFIndicateStatus?
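If you'd like to see the decoder ring spelled out, here is a minimal sketch in C that guesses a routine's intended caller from its name alone.  The function and type names (ndis_guess_role, ndis_is_exported_name, struct ndis_code) are made up for this illustration and are not part of NDIS.  The rule that a code must be followed by a capital letter is my own heuristic, added so that names like NdisFreeMemory aren't mistaken for filter APIs; it is a guess at intent, not a substitute for the documentation.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* One row per code from the reference table.  Longer codes come first so
 * that "MCo" and "MCm" are tried before the shorter "M". */
struct ndis_code {
    const char *code;
    const char *role;
};

static const struct ndis_code ndis_codes[] = {
    { "MCo", "CoNDIS miniport drivers" },
    { "MCm", "CoNDIS integrated call managers" },
    { "IM",  "Intermediate drivers" },
    { "If",  "Network interface providers" },
    { "Co",  "CoNDIS drivers" },
    { "Cl",  "CoNDIS call clients" },
    { "Cm",  "CoNDIS call managers" },
    { "M",   "Miniport drivers" },
    { "F",   "Filter drivers" },
};

/* Nonzero if the name uses the exported-API prefix "Ndis" (capital N). */
int ndis_is_exported_name(const char *name)
{
    return strncmp(name, "Ndis", 4) == 0;
}

/* Guess the intended caller of an NDIS routine from its name alone. */
const char *ndis_guess_role(const char *name)
{
    size_t i;

    if (strncmp(name, "Ndis", 4) != 0 && strncmp(name, "ndis", 4) != 0)
        return "Not an NDIS routine";

    for (i = 0; i < sizeof(ndis_codes) / sizeof(ndis_codes[0]); i++) {
        size_t len = strlen(ndis_codes[i].code);

        /* Heuristic: a code only counts if a new capitalized word follows. */
        if (strncmp(name + 4, ndis_codes[i].code, len) == 0 &&
            isupper((unsigned char)name[4 + len]))
            return ndis_codes[i].role;
    }
    return "Multiple driver types, or protocol drivers";
}
```

The same lookup works on internal routines: passing "ndisMSendNBLToMiniport" also yields the miniport row, while ndis_is_exported_name tells you it is not an exported API.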

Enough trivia games; let’s return to practical applications of this knowledge.  Suppose you are debugging a callstack that looks like this:

00 DriverY+0x22af
01 ndis!ndisMSendNBLToMiniport+0xb1
02 ndis!NdisFSendNetBufferLists+0x64
03 DriverX+0x1b6d
04 ndis!NdisFSendNetBufferLists+0x64
05 pacer!PcFilterSendNetBufferLists+0x9d
06 ndis!ndisSendNBLToFilter+0x69
07 ndis!NdisSendNetBufferLists+0x85
08 tcpip!FlpSendPacketsHelper+0x675

Although you might not be familiar with DriverX or DriverY, you can use NDIS naming conventions to guess their roles.  At frame 03, DriverX calls an NdisFXxx function in frame 02.  Therefore, DriverX is a filter driver.  You can also see an ndisMXxx internal NDIS function at frame 01, so you know that NDIS has finished calling all the filters, and has moved on to the miniport before calling DriverY in frame 00.  So it’s reasonable to conclude that DriverY is a miniport driver.

Another very practical application: knowing the codes will help you narrow down the APIs that you need to worry about when you’re writing code.  If you’re writing a lightweight filter driver, you can safely ignore the approximately 130 APIs that are exclusively for miniports (NdisMSomething).  That makes it a bit easier to find the API you’re looking for.

Now you know the ABC’s of NDIS!

Using WDF in an NDIS driver

Can, Should, and How?

WDF is a framework that makes it easier to write Windows drivers.  NDIS is a framework for writing low-level Windows network drivers.  The purposes of these frameworks overlap a bit, and some people (okay, probably many people) are confused about the relationship between NDIS and WDF.  Today we’ll set down a few guidelines.  But first – let’s dispel one tenacious myth.

Myth: Some people think that NDIS drivers cannot use WDF.

In reality, you can use WDF in your NDIS driver.  I know this works rather well, because I have personally written several WDF-based NDIS drivers.

So where do people get the idea that WDF is incompatible with NDIS?  There are a few sources of this idea:

  • When writing an NDIS miniport driver, certain parts of WDF are not compatible with NDIS.  You must put WDF into a mode sometimes referred to as “miniport mode”.  Not all WDF APIs are available in miniport mode.  See the step-by-step checklist here.  Note that this restriction only applies to NDIS miniport (and IM) drivers; protocols and LWFs can use the full breadth of WDF functionality.
  • Miniport drivers must also put NDIS into a special mode, called NDIS-WDM mode.  This is a poor name, because it seems to indicate that you must use WDM.  The reality is that NDIS-WDM mode just means your driver can use any non-NDIS framework.  (At the time that NDIS-WDM mode was invented, there were no other frameworks besides WDM, so the name didn’t seem to be too constraining.  If it helps, you can think of it as NDIS-WD* mode.)
  • Most of the NDIS drivers that are included with Windows (like TCPIP) don’t use WDF.  But this isn’t because Windows developers are avoiding WDF; it’s because most inbox drivers simply predate WDF.  If we were writing the network stack from scratch, we’d use more WDF.  New drivers like MSLLDP, an NDIS protocol driver included with Windows 8, are indeed based on WDF.

Now that we know you can combine WDF with NDIS, let’s talk about whether you should combine WDF with NDIS.  In nearly all cases, an NDIS driver will work with or without WDF.  So you rarely have the decision forced upon you by the technology.  Ultimately, it will come down to what you decide, based (hopefully) on a good engineering judgment call.  Let’s collect some evidence to help you make that decision.

Reasons you should use WDF in your NDIS driver

  • Your engineering team is already familiar with WDF.
  • You will be developing several drivers, including non-networking drivers.  (Might as well learn WDF now, and maybe you can share some library code between your drivers.)
  • Your driver already uses WDF.
  • You are writing an NDIS miniport that uses IRPs on its lower edge (USB, SDIO, etc.)
  • You are writing a protocol or LWF that interacts with non-NDIS parts of the OS (usermode IOCTLs, WSK requests, etc.)
  • Your code would benefit from WDF’s clever object management system to avoid memory leaks.
  • You are new to Windows driver development, and have no idea where to start 😰

Generally speaking, it’s a good idea to consider WDF.  But there are a few reasons why WDF might not be very useful to your NDIS driver:

Reasons that WDF won’t help in your NDIS driver

  • Your engineering team is already very familiar with NDIS, but has no experience with WDF.
  • You are maintaining a mature driver that does not use WDF.
  • You are writing a simple NDIS miniport on a directly-connected bus (like PCI).
  • You are writing a protocol or LWF that has minimal interaction with the rest of the OS.  This driver mostly only calls NDIS APIs.
  • Your codebase must be compatible with platforms where WDF is not available (like Windows CE).

Mind you, it’s still quite possible to link against WDF in these situations.  But you’ll probably find that there aren’t a lot of opportunities to actually use WDF APIs.  Integrating with WDF doesn’t give a lot of value if you don’t call its APIs.  In those cases, the pragmatic engineering decision may be to just not use WDF.

Okay, so let’s suppose you’ve decided to give WDF a spin.  You’ll eventually notice that WDF overlaps somewhat with NDIS.  For example, both frameworks have APIs for workitems (NdisQueueIoWorkItem versus WdfWorkItemEnqueue).  Which API should you use?  In many cases, either framework’s APIs will work.  As before, it’s an engineering decision that ought to consider several factors, such as consistency with your other code.  But if you are new to NDIS and WDF, you can use this quick-reference table as a starting place for your decision-making process.

API family | Use NDIS APIs? | Use WDF APIs? | Use WDM APIs?
Work items | Avoid | Preferred | Do not use
Timers | Avoid | Preferred | Do not use
Memory allocation | Avoid | Preferred | Okay
Locks & interlocks | Avoid (but RW locks are okay) | Preferred | Preferred
Events | Avoid | Preferred | Preferred
String handling | Avoid | Preferred | Preferred
DMA | Preferred | Preferred | Avoid
Interrupts | Preferred | Not permitted | Not permitted
DPCs (for miniports) | Preferred for interrupts | Okay for non-interrupts | Avoid
DPCs (for non-miniports) | Avoid | Preferred | Avoid
Processor information | Avoid (except RSS APIs) | (no equivalent) | Preferred
IRPs and IOCTLs (for miniports) | Required | Not permitted | Not permitted
IRPs and IOCTLs (for non-miniports) | Avoid | Preferred | Avoid
Direct bus/port access | Okay | Preferred | Preferred
Reading configuration | Preferred for standard keywords | Preferred for other registry values | Okay for other registry values
File I/O | Avoid | (no equivalent) | Preferred

Remember, the above table only contains guidelines.  It is still acceptable to ship a driver that uses an API marked "Avoid".  You should use the table to help nudge your decision-making when you have no other compelling reasons to use a particular API family.

Using C++ in an NDIS driver

Are NDIS drivers allowed to use C++?

The first question is easy: can NDIS drivers be written in C++?  The answer: yes.  In this case, NDIS doesn’t have any official stance on C++, so we just fall back on the WDK’s general rules.  As of Windows Driver Kit 8, Microsoft officially supports using a subset of the C++ language in drivers.  (“Subset?  What subset?”  There’s more precise information here.)

The inevitable follow-up question is more nuanced: should NDIS drivers be written in C++?  The answer is: it depends.  Here are some facts that will help you derive a more specific answer:

  • The NDIS API is a C API.  There is no NDIS API that magically gets better or worse when you’re coming from C++ versus C.
  • The NDIS team has no future plans to make a feature that requires C++.  We are well-aware that many of our developers are dedicated fans of C, and have strong opinions on C++.  Don't worry — C isn’t going anywhere.
  • The NDIS team may, in the future, add minor conveniences that only light up in C++.  For example, the WDK macro ARRAYSIZE is defined differently for a C++ driver, which gives it better abilities to detect misuse with pointers.  NDIS.H may start adding macros that offer minor improvements for C++ code, just like WDM.H already has today.
  • Several major IHVs build their production NDIS miniport drivers using C++.
  • Several major IHVs build their production NDIS miniport drivers using C.
  • Microsoft builds some drivers in C and some drivers in C++.
  • Our NDIS sample drivers are all in C.  (This is largely for historical reasons, as these drivers were created before C++ was officially supported.  If we were creating a new sample today, we’d consider writing it in C++.)

In summary, then, either language works fine, and it all comes down to a matter of your preference.

Mapping from NDIS OIDs to WMI classes

In which we write a PowerShell script, install the WDK, attach a kernel debugger, reverse-engineer the OS, and prove Goldbach’s conjecture

We’ve previously talked about how to rummage through all the NDIS WMI classes, but there’s one topic we haven’t fully covered.  Suppose you’re looking for the WMI class that maps to a specific OID — how do you find the right class?

There are a few ways you can do this.  The first is just to take a guess based on the name.  Suppose we want to find the WMI class that corresponds to OID_GEN_VLAN_ID.  Let’s search for any WMI class that has “VLAN” in the name:

Get-WmiObject -Namespace root\wmi -List  | Where-Object {$_.name -Match "VLAN" }

My machine has only one matching class, MSNdis_VlanIdentifier, and indeed that’s the right one.

But this technique relies somewhat on luck.  What if you don’t find any matches — should you keep searching, or does that mean there really is no WMI class for that OID?  So that takes us to the more methodical approach.

If you install the Windows Driver Kit (WDK), then it will give you several rather helpful files:

File | What it contains
wmicore.mof | Defines each of the built-in NDIS WMI classes
ndisguid.h | Defines the names of the GUIDs that underlie built-in NDIS WMI classes
ndis.h and ntddndis.h | Define various OIDs, structures, and flags that will be useful in decoding some WMI classes

Let’s start in wmicore.mof.  Open it in a text editor to find a comment indicating which WMI class implements OID_GEN_VLAN_ID.  Again, we find MSNdis_VlanIdentifier.

///     OID_GEN_VLAN_ID:
[WMI, Dynamic, Provider("WMIProv"), guid("{765dc702-c5e8-4b67-843b-3f5a4ff2648b}"),
 Description("NDIS VLAN Identifier") : amended]
class  MSNdis_VlanIdentifier : MSNdis
{
    [ read, write, Description("The IEEE 802.1Q VLAN ID assigned to this NIC.") : amended,
        WmiDataId(1)]    uint32    NdisVlanId;
};

By searching through this file, you can find all the built-in WMI classes that NDIS provides.

Not so fast, you say.  You want to be methodical, and relying on code comments is not exactly bulletproof.  What about MSNdis_VendorID, which is in wmicore.mof, but is missing a comment mentioning which OID it is tied to?

Here’s where ndisguid.h comes in handy.  Note that each WMI class in wmicore.mof has a GUID.  For example, MSNdis_VendorID has GUID {5ec1035e-a61a-11d0-8dd4-00c04fc3358c}.  You can find that same GUID in ndisguid.h (although the numbers are presented a little differently):

DEFINE_GUID(GUID_NDIS_GEN_VENDOR_ID,
0x5ec1035e, 0xa61a, 0x11d0, 0x8d, 0xd4, 0x00, 0xc0, 0x4f, 0xc3, 0x35, 0x8c);

Unlike the WMI class names, the GUID names in ndisguid.h have the same naming scheme as OID names.  So GUID_NDIS_GEN_VENDOR_ID corresponds to OID_GEN_VENDOR_ID.  You can do a similar transformation for each GUID that is related to an OID.
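If matching the two notations by eye gets tedious, a few lines of C can turn the DEFINE_GUID arguments into the braced text form that wmicore.mof uses.  This is only a sketch: struct guid_fields and guid_to_mof_string are hypothetical names invented for this example (a stand-in for the real GUID type, so the snippet builds without any WDK headers), and the values are copied from the DEFINE_GUID line shown earlier.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the layout that DEFINE_GUID fills in:
 * one 32-bit field, two 16-bit fields, then eight individual bytes. */
struct guid_fields {
    unsigned long  data1;
    unsigned short data2;
    unsigned short data3;
    unsigned char  data4[8];
};

/* Writes the braced, lower-case text form used in wmicore.mof.
 * buf needs room for 39 characters (38 plus the terminator). */
int guid_to_mof_string(const struct guid_fields *g, char *buf, size_t buflen)
{
    return snprintf(buf, buflen,
        "{%08lx-%04x-%04x-%02x%02x-%02x%02x%02x%02x%02x%02x}",
        g->data1, (unsigned)g->data2, (unsigned)g->data3,
        (unsigned)g->data4[0], (unsigned)g->data4[1],
        (unsigned)g->data4[2], (unsigned)g->data4[3],
        (unsigned)g->data4[4], (unsigned)g->data4[5],
        (unsigned)g->data4[6], (unsigned)g->data4[7]);
}

/* The arguments of DEFINE_GUID(GUID_NDIS_GEN_VENDOR_ID, ...) from ndisguid.h. */
const struct guid_fields guid_ndis_gen_vendor_id = {
    0x5ec1035e, 0xa61a, 0x11d0,
    { 0x8d, 0xd4, 0x00, 0xc0, 0x4f, 0xc3, 0x35, 0x8c }
};
```

Formatting guid_ndis_gen_vendor_id this way gives a string you can search for directly in wmicore.mof.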

Let’s summarize what we have gotten so far.  NDIS provides WMI classes on every miniport.  NDIS translates an OID into a GUID, and WMI translates that GUID into a WMI class.  You can download the WDK to see OIDs in ntddndis.h, the GUIDs in ndisguid.h, and the WMI classes themselves in wmicore.mof.

Is that all?  Well… not quite.  These are valid techniques to explore the NDIS-provided WMI classes… but what about miniport- or LWF-provided classes?  Some miniport drivers implement their own private WMI classes.  Is there a way to peek at those?

Yup.  But to do this, we’ll need to trot out a kernel debugger.  Run !ndiskd.miniport -wmi <miniporthandle> to see all the WMI classes the miniport provides.

0: kd> !ndiskd.miniport ffffe000be3761a0 -wmi

WMI

    f4a8027a-23b7-11d1-9ed9-00a0c9010057   OID 0xffa0c90a
                       OID, ARRAY, CUSTOM
    GUID_NDIS_ENUMERATE_ADAPTER            [N/A]
                       READ, NOT_SETTABLE, NDIS_ONLY
    GUID_NDIS_NOTIFY_ADAPTER_REMOVAL       [N/A]
                       STATUS, EVENT_ENABLED, NOT_SETTABLE, NDIS_ONLY
    GUID_NDIS_GEN_LINK_SPEED               OID_GEN_LINK_SPEED
                       OID, READ, NOT_SETTABLE
    GUID_NDIS_GEN_VENDOR_ID                OID_GEN_VENDOR_ID
                       OID, READ, NOT_SETTABLE

In practice, a miniport will have hundreds of GUIDs; the excerpt above highlights just a few of the types of WMI classes you might find.  A class marked with the OID flag is (unsurprisingly) translated to an OID.  In the excerpt above, you can see that my miniport supports OID_GEN_VENDOR_ID, as well as a vendor-custom OID 0xffa0c90a.  (The miniport also supports the WMI event GUID_NDIS_NOTIFY_ADAPTER_REMOVAL.)

Let’s suppose we want to find the corresponding WMI class for that vendor-private GUID/OID.  It’s little surprise that PowerShell can do it in a hurry.  Just drop in the GUID that you got from !ndiskd.  (If !ndiskd hid the GUID behind its friendly name, as it did for GUID_NDIS_GEN_VENDOR_ID, you can unmask it by running !ndiskd.help GUID_NDIS_GEN_VENDOR_ID.)

Get-WmiObject -Namespace root\wmi -List  |
    Where-Object {$_.Qualifiers['guid'].Value -eq '{f4a8027a-23b7-11d1-9ed9-00a0c9010057}' }

If you happen to have that vendor’s NIC driver installed, you'll see their WMI class pop up.  If not, well, try again with one of the system-provided GUIDs.

Now we’ve seen three ways to map WMI classes to OIDs:

  • Guess a likely class name and search for it with PowerShell
  • Search for the OID in ndisguid.h, then find the matching GUID in wmicore.mof
  • Use !ndiskd.miniport -wmi to find all the GUIDs that are available on a particular miniport

I’m out of space for today, so I’ll have to save my proof of Goldbach’s conjecture for next week.

Making minidumps more useful

Miniport: meet minidump

Minidumps are a small (~100kb) record of a crash.  As their name suggests, they’re optimized for small size… at the expense of usefulness.  Minidumps include just enough information to see the stack of the faulting thread, but they don’t generally have other threads or most of kernel pool.  If someone brings me a minidump, the first thing I ask is “um, do you have anything better?”.

But that doesn’t mean that minidumps are completely useless.  And with a little care, minidumps with your NDIS miniport can be just a little more useful.  Here’s the trick.

When the system bugchecks, NDIS attempts to detect whether the bugcheck was caused by the network stack.  If so, NDIS tries to determine which network driver is at fault.  If NDIS determines that your miniport is at fault, NDIS will add some extra information to the minidump: enough data for !ndiskd.miniport to (mostly) work, and also a small chunk of your MiniportAdapterContext.  In Windows 7 through Windows 8.1, NDIS will save the first 1024 * sizeof(void*) bytes of your MiniportAdapterContext.  In other words,

Architecture | Context bytes saved
x64 | 8192
x86 | 4096
arm | 4096

This is useful to know, because it helps you lay out your context block.  You’ll find that, if you put your most important state into the first few kilobytes of your context block, then you’ll have an easier time debugging minidumps.
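As a rough illustration of that layout advice, here is a small C sketch.  Everything in it is hypothetical: struct my_adapter_context is an invented context block, and the two helper functions simply apply the 1024 * sizeof(void*) formula quoted above for Windows 7 through Windows 8.1.  It is meant as a way to sanity-check your own layout, not as a description of what NDIS does internally.

```c
#include <stddef.h>

/* Bytes of MiniportAdapterContext saved in the minidump, using the
 * formula from the text: 1024 * sizeof(void*) on the target machine.
 * Pass 8 for a 64-bit target and 4 for a 32-bit target. */
size_t ndis_minidump_context_bytes(size_t pointer_size)
{
    return 1024 * pointer_size;
}

/* Hypothetical adapter context.  The state you most want to see in a
 * minidump comes first; bulky buffers go toward the end. */
struct my_adapter_context {
    unsigned long state;
    unsigned long last_error;
    unsigned long pending_sends;
    unsigned char scratch[16384];   /* large and rarely interesting */
    unsigned long rarely_needed;
};

/* Nonzero if the bytes [offset, offset + size) fall entirely inside the
 * region that would be saved for the given pointer size. */
int ndis_field_saved_in_minidump(size_t offset, size_t size,
                                 size_t pointer_size)
{
    return offset + size <= ndis_minidump_context_bytes(pointer_size);
}
```

For example, calling ndis_field_saved_in_minidump(offsetof(struct my_adapter_context, last_error), sizeof(unsigned long), sizeof(void *)) tells you whether last_error would survive into the dump on the machine you compiled for.  In this made-up layout, rarely_needed sits behind a 16 KB buffer and would not.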

Happy debugging!

Thanks for your help making Windows great!

Really, it’s all about self-interest

Remember way back when you first set up your new computer?  Windows probably prompted you to “join the Customer Experience Improvement Program”.  For those of you who elected to join this program: thanks!  Without any extra effort on your part, you are helping us design a better product.

But with an unwieldy name like “Customer Experience Improvement Program (CEIP)”, it’s probably hard to imagine what, exactly, goes on behind-the-scenes.  Let’s take a look at how your votes help shape a core Windows component like NDIS.

NDIS is the system component that manages your network hardware and low-level network drivers.  NDIS contains a number of APIs that network drivers can use to interact with the operating system.  But not all APIs are created equal: some APIs were added years ago, and don’t really make much sense today.  Yet these obsolete APIs still periodically require work from the NDIS team to maintain them.  Is all that work worth it?

Now we know.  Included with the CEIP in Windows 8.1, NDIS tracks usage of several of our most impressively-obsolete APIs.  And the poll numbers are in: there are several obsolete APIs that are not used by a single computer in the world.  For example, NdisMSetMiniportSecondary is no longer used by anybody.  You can be sure that we’ll remove these unused APIs, so that Windows is smaller and more efficient.  So we win (less maintenance work) and you win (smaller, leaner operating system).  Although joining the CEIP helps Microsoft, it also helps you, because your feedback ensures you’ll get better products in the future.  Really, it’s all about self-interest.  😀

When we talk about a topic like this, it’s always important to ask the question: if Windows sends information back to Microsoft, what about my privacy?  Here’s the official answer.  My unofficial summary: we won’t spam or call you — in fact, we can’t, since we proactively work to avoid accidentally collecting your email or phone number.  Every single type of data that is collected is reviewed by a team with privacy experts, to make sure we only collect boring engineering stuff (like whether drivers use obsolete NDIS APIs).  CEIP is about your computer, not you.

In closing, here’s one more thought to mull over.  Network adapters can calculate IPv4 checksums in hardware, which saves CPU when processing IPv4 packets.  This is a common feature: about 86% of Windows 8.1 users have Ethernet NICs that support TCPv4 checksum offload.  But IPv6 has been growing in popularity, and it has nearly closed the hardware gap: we’re up to 78% of customers who can calculate TCPv6 offloads in hardware.  Maybe 2014 will finally be the year of widespread IPv6 rollout.  😉


Why is there a redundant Restart-NetAdapter cmdlet?

Sometimes you can’t just Enable your way out of a Disable mess

Windows 8 and Windows Server 2012 include a whole set of new PowerShell cmdlets to manage the network stack.  These cmdlets include Enable-NetAdapter and Disable-NetAdapter.  Believe it or not, those two cmdlets let you enable and disable your network adapters, respectively.  Want to shut off networking in a hurry?

PS C:\> Disable-NetAdapter *

If that were all you needed to know, these cmdlets would be so obvious, they wouldn’t even be worth writing about.  As you might have guessed from the length of this page, there are actually a few surprises lurking in the void between Enable and Disable.

Surprise 1: You can’t call Enable immediately after Disable

Suppose you want to set a few advanced properties on the NIC named “Ethernet”.  You might write this script:

Set-NetAdapterAdvancedProperty Ethernet -DisplayName 'Flow Control' -DisplayValue Disabled

Set-NetAdapterAdvancedProperty Ethernet -DisplayName 'Jumbo Packet' -DisplayValue Disabled

That’ll work just fine, but it’s a little klunky.  By default, the Set-NetAdapterAdvancedProperty cmdlet restarts the NIC, so the new value takes effect immediately.  But when you set 2 or more properties in a row, the NIC goes through unnecessary restarts.  Recognizing this inefficiency, you might try a new version of the script that coalesces all the restarts to the end:

Set-NetAdapterAdvancedProperty Ethernet -NoRestart -DisplayName 'Flow Control' -DisplayValue Disabled

Set-NetAdapterAdvancedProperty Ethernet -NoRestart -DisplayName 'Jumbo Packet' -DisplayValue Disabled

Disable-NetAdapter Ethernet

Enable-NetAdapter Ethernet

But now you find that that script fails while trying to re-enable the adapter.  What gives?

It turns out that the Disable-NetAdapter cmdlet is asynchronous.  It initiates the disable operation, then returns immediately back to the script, before the adapter is fully disabled.  When the script then tries to call Enable-NetAdapter on the NIC, the Enable cmdlet fails, because the adapter isn’t fully disabled yet.

While you could work around the race by inserting Start-Sleep, there’s a better way: Restart-NetAdapter.  The Restart-NetAdapter cmdlet combines a Disable and an Enable into a single operation.  Restart-NetAdapter ensures that the Enable operation happens as soon as possible, but no sooner.

So Restart-NetAdapter is better than just a script that calls Disable-NetAdapter + Enable-NetAdapter.  Really, the whole is greater than the sum of its parts.

Surprise 2: Wildcard matching doesn’t always work

We saw earlier how a grumpy administrator might try to disable all the NICs on the system with Disable-NetAdapter *.  It seems logical that the proper way to re-enable your NICs is to run Enable-NetAdapter *.  But there’s a subtlety here: the * wildcard doesn’t always match the same set of NICs in both commands.  Let’s see what happens if you use NIC Teaming to create a team, then try to disable and re-enable all the NICs.

Initially, both the physical NIC and the Team Interface are enabled:

[Figure: Team Interface, Physical NIC, Team]

Then we run Disable-NetAdapter *, and Windows evaluates the wildcard to both adapters.  Both get disabled:

[Figure: Team Interface, Physical NIC, Team]

But when Microsoft NIC Teaming detects that all member NICs have been disabled, NIC Teaming will remove the entire team:

[Figure: (Gone), Physical NIC, Team]

Now when you run Enable-NetAdapter *, the wildcard matches all adapters: but only one adapter exists!  The cmdlet only enables the physical NIC:

[Figure: (Gone), Physical NIC, Team]

Finally, NIC Teaming notices that one of its member NICs has returned, so NIC Teaming restores the Team Interface(s).  But remember, the last thing we did to the team interface was disable it, so the team interface comes up in a disabled state:

[Figure: Team Interface, Physical NIC, Team]

So as you can see, Enable-NetAdapter * does not completely undo the effects of Disable-NetAdapter *.  What, then, is a good way to do this?  Restart-NetAdapter to the rescue, again.  When you run Restart-NetAdapter *, the wildcard is only evaluated once, so it includes the Team Interface, before the Team Interface is removed.

Bonus surprise: Remote desktop facepalm

Disable-NetAdapter is dangerous when you’re logged in remotely, because if you disable the NIC you were using to connect, you won’t be able to log back in again to re-enable the NIC.  You know that, I know that, everybody knows that.  But we still all make this mistake sooner or later.  (My favorite variant of this story involves a remote kernel debugger and wayward flow control PAUSE frames taking the local network offline, killing the connection to the kernel debugger….)

Anyway, if you have to bounce the NICs, the safest solution is once again to use Restart-NetAdapter.  That’ll still kill your remote connection, but hopefully the NIC will come back up and the connection will be restored automatically.

Kernel debugging over the network

What just happened to my NIC?!

We’ve previously published some tips on how to use the debugger to fix your NDIS miniport driver.  But today we’re going to turn the tables and talk about how the debugger uses NDIS to break your miniport driver.

You can debug the Windows kernel through several transports.  One of the fastest transports is the NET transport, which works over common network cards connected by an ordinary Ethernet cable.  Most developers are happy to run through the setup checklist and just begin debugging.  But for those of you who work on network drivers, you might notice a few strange things happening to the network stack when NET debugging is enabled.  What’s going on?

First, let’s take a look at how the debugger works when it’s not using networking.  This diagram shows the network stack behaving normally.

[Diagram: NIC hardware, Miniport driver, NDIS, The NT kernel; Debug transport, 1394 hardware]

The miniport driver talks to its hardware, and the debug transport is preoccupied with some other piece of hardware.  Everything is normal.  But now when we enable kernel debugging over the NIC, let’s see how the diagram changes.

[Diagram: NIC hardware, Debug transport; Miniport driver, NDIS, The NT kernel, ?]

In the second diagram, the debug transport has taken over the NIC hardware completely!  The miniport driver, NDIS, and even the kernel itself are excluded from talking to the NIC hardware.  That’s why, when kernel debugging is enabled, you see the NIC device getting “banged out” in Device Manager.  At boot, NDIS tried to initialize the miniport, but NDIS saw that the kernel debugger had exclusive access to the hardware, so NDIS failed the miniport’s AddDevice.  In other words, the yellow exclamation mark is by design: the vendor NIC “failed” to load because kernel debugging is enabled.

If that were all there was to the story, then we’d have working kernel debugging, but completely broken networking.  Fortunately, there’s more.

When the debugger team at Microsoft planned this feature, they realized that network debugging wouldn’t be very useful if it broke your networking.  So they also planned a second piece to the architecture.  The debug transport exposes a virtual network adapter, based off of the underlying physical adapter.  This virtual adapter has a regular NDIS driver, so the rest of the OS can talk on the network.

[Diagram: NIC hardware, Debug transport, Virtual NIC, Miniport driver, NDIS, The NT kernel]

This virtual NIC is named the “Microsoft Kernel Debug Network Adapter”.  It is pre-installed on all Windows 8 machines, but it is only enabled once kernel debugging over networking is enabled.  When it’s enabled, it carries your normal traffic (TCPIP, etc.) out to the physical NIC that is now being used for kernel debugging.

So now you can explain why there’s that additional network device hidden in Device Manager: it’s waiting for the moment that you enable kernel debugging, so it can be your main network adapter.  You can also explain why the Network Connections folder shows a different adapter when kernel debugging is enabled: TCPIP really is bound to a different adapter (as far as the OS knows).

NdisFRegisterFilterDriver fails… now what?

Decoding the error codes

“I compiled my NDIS filter driver, but NdisFRegisterFilterDriver fails in my DriverEntry function.  Now what?”

Here’s a table listing common problems and fixes.  Rows are grouped by symptom.

NDIS_STATUS_BAD_CHARACTERISTICS (0xc0010005)

Problem: The Characteristics block has the wrong Header for the NDIS driver version.
Resolution: If you are writing an NDIS 6.0 filter driver, set NDIS_FILTER_DRIVER_CHARACTERISTICS::Header.Revision to NDIS_FILTER_CHARACTERISTICS_REVISION_1.  Otherwise, for any version NDIS 6.1 or later, use NDIS_FILTER_CHARACTERISTICS_REVISION_2.

Problem: The Characteristics block has an incorrect Size or Type.
Resolution: Make sure that NDIS_FILTER_DRIVER_CHARACTERISTICS::Header.Size is set to sizeof(NDIS_FILTER_DRIVER_CHARACTERISTICS).  Make sure that NDIS_FILTER_DRIVER_CHARACTERISTICS::Header.Type is set to NDIS_OBJECT_TYPE_FILTER_DRIVER_CHARACTERISTICS.

Problem: Your filter is missing the 4 mandatory handlers.
Resolution: Make sure you provide FilterPause, FilterRestart, FilterAttach, and FilterDetach handlers in your characteristics block.

Problem: Your filter has inconsistent OID handlers.
Resolution: If you provide FilterOidRequest, you must provide FilterOidRequestComplete.  Likewise, if you provide FilterOidRequestComplete, you must provide FilterOidRequest.

Do not provide a FilterCancelOidRequest handler if you do not have a FilterOidRequest handler.  However, the inverse is not true: if you provide FilterOidRequest, you may optionally provide FilterCancelOidRequest.  I only recommend you provide it if you queue OIDs for some nontrivial amount of time.

Both rules above also apply to Direct OID handlers.

Problem: Your filter has inconsistent NBL handlers.
Resolution: If you do not provide a FilterSendNetBufferLists handler, then you must not provide a FilterCancelSendNetBufferLists handler either.

NDIS_STATUS_FAILURE (0xc0000001)

Problem: The filter doesn’t match the INF.
Resolution: Check that NDIS_FILTER_DRIVER_CHARACTERISTICS::UniqueName:
  1. is a GUID,
  2. has curly braces, and
  3. precisely matches what’s in your INF’s NetcfgInstanceId directive.

Problem: The filter was not correctly installed.
Resolution: Verify that this key exists:

HKLM\SYSTEM\CurrentControlSet\Control\Network\{4d36e974-e325-11ce-bfc1-08002be10318}\{your filter’s GUID}\Ndi

If the key doesn’t exist, check whether your installer actually ran, and whether its call to INetCfgClassSetup::Install succeeded.

NDIS_STATUS_BAD_VERSION (0xc0010004)

Problem: The current implementation of NDIS doesn’t support the NDIS contract version that your filter driver is asking for.
Resolution: Check the list of NDIS versions by OS version against the NDIS_FILTER_DRIVER_CHARACTERISTICS::MajorNdisVersion and ::MinorNdisVersion that your driver is requesting.  Make sure that you’re getting the right version.

In particular, note that there is no such thing as “NDIS 6.2”.  If you’re setting MajorNdisVersion=6 and MinorNdisVersion=2, you’ll get this error code.  Use MinorNdisVersion=20 instead.

Some other error code

Problem: Your filter driver’s FilterSetOptions handler was called and returned a failure code.
Resolution: NDIS calls into your FilterSetOptions handler within the context of your filter’s call to NdisFRegisterFilterDriver.  If your FilterSetOptions handler returns an error code, then NdisFRegisterFilterDriver will fail with the same error code.

Set a breakpoint on your filter’s FilterSetOptions handler and verify that it succeeds.

A hang, crash, or other strange behavior

Problem: Your filter driver is using a duplicate UniqueName.
Resolution: Make sure that no other filter uses the same UniqueName as your filter.  In particular, ensure that you are not using the same UniqueName as the WDK sample filter: {5cbf81bd-5055-47cd-9055-a76b2b4e3697}

Problem: Another network driver hangs when trying to attach your filter.
Resolution: Use a kernel debugger to run:

!stacks 2 ndis!

Check whether there are any tall callstacks in NDIS that appear to be pausing or unbinding any drivers.  It’s likely that some driver has hung.  This hang may not be caused directly by your filter driver; it may only be exposed while NDIS tries to attach your filter driver.
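
Several of the rules in the table are mechanical enough to check before you ever call NdisFRegisterFilterDriver.  Here is a minimal user-mode sketch of that idea.  Everything in it is an assumption made for illustration: filter_chars_sketch is a simplified stand-in (it is not the real NDIS_FILTER_DRIVER_CHARACTERISTICS from ndis.h), check_filter_chars and placeholder_handler are hypothetical names, and the list of valid minor versions only covers NDIS 6.x as of Windows 8.1.  The order of the checks is arbitrary and is not meant to mirror what NDIS does internally.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for NDIS_FILTER_DRIVER_CHARACTERISTICS.  This is NOT
 * the real ndis.h layout; each handler is reduced to "is the pointer set?". */
typedef void (*handler_fn)(void);

typedef struct {
    unsigned major_ndis_version;
    unsigned minor_ndis_version;
    handler_fn attach, detach, pause, restart;   /* the 4 mandatory handlers */
    handler_fn oid_request, oid_request_complete, cancel_oid_request;
    handler_fn direct_oid_request, direct_oid_request_complete,
               cancel_direct_oid_request;
    handler_fn send_nbls, cancel_send_nbls;
} filter_chars_sketch;

typedef enum {
    CHECK_OK,
    CHECK_BAD_VERSION,
    CHECK_MISSING_MANDATORY,
    CHECK_OID_MISMATCH,
    CHECK_NBL_MISMATCH
} check_result;

/* Hypothetical placeholder used to mark a handler as "provided". */
void placeholder_handler(void) {}

/* Request and complete must come as a pair; cancel requires request. */
static int oid_set_consistent(handler_fn request, handler_fn complete,
                              handler_fn cancel)
{
    if ((request != NULL) != (complete != NULL))
        return 0;
    if (cancel != NULL && request == NULL)
        return 0;
    return 1;
}

check_result check_filter_chars(const filter_chars_sketch *c)
{
    /* Assumption: the NDIS 6.x minor versions that exist as of Windows 8.1. */
    static const unsigned known_minor[] = { 0, 1, 20, 30, 40 };
    size_t i;
    int version_ok = 0;

    assert(c != NULL);

    for (i = 0; i < sizeof(known_minor) / sizeof(known_minor[0]); i++) {
        if (c->major_ndis_version == 6 &&
            c->minor_ndis_version == known_minor[i])
            version_ok = 1;
    }
    if (!version_ok)
        return CHECK_BAD_VERSION;        /* e.g. 6.2 instead of 6.20 */

    if (c->attach == NULL || c->detach == NULL ||
        c->pause == NULL || c->restart == NULL)
        return CHECK_MISSING_MANDATORY;

    /* The same pairing rules apply to the regular and Direct OID handlers. */
    if (!oid_set_consistent(c->oid_request, c->oid_request_complete,
                            c->cancel_oid_request) ||
        !oid_set_consistent(c->direct_oid_request,
                            c->direct_oid_request_complete,
                            c->cancel_direct_oid_request))
        return CHECK_OID_MISMATCH;

    if (c->cancel_send_nbls != NULL && c->send_nbls == NULL)
        return CHECK_NBL_MISMATCH;

    return CHECK_OK;
}
```

A sketch like this can't tell you whether the OS you're running on actually supports the version you ask for, and it knows nothing about the INF or the registry, so it only covers the handler and version rows of the table.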

When all else fails, you can often get more detail by looking at NDIS trace messages.  If you spend a lot of time working on NDIS drivers, I encourage you to get comfortable with reading these traces, since that ability will save you time in the long run.  In this case, NdisFRegisterFilterDriver will often trace a message saying exactly what has gone wrong.

Using the checked version of NDIS.SYS

I assert that this is a good way to find bugs

Installing the checked version of the operating system is an effective technique to quickly find bugs in your network driver.  If you’re not familiar with checked builds (and even if you are), you should read the excellent documentation here.  Seriously, read it; I won’t repeat it here.

What do you get with the checked build of NDIS?

The main difference is that NDIS’s implementation has (as of Windows 8.1) approximately 2200 extra asserts.  While some of these asserts verify NDIS’s internal bookkeeping is consistent, many of them verify that your driver uses NDIS’s APIs correctly.  For example, NDIS asserts the current IRQL is correct when each MiniportXxx callback returns, to help catch the class of bug where your miniport driver leaks an IRQL or spinlock.
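
To make the IRQL example concrete, here is a rough user-mode analogy of that kind of check.  It is a sketch under assumptions, not NDIS’s code: g_current_irql is a hypothetical stand-in for the processor’s IRQL, the callbacks and returns_at_entry_level are made-up names, and I’m assuming “correct” means “the same level the callback was entered at”.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-mode stand-in for the processor's current IRQL. */
static int g_current_irql = 0;

typedef void (*callback_fn)(void);

/* A well-behaved callback: raises the level, then restores it. */
void balanced_callback(void)
{
    int old_level = g_current_irql;
    g_current_irql = 2;          /* think "acquire spinlock" */
    g_current_irql = old_level;  /* think "release spinlock" */
}

/* A buggy callback: raises the level and forgets to restore it. */
void leaky_callback(void)
{
    g_current_irql = 2;          /* "spinlock" acquired, never released */
}

/* Returns 1 if the callback returned at the level it was entered at,
 * 0 otherwise.  A checked build would raise an assertion in the 0 case. */
int returns_at_entry_level(callback_fn callback)
{
    int entry_level;

    assert(callback != NULL);

    entry_level = g_current_irql;
    callback();

    if (g_current_irql != entry_level) {
        g_current_irql = entry_level;  /* reset so the sketch can continue */
        return 0;
    }
    return 1;
}
```

The point of checking at the callback boundary is that the failure is reported right where your driver hands control back, instead of surfacing later as a mysterious crash somewhere unrelated.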

Prior to Windows 7, using the checked build of NDIS was also the only way to see NDIS’s debug traces.  But as of Windows 7, these traces are available from WPP, so there’s no longer a need to use the checked build solely for tracing.

What’s the downside of using a checked build?

There are two downsides to using checked builds: performance and false-positives.

Checked builds are noticeably slower.  But they aren’t as bad as you might think.  We still compile checked builds with most compiler optimizations enabled, so the only slowdowns are a few extra verifications here and there.  Still, the operating system has zillions of assertions, so those do add up.  You definitely don’t want to use a checked build for any performance-related work.  But they’re just fine for initial work on a new feature or functional testing.

False positives are also a problem.  Sometimes you’ll see assertions that fail for reasons that don’t seem to be related to your driver.  When you see an unfamiliar assertion, you’ll first want to spend a moment to convince yourself that the assertion failure is really caused by your driver.  For example, if there’s an assertion in win32k.sys about an invalid HRGN, that’s probably not caused by any network driver.  Prior to Windows 8, the operating system was kind of “noisy”; a nontrivial percentage of its assertions would fire for benign reasons.  We worked hard to clean that up in Windows 8, so the asserts have a better signal-to-noise ratio.  (Like many Windows engineers, I used a checked build of the OS as my primary workstation for some time during Windows 8 development.  That was fun.)

If you discover an assertion in NDIS.SYS that you believe is a false positive, please let me know here and I’ll try to clean that up.  (Unfortunately I’m not knowledgeable about non-networking drivers, so I can’t promise I can help you with any random assertion that you come across.)

How can you get the checked build of NDIS?

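MSDN has the story on how to download a copy of the checked build.  From there, you have two options:

  1. Install the entire checked build of the operating system on your test machine.
  2. Replace only a few drivers on an existing installation with their checked versions.

Since MSDN already explains the first option, I won’t repeat those instructions here.  Let’s talk about the second option: how to selectively replace a few drivers.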

First, identify the drivers that you want to replace.  Here’s a table of drivers you can consider replacing:

Driver                                     When to replace
The kernel & HAL                           Always
NDIS.SYS                                   Always
TCPIP.SYS                                  Miniport, LWF, and WFP callout drivers
NETIO.SYS (Windows Vista and later)        Whenever TCPIP.SYS is replaced
FWPKCLNT.SYS                               WFP callout drivers
(your bus driver, e.g., PCI.SYS)           Miniport drivers
NWIFI.SYS                                  Native 802.11 drivers
VWIFIBUS.SYS, VWIFIFLT.SYS, VWIFIMP.SYS    Native 802.11 drivers that implement MAC virtualization (WFD or SoftAP)
NDISUIO.SYS                                WWAN drivers
WMBCLASS.SYS                               WWAN drivers that implement the class driver model
VSWITCH.SYS                                Hyper-V extensible switch extension

Keep in mind — these are just guidelines.  You are not required to test with any particular set of drivers, and you might want to fine-tune the list depending on what subsystem you’re targeting.  If you are unsure about which binaries to replace, remember you can always just install the entire checked OS, which gives you the maximum checked build coverage.

Now that you know which drivers to replace, you can extract them from the checked build media.  If you obtained installable media, you can mount the included INSTALL.WIM with DISM.EXE to get at the individual drivers, or you can just install the OS into a throw-away VM to get convenient access to its drivers.

Finally, you’ll need to actually replace these drivers on your target OS.  Don’t do this on a production OS machine; we can’t officially support this.  The easiest way to replace binaries is to hook up a kernel debugger and use the .kdfiles feature.  For example, here’s the mapfile that I use to replace NDIS.SYS on a test machine:

map
\Windows\system32\DRIVERS\NDIS.SYS
c:\path\to\ndis.sys

Note that the name of the driver will depend on how the driver is loaded.  Use CTRL+D or CTRL+ALT+D in the debugger and reboot the target machine to see the official name of each driver.

Note that the process for replacing the kernel & HAL is special.

Oh, and sorry for the awful pun in the subtitle.

The NDIS API naming convention

NdisFWhat?  Your secret decoder ring to NDIS functions

The first time you come across NDIS, you might find yourself lost in the enormous number of NDIS APIs, OIDs, status codes, and data structures.  What’s the difference between NdisMIndicateStatus and NdisFIndicateStatus?  Fortunately, NDIS has naming conventions that make it a little easier to organize the APIs.  Let’s take a look.

NDIS.SYS exports approximately 500 APIs on Windows 8.1.  (Yes, five hundred!)  Of those, all but three start with the name “Ndis”.  So that’s our first convention: NDIS APIs are almost always named NdisSomething.  That might seem obvious, but there’s a very subtle piece of information encoded in that name: NDIS’s internal routines are prefixed with “ndis”.  See the difference?  Upper-case N means it’s an exported API, and lower-case n means it’s an internal API.

Why should you care about NDIS’s internal routines?  Well most of the time, you don’t.  But when you start debugging your driver, you’ll inevitably be looking at some callstacks with NDIS routines (both exported and internal).  If you don’t know what the callstack is doing, you might get some clues by looking at related MSDN documentation.  Try looking at this callstack and see if you can guess which API is most likely to be documented on MSDN:

00 tcpip!FlReceiveNetBufferListChain
01 ndis!ndisMIndicateNetBufferListsToOpen
02 ndis!ndisIndicateSortedNetBufferLists
03 ndis!ndisMDispatchReceiveNetBufferListsInternal
04 ndis!ndisMTopReceiveNetBufferLists
05 ndis!ndisCallReceiveHandler
06 ndis!ndisIterativeDPInvokeHandlerOnTracker
07 ndis!ndisInvokeNextReceiveHandler
08 ndis!ndisMIndicateReceiveNetBufferListsInternal
09 ndis!NdisMIndicateReceiveNetBufferLists
0a netvsc!ReceivePacketMessage
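
The upper-case/lower-case rule is simple enough to apply mechanically to the frames of a callstack.  Here’s a minimal sketch; classify_ndis_symbol and the enum names are hypothetical, and it assumes the debugger prints the module as lower-case “ndis!” as in the callstack above.  Remember that it’s only a heuristic, since a handful of exports don’t start with “Ndis” at all.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    SYM_EXPORTED_API,      /* NdisXxx: exported, likely documented */
    SYM_INTERNAL_ROUTINE,  /* ndisXxx: internal to NDIS.SYS */
    SYM_NOT_NDIS           /* some other module, or an unrecognized name */
} ndis_symbol_kind;

/* Accepts "module!Function" or a bare "Function"; trailing text such as
 * "+0x64" is ignored because only the leading characters are inspected. */
ndis_symbol_kind classify_ndis_symbol(const char *symbol)
{
    const char *name = symbol;
    const char *bang;

    assert(symbol != NULL);

    bang = strchr(symbol, '!');
    if (bang != NULL) {
        /* A module prefix is present; only the "ndis" module qualifies. */
        if ((size_t)(bang - symbol) != 4 || strncmp(symbol, "ndis", 4) != 0)
            return SYM_NOT_NDIS;
        name = bang + 1;
    }

    if (strncmp(name, "Ndis", 4) == 0)
        return SYM_EXPORTED_API;
    if (strncmp(name, "ndis", 4) == 0)
        return SYM_INTERNAL_ROUTINE;
    return SYM_NOT_NDIS;
}
```

Run over the callstack above, only frame 09 comes back as an exported API; frames 01 through 08 are internal routines, and frames 00 and 0a belong to other modules.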

That’s not all; it takes more than an N to spell Naming convention.  Following the “Ndis” prefix there is often a code indicating which type of driver is associated with the API.  Here’s a reference table of the codes:

Code  Purpose
M     Miniport drivers
F     Filter drivers
IM    Intermediate drivers
If    Network interface providers
Co    CoNDIS drivers
Cl    CoNDIS call clients
Cm    CoNDIS call managers
MCo   CoNDIS miniport drivers
MCm   CoNDIS integrated call managers

Functions that don’t have a code are meant to be called from multiple types of drivers, or from protocol drivers.  (For some reason, protocol drivers never earned their own code.  I’m not sure why, but who am I to argue with two decades of tradition?)

So now without even consulting the documentation, you can guess that NdisMSetTimer is for miniport drivers, while NdisIMNotifyPnPEvent is for intermediate drivers.  And I bet you can answer the question at the start of this article: what is the difference between NdisMIndicateStatus and NdisFIndicateStatus?
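
If you’d rather let a program do the decoding, the table of codes can be turned into a small lookup.  This is a sketch, not anything NDIS provides: ndis_api_audience and k_codes are hypothetical names, and the matching rule is my own heuristic.  It tries longer codes first, and it only accepts a code when the next character is upper-case, so that a name like NdisFreeMemory isn’t mistaken for a filter API.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *code;
    const char *purpose;
} ndis_code_entry;

/* Longer codes come first so that "MCo" is tried before "M". */
static const ndis_code_entry k_codes[] = {
    { "MCo", "CoNDIS miniport drivers" },
    { "MCm", "CoNDIS integrated call managers" },
    { "IM",  "Intermediate drivers" },
    { "If",  "Network interface providers" },
    { "Co",  "CoNDIS drivers" },
    { "Cl",  "CoNDIS call clients" },
    { "Cm",  "CoNDIS call managers" },
    { "M",   "Miniport drivers" },
    { "F",   "Filter drivers" },
};

/* Returns the intended audience for an exported API name such as
 * "NdisMSetTimer", or NULL if the name doesn't start with "Ndis". */
const char *ndis_api_audience(const char *api_name)
{
    size_t i;

    assert(api_name != NULL);

    if (strncmp(api_name, "Ndis", 4) != 0)
        return NULL;
    api_name += 4;

    for (i = 0; i < sizeof(k_codes) / sizeof(k_codes[0]); i++) {
        size_t len = strlen(k_codes[i].code);

        /* Heuristic: a code must be followed by the capitalized start of
         * the next word, otherwise it is just part of an ordinary word. */
        if (strncmp(api_name, k_codes[i].code, len) == 0 &&
            isupper((unsigned char)api_name[len]))
            return k_codes[i].purpose;
    }

    /* No code: shared by several driver types, or meant for protocols. */
    return "Multiple driver types, or protocol drivers";
}
```

A heuristic like this will still guess wrong on the occasional oddly named API, so treat its answer as a hint about where to look in the documentation.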

Enough trivia games; let’s return to practical applications of this knowledge.  Suppose you are debugging a callstack that looks like this:

00 DriverY+0x22af
01 ndis!ndisMSendNBLToMiniport+0xb1
02 ndis!NdisFSendNetBufferLists+0x64
03 DriverX+0x1b6d
04 ndis!NdisFSendNetBufferLists+0x64
05 pacer!PcFilterSendNetBufferLists+0x9d
06 ndis!ndisSendNBLToFilter+0x69
07 ndis!NdisSendNetBufferLists+0x85
08 tcpip!FlpSendPacketsHelper+0x675

Although you might not be familiar with DriverX or DriverY, you can use NDIS naming conventions to guess their roles.  At frame 03, DriverX calls an NdisFXxx function in frame 02.  Therefore, DriverX is a filter driver.  You can also see an ndisMXxx internal NDIS function at frame 01, so you know that NDIS has finished calling all the filters, and has moved on to the miniport before calling DriverY in frame 00.  So it’s reasonable to conclude that DriverY is a miniport driver.

Another very practical application: knowing the codes will help you narrow down the APIs that you need to worry about when you’re writing code.  If you’re writing a lightweight filter driver, you can safely ignore the approximately 130 APIs that are exclusively for miniports (NdisMSomething).  That makes it a bit easier to find the API you’re looking for.

Now you know the ABC’s of NDIS!
