Dynamic Code Generation in .NET
I recently had the need (or at least the desire) to dynamically generate some code in .NET. It's not something you tend to do a lot unless you are writing a compiler, so it was an interesting experience. In the end it didn't solve my problem, but it was educational nonetheless.
First I'll describe the problem I was trying to solve with dynamically generated code, then I'll discuss how I went about generating the code and the problems I hit along the way.
The problem has to do with assembly resolution in .NET. I wanted to launch a program B in a separate appdomain from program A. Program B is in a completely different directory than A. Program B ultimately wants to load assemblies that are not in the GAC and are not in its own directory or in any of its sub-directories. One way around this problem is to play games with probing paths and .config files. That works, but not if the assemblies are on a different drive, because probing paths have to be sub-directories of the application base. What I decided I wanted to do was to inject an AssemblyResolve handler into program B and provide a search path from program A. I could have done this by writing the AssemblyResolve handler in C#, compiling it into an assembly, then loading that assembly into program B and wiring up the handler. It would be cleaner (but more work) to dynamically generate the assembly as needed. For entertainment value I went with the latter plan.
First, let's set up some test programs so we can establish the scenario. We need a test assembly that's just going to be the fodder for our experiment:
using System;
public class TestClass
{
    public void TestMethod()
    {
        Console.WriteLine("In TestClass.TestMethod");
    }
}
Next comes program B, TestLoader. It takes an assembly name on the command line, attempts to load that assembly, creates an instance of TestClass and calls TestMethod:
using System;
using System.IO;
using System.Reflection;
public class Program
{
#if ASSEMBLY_RESOLVE
    // Called by the CLR when its normal probing fails to find an assembly.
    public static Assembly OnAssemblyResolve(object sender, ResolveEventArgs args)
    {
        Console.Write("Resolving assembly ");
        Console.WriteLine(args.Name);
        string[] searchDirs = ((string)AppDomain.CurrentDomain.GetData("SEARCHPATH")).Split(new char[] { ';' }, StringSplitOptions.RemoveEmptyEntries);
        // Strip everything after the simple name (version, culture, public key token).
        int index = args.Name.IndexOf(',');
        if (index == -1)
            index = args.Name.Length;
        string name = args.Name.Substring(0, index);
        for (int i = 0; i < searchDirs.Length; i++)
        {
            string assemblyPath = Path.Combine(searchDirs[i], name) + ".dll";
            if (File.Exists(assemblyPath))
                return Assembly.LoadFrom(assemblyPath);
        }
        return null;
    }
#endif
    public static int Main(string[] args)
    {
#if ASSEMBLY_RESOLVE
        if (args.Length != 2)
        {
            Console.WriteLine("TestLoader <assembly-name> <path>");
            return 1;
        }
        AppDomain.CurrentDomain.SetData("SEARCHPATH", args[1]);
        AppDomain.CurrentDomain.AssemblyResolve += OnAssemblyResolve;
#else
        if (args.Length != 1)
        {
            Console.WriteLine("TestLoader <assembly-name>");
            return 1;
        }
#endif
        try
        {
            Assembly assembly = Assembly.Load(args[0]);
            Type type = assembly.GetType("TestClass");
            MethodInfo methodInfo = type.GetMethod("TestMethod");
            object obj = Activator.CreateInstance(type);
            methodInfo.Invoke(obj, null);
        }
        catch (Exception e)
        {
            Console.WriteLine(e);
        }
        return 0;
    }
}
So what's going on here then? Firstly, ignore the code inside the #if ASSEMBLY_RESOLVE blocks. Without it, this code simply tries to load the assembly named on the command line. It's expecting a fully or partially qualified assembly display name, not a file path.
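To make the difference between a display name and a file path a little more concrete, here is a minimal sketch. It uses the framework's AssemblyName class to pull the simple name out of a display name, which is the same job the ASSEMBLY_RESOLVE handler does by hand with IndexOf and Substring. The class and method names (DisplayNameSketch, SimpleName) are made up for illustration, and this is an alternative to the hand-rolled parsing rather than what TestLoader does.

```csharp
using System;
using System.Reflection;

public static class DisplayNameSketch
{
    // Returns the simple name portion of an assembly display name.
    // Works for both partial names ("TestAssembly") and full names
    // ("TestAssembly, Version=..., Culture=..., PublicKeyToken=...").
    public static string SimpleName(string displayName)
    {
        return new AssemblyName(displayName).Name;
    }

    public static void Main()
    {
        // Both calls yield the simple name "TestAssembly".
        Console.WriteLine(SimpleName("TestAssembly"));
        Console.WriteLine(SimpleName("TestAssembly, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null"));
    }
}
```

Either form is acceptable to Assembly.Load; something like ..\TestAssembly\TestAssembly.dll is a path and belongs with Assembly.LoadFrom instead.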
Compile the test assembly code into a .dll in one directory called TestAssembly. Then compile program B as a .exe in a sibling directory called TestLoader. Run it specifying TestAssembly as the argument. Of course the Load() throws an exception, which TestLoader catches and prints. Let's assume we decide to solve this problem by adding an AssemblyResolve handler. Recompile program B with /d:ASSEMBLY_RESOLVE. Now run it specifying TestAssembly ..\TestAssembly as arguments. You can see by examining the code and looking at the output how the AssemblyResolve event handler saves the day when the CLR cannot find the assembly using its normal probing locations (GAC, local directory, special sub-directories). Before we continue, recompile program B without ASSEMBLY_RESOLVE defined.
Now we move on to program A, TestLoaderWrapper, the wrapper that will dynamically "inject" the assembly resolve event handler. Imagine that TestLoader was some code that you did not have the source code for. We're going to try to change its assembly loading behavior externally.
Here's the code for TestLoaderWrapper. It's a bit scary:
using System;
using System.IO;
using System.Reflection;
using System.Reflection.Emit;
public class Program
{
    public static int Main(string[] args)
    {
        if (args.Length != 2)
        {
            Console.WriteLine("TestLoaderWrapper <assembly-name> <search-path>");
            return 1;
        }
        AppDomain appDomain = AppDomain.CreateDomain("NewDomain");
        // Set the search path in the new domain using domain data
        appDomain.SetData("SEARCHPATH", args[1]);
        // Dynamically create an assembly containing our assembly resolver
        AssemblyName assemblyName = new AssemblyName();
        assemblyName.Name = "AssemblyResolver";
        string assemblyFile = assemblyName.Name + ".dll";
        AssemblyBuilder assemblyBuilder = AppDomain.CurrentDomain.DefineDynamicAssembly(assemblyName, AssemblyBuilderAccess.RunAndSave);
        ModuleBuilder moduleBuilder = assemblyBuilder.DefineDynamicModule(assemblyName.Name, assemblyFile);
        TypeBuilder typeBuilder = moduleBuilder.DefineType("Handler", TypeAttributes.Class | TypeAttributes.Public);
        MethodBuilder methodBuilder = typeBuilder.DefineMethod("OnAssemblyResolve",
            MethodAttributes.Public | MethodAttributes.Static,
            typeof(Assembly), new Type[] { typeof(object), typeof(System.ResolveEventArgs) });
        GenerateAssemblyResolveMethod(methodBuilder.GetILGenerator());
        Type handlerType = typeBuilder.CreateType();
        MethodInfo onAssemblyResolve = handlerType.GetMethod("OnAssemblyResolve");
        // You cannot specify a path when saving the assembly, so we must copy it next to the
        // assembly we will be running so that the CLR loader can find it.
        string targetAssemblyPath = Path.Combine(@"..\TestLoader", assemblyFile);
        assemblyBuilder.Save(assemblyFile);
        File.Copy(assemblyFile, targetAssemblyPath, true);
        // Wrap the generated method in a delegate and hook it to the new domain's AssemblyResolve event
        Delegate handlerDelegate = Delegate.CreateDelegate(typeof(ResolveEventHandler), onAssemblyResolve);
        EventInfo assemblyResolveEvent = typeof(AppDomain).GetEvent("AssemblyResolve");
        assemblyResolveEvent.AddEventHandler(appDomain, handlerDelegate);
        appDomain.ExecuteAssembly(@"..\TestLoader\TestLoader.exe", AppDomain.CurrentDomain.Evidence,
            new string[] { args[0] });
        return 0;
    }
    public static void GenerateAssemblyResolveMethod(ILGenerator il)
    {
        // Look up every method the generated IL will call
        Type consoleType = typeof(System.Console);
        MethodInfo write = consoleType.GetMethod("Write", BindingFlags.Public | BindingFlags.Static, null, new Type[] { typeof(string) }, null);
        MethodInfo writeLine = consoleType.GetMethod("WriteLine", BindingFlags.Public | BindingFlags.Static, null, new Type[] { typeof(string) }, null);
        Type appDomainType = typeof(System.AppDomain);
        MethodInfo getCurrentDomain = appDomainType.GetMethod("get_CurrentDomain");
        MethodInfo getData = appDomainType.GetMethod("GetData");
        Type assemblyType = typeof(System.Reflection.Assembly);
        MethodInfo loadFrom = assemblyType.GetMethod("LoadFrom", new Type[] { typeof(string) });
        Type resolveEventArgs = typeof(System.ResolveEventArgs);
        MethodInfo getName = resolveEventArgs.GetMethod("get_Name");
        Type stringType = typeof(string);
        MethodInfo splitMethod = stringType.GetMethod("Split", new Type[] { typeof(char[]), typeof(StringSplitOptions) });
        MethodInfo substring = stringType.GetMethod("Substring", new Type[] { typeof(int), typeof(int) });
        MethodInfo indexOf = stringType.GetMethod("IndexOf", new Type[] { typeof(char) });
        MethodInfo concat = stringType.GetMethod("Concat", new Type[] { typeof(string), typeof(string) });
        MethodInfo getLength = stringType.GetMethod("get_Length");
        Type pathType = typeof(System.IO.Path);
        // Name the parameter types explicitly: later framework versions add Combine overloads,
        // which would make a lookup by name alone ambiguous.
        MethodInfo combine = pathType.GetMethod("Combine", new Type[] { typeof(string), typeof(string) });
        Type fileType = typeof(System.IO.File);
        MethodInfo exists = fileType.GetMethod("Exists", BindingFlags.Public | BindingFlags.Static);
        // Console.Write("Resolving assembly ");
        il.Emit(OpCodes.Ldstr, "Resolving assembly "); // -> T
        il.EmitCall(OpCodes.Call, write, null); // ->
        // Console.WriteLine(args.Name);
        // (Ldarg_1 is used rather than Ldarg with an int literal, because Ldarg takes a 16-bit operand.)
        il.Emit(OpCodes.Ldarg_1); // -> T
        il.EmitCall(OpCodes.Callvirt, getName, null); // -> T
        il.EmitCall(OpCodes.Call, writeLine, null); // ->
        // string[] searchDirs = ((string)AppDomain.CurrentDomain.GetData("SEARCHPATH")).Split(new char[] {';'}, StringSplitOptions.RemoveEmptyEntries);
        LocalBuilder searchDirs = il.DeclareLocal(typeof(string[]));
        il.EmitCall(OpCodes.Call, getCurrentDomain, null); // -> T
        il.Emit(OpCodes.Ldstr, "SEARCHPATH"); // -> T, T
        il.EmitCall(OpCodes.Callvirt, getData, null); // -> T
        il.Emit(OpCodes.Castclass, typeof(String)); // -> T
        il.Emit(OpCodes.Ldc_I4, 1); // -> T, I4
        il.Emit(OpCodes.Newarr, typeof(Char)); // -> T, T
        il.Emit(OpCodes.Dup); // -> T, T, T
        il.Emit(OpCodes.Ldc_I4, 0); // -> T, T, T, I4
        il.Emit(OpCodes.Ldc_I4, 0x3b); // -> T, T, T, I4, I4 (0x3b is ';')
        il.Emit(OpCodes.Stelem_I2); // -> T, T
        il.Emit(OpCodes.Ldc_I4, 1); // -> T, T, I4 (StringSplitOptions.RemoveEmptyEntries)
        il.EmitCall(OpCodes.Callvirt, splitMethod, null); // -> T
        il.Emit(OpCodes.Stloc, searchDirs); // ->
        // int index = args.Name.IndexOf(',');
        LocalBuilder index = il.DeclareLocal(typeof(int));
        il.Emit(OpCodes.Ldarg_1); // -> T
        il.EmitCall(OpCodes.Callvirt, getName, null); // -> T
        il.Emit(OpCodes.Ldc_I4, (int)','); // -> T, I4
        il.EmitCall(OpCodes.Callvirt, indexOf, null); // -> I4
        il.Emit(OpCodes.Stloc, index); // ->
        // if (index == -1)
        Label indexThen = il.DefineLabel();
        il.Emit(OpCodes.Ldloc, index); // -> I4
        il.Emit(OpCodes.Ldc_I4_M1); // -> I4, I4
        il.Emit(OpCodes.Ceq); // -> B
        il.Emit(OpCodes.Brfalse, indexThen); // ->
        // index = args.Name.Length;
        il.Emit(OpCodes.Ldarg_1); // -> T
        il.EmitCall(OpCodes.Callvirt, getName, null); // -> T
        il.EmitCall(OpCodes.Callvirt, getLength, null); // -> I4
        il.Emit(OpCodes.Stloc, index); // ->
        il.MarkLabel(indexThen);
        // string name = args.Name.Substring(0, index);
        LocalBuilder name = il.DeclareLocal(typeof(string));
        il.Emit(OpCodes.Ldarg_1); // -> T
        il.EmitCall(OpCodes.Callvirt, getName, null); // -> T
        il.Emit(OpCodes.Ldc_I4, 0); // -> T, I4
        il.Emit(OpCodes.Ldloc, index); // -> T, I4, I4
        il.EmitCall(OpCodes.Callvirt, substring, null); // -> T
        il.Emit(OpCodes.Stloc, name); // ->
        // for (int i = 0; ...) ...
        LocalBuilder i = il.DeclareLocal(typeof(Int32));
        Label loopStart = il.DefineLabel();
        Label loopEnd = il.DefineLabel();
        il.Emit(OpCodes.Ldc_I4, 0); // -> I4
        il.Emit(OpCodes.Stloc, i); // ->
        il.Emit(OpCodes.Ldloc, i); // -> I4
        il.Emit(OpCodes.Br, loopEnd); // -> I4 (i stays on the stack for the loop test)
        // string assemblyPath = Path.Combine(searchDirs[i], name) + ".dll";
        LocalBuilder assemblyPath = il.DeclareLocal(typeof(string));
        Label ifEnd = il.DefineLabel();
        il.MarkLabel(loopStart);
        il.Emit(OpCodes.Ldloc, searchDirs); // -> T
        il.Emit(OpCodes.Ldloc, i); // -> T, I4
        il.Emit(OpCodes.Ldelem_Ref); // -> T
        il.Emit(OpCodes.Ldloc, name); // -> T, T
        il.EmitCall(OpCodes.Call, combine, null); // -> T
        il.Emit(OpCodes.Ldstr, ".dll"); // -> T, T
        il.EmitCall(OpCodes.Call, concat, null); // -> T (String.Concat is static, so Call rather than Callvirt)
        il.Emit(OpCodes.Stloc, assemblyPath); // ->
        // if (File.Exists(assemblyPath))
        il.Emit(OpCodes.Ldloc, assemblyPath); // -> T
        il.EmitCall(OpCodes.Call, exists, null); // -> B
        il.Emit(OpCodes.Brfalse, ifEnd); // ->
        // return Assembly.LoadFrom(assemblyPath);
        il.Emit(OpCodes.Ldloc, assemblyPath); // -> T
        il.EmitCall(OpCodes.Call, loadFrom, null); // -> T (Assembly.LoadFrom is static, so Call rather than Callvirt)
        il.Emit(OpCodes.Ret); // ->
        // for (...; i < searchDirs.Length; i++)
        il.MarkLabel(ifEnd);
        il.Emit(OpCodes.Ldloc, i); // -> I4
        il.Emit(OpCodes.Ldc_I4, 1); // -> I4, I4
        il.Emit(OpCodes.Add); // -> I4
        il.Emit(OpCodes.Stloc, i); // ->
        il.Emit(OpCodes.Ldloc, i); // -> I4
        il.MarkLabel(loopEnd);
        il.Emit(OpCodes.Ldloc, searchDirs); // -> I4, T
        il.Emit(OpCodes.Ldlen); // -> I4, I
        il.Emit(OpCodes.Conv_I4); // -> I4, I4
        il.Emit(OpCodes.Clt); // -> B
        il.Emit(OpCodes.Brtrue, loopStart); // ->
        // return null;
        il.Emit(OpCodes.Ldnull); // -> T
        il.Emit(OpCodes.Ret); // ->
    }
}
Yikes! That's almost 200 lines of code! What this snippet shows you is that dynamic code generation in .NET is not cheap. Let's touch on the key points.
Stripping away the dynamic assembly generation, what this code wants to do is create a new appdomain and execute TestLoader in it. Remember, this is the version of TestLoader that would normally fail to load the test assembly.
TestLoaderWrapper takes the test assembly name and the search path and passes them on to TestLoader. The assembly name goes through as a command line argument, but the search path is passed through appdomain data, so we set that up with a call to SetData.
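As a small illustration of the appdomain data mechanism, here is a sketch that stores a search path with SetData and reads it back with GetData, splitting it the same way the resolve handler does. It runs against the current domain rather than a second one to keep it short. The class name DomainDataSketch, the helper ReadSearchDirs, the null check and the D:\Shared directory are all assumptions made for the example, not part of the programs above.

```csharp
using System;

public static class DomainDataSketch
{
    // Reads the "SEARCHPATH" slot from the given domain and splits it on ';'.
    // The null check is an addition for the sketch; the handler above assumes
    // the value is always present.
    public static string[] ReadSearchDirs(AppDomain domain)
    {
        string raw = (string)domain.GetData("SEARCHPATH");
        if (raw == null)
            return new string[0];
        return raw.Split(new char[] { ';' }, StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        // The doubled ';' shows why RemoveEmptyEntries is useful.
        AppDomain.CurrentDomain.SetData("SEARCHPATH", @"..\TestAssembly;;D:\Shared");
        foreach (string dir in ReadSearchDirs(AppDomain.CurrentDomain))
            Console.WriteLine(dir);
    }
}
```

The slot name is just a string key, so the wrapper and the handler only have to agree on "SEARCHPATH"; nothing else ties them together.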
Now the heavy lifting. We start by creating the AssemblyBuilder, ModuleBuilder, TypeBuilder, MethodBuilder family of objects. Each one is created from the one before it, so it's mostly a matter of paying attention to parameters.
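Boiled down to its smallest form, the builder chain looks something like the sketch below. It emits a trivial Add method instead of the resolve handler, and it is only an illustration under a few assumptions: it uses the static AssemblyBuilder.DefineDynamicAssembly with AssemblyBuilderAccess.Run so that it stays in memory (the wrapper above uses AppDomain.CurrentDomain.DefineDynamicAssembly with RunAndSave because it needs the .dll on disk), and the names BuilderChainSketch, SketchAssembly, SketchModule and Adder are invented.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public static class BuilderChainSketch
{
    // Builds a type "Adder" with: public static int Add(int a, int b) { return a + b; }
    public static MethodInfo BuildAdd()
    {
        AssemblyName assemblyName = new AssemblyName("SketchAssembly");
        AssemblyBuilder assemblyBuilder =
            AssemblyBuilder.DefineDynamicAssembly(assemblyName, AssemblyBuilderAccess.Run);
        ModuleBuilder moduleBuilder = assemblyBuilder.DefineDynamicModule("SketchModule");
        TypeBuilder typeBuilder =
            moduleBuilder.DefineType("Adder", TypeAttributes.Class | TypeAttributes.Public);
        MethodBuilder methodBuilder = typeBuilder.DefineMethod("Add",
            MethodAttributes.Public | MethodAttributes.Static,
            typeof(int), new Type[] { typeof(int), typeof(int) });

        ILGenerator il = methodBuilder.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0); // -> I4
        il.Emit(OpCodes.Ldarg_1); // -> I4, I4
        il.Emit(OpCodes.Add);     // -> I4
        il.Emit(OpCodes.Ret);     // ->

        // The type must be "baked" before its methods can be looked up and invoked.
        Type adderType = typeBuilder.CreateType();
        return adderType.GetMethod("Add");
    }

    public static void Main()
    {
        object result = BuildAdd().Invoke(null, new object[] { 2, 3 });
        Console.WriteLine(result); // prints 5
    }
}
```

The shape is the same as in TestLoaderWrapper: assembly, module, type, method, IL, then CreateType. Only the IL body and the save-to-disk step differ.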
Then we call GenerateAssemblyResolveMethod to build the IL for the method. I won't bore you with a tedious breakdown; the ILGenerator methods are largely self-explanatory and easy to follow. I do have some tips to share on how I went about generating the code and typing it in.
Firstly, .NET Reflector is your friend. Its IL disassembly is a great way to get started with the correct IL. Simply open up the version of TestLoader that has ASSEMBLY_RESOLVE defined and start cribbing.
What you are going to find really quickly, however, is that the C#, VB .NET and other compilers usually generate very verbose code that seemingly takes a lot more IL instructions than necessary to get the job done. What you have to remember is that IL is not meant to be executed directly; it's meant to be compiled to native code by the just-in-time compiler (also known as the jitter). The jitter will usually do a good job of removing much of the IL redundancy. Actually, one technique for writing a jitter is to look for standard patterns of IL which can be optimized into native code sequences.
The other thing that the extra instructions help with is debugging. In particular, compilers generally place sequence points at IL instruction boundaries where the IL stack is empty, and these typically map to a logical unit of source code that you can stop at in the debugger. You can safely examine program variables at that point because, for example, the program will not be halfway through evaluating a source code expression.
All this is irrelevant to us though, because unless you do some extra work to generate .pdb information for your dynamic assembly you won't be able to see much in the debugger. You can, however, insert a Break opcode into it, turn off Just My Code in VS and look at the native code that is generated for the method.
Getting back to the IL generation. I took the .NET Reflector output as a starting point and began munging it into maintainable code. As you can see, grabbing methods and types to pass to IL instructions is just a question of calling Reflection APIs. Notice that I comment the state of the stack after each instruction executes. T is a value whose type is given by a metadata token (in practice an object reference), I4 means a 4-byte integer, I a native-size integer and B a Boolean. I break the IL generation down into chunks where the IL stack returns to zero entries, for the reasons explained above and to help keep things readable.
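Here is the commenting convention on a tiny scale: a sketch of the "if (index == -1) index = length;" chunk as a standalone method, with the stack noted after every instruction and a blank line wherever the stack is empty again. To keep it short it uses DynamicMethod rather than the full AssemblyBuilder chain, which is a substitution on my part and not how the wrapper is built; the names StackCommentSketch and FixIndex are made up.

```csharp
using System;
using System.Reflection.Emit;

public static class StackCommentSketch
{
    // Builds: int FixIndex(int rawIndex, int length)
    //         { int index = rawIndex; if (index == -1) index = length; return index; }
    public static Func<int, int, int> Build()
    {
        DynamicMethod method = new DynamicMethod(
            "FixIndex", typeof(int), new Type[] { typeof(int), typeof(int) });
        ILGenerator il = method.GetILGenerator();
        LocalBuilder index = il.DeclareLocal(typeof(int));
        Label indexThen = il.DefineLabel();

        // int index = rawIndex;
        il.Emit(OpCodes.Ldarg_0);            // -> I4
        il.Emit(OpCodes.Stloc, index);       // ->

        // if (index == -1)
        il.Emit(OpCodes.Ldloc, index);       // -> I4
        il.Emit(OpCodes.Ldc_I4_M1);          // -> I4, I4
        il.Emit(OpCodes.Ceq);                // -> B
        il.Emit(OpCodes.Brfalse, indexThen); // ->

        //     index = length;
        il.Emit(OpCodes.Ldarg_1);            // -> I4
        il.Emit(OpCodes.Stloc, index);       // ->
        il.MarkLabel(indexThen);

        // return index;
        il.Emit(OpCodes.Ldloc, index);       // -> I4
        il.Emit(OpCodes.Ret);                // ->

        return (Func<int, int, int>)method.CreateDelegate(typeof(Func<int, int, int>));
    }

    public static void Main()
    {
        Func<int, int, int> fixIndex = Build();
        Console.WriteLine(fixIndex(-1, 12)); // prints 12
        Console.WriteLine(fixIndex(5, 12));  // prints 5
    }
}
```

Every source statement begins and ends with an empty stack, which is what makes it easy to check each chunk in isolation when something goes wrong.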
What would really be awesome is if someone would write a .NET Reflector plug-in that would automatically generate Reflection Emit code for a method.
Almost there now. We can't specify a different directory for the generated assembly when saving it. I don't know why; it just doesn't work, so we copy it next to the TestLoader executable. Why do we need to do this? Because the appbase (the root directory where the CLR will probe for assemblies) is the directory of the entry assembly for the appdomain, which is different by default for TestLoader than it is for TestLoaderWrapper because they are in different directories.
Then we create a delegate for the assembly resolve handler, attach it to the new AppDomain object, and we are done.
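The delegate wiring can be tried out on its own, without any generated code. The sketch below attaches an ordinary static method to the current domain's AssemblyResolve event using the same Delegate.CreateDelegate and EventInfo.AddEventHandler calls as the wrapper, then provokes a failed load so the handler fires. AttachHandlerSketch, LastRequested and the assembly name NoSuchAssemblyForSketch are invented for the example, and it deliberately stays in one appdomain, so it says nothing about the cross-domain marshaling the wrapper relies on.

```csharp
using System;
using System.Reflection;

public static class AttachHandlerSketch
{
    // Records the most recent assembly name the CLR asked us to resolve.
    public static string LastRequested;

    public static Assembly OnAssemblyResolve(object sender, ResolveEventArgs args)
    {
        LastRequested = args.Name;
        return null; // decline, so the original load still fails
    }

    // Attaches OnAssemblyResolve to the domain via reflection rather than the += syntax.
    public static void Attach(AppDomain domain)
    {
        MethodInfo method = typeof(AttachHandlerSketch).GetMethod("OnAssemblyResolve");
        Delegate handler = Delegate.CreateDelegate(typeof(ResolveEventHandler), method);
        EventInfo resolveEvent = typeof(AppDomain).GetEvent("AssemblyResolve");
        resolveEvent.AddEventHandler(domain, handler);
    }

    public static void Main()
    {
        Attach(AppDomain.CurrentDomain);
        try
        {
            Assembly.Load("NoSuchAssemblyForSketch");
        }
        catch (System.IO.IOException)
        {
            // Expected: the assembly does not exist and the handler returned null.
        }
        Console.WriteLine("Handler was asked for: " + LastRequested);
    }
}
```

Going through EventInfo instead of += is what lets the wrapper attach a method it only knows as a MethodInfo, since the generated handler has no compile-time identity in program A.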
I'll attach the source code for all this to the post for those who may stumble this way in search of help with this topic.