Designing an airline passenger reservation system

25 November 2008

Architecture - Linq-to-SQL and set-based operations: Delete statements

Filed under: Architecture — Tags: , , , , , , — Kristofer @ 21:42

A few weeks back I published a code sample showing an implementation of set-based update operations using Linq-to-SQL. I also promised a followup showing batch deletes, and “insert into…select from” so I guess it is time for a followup. This time I will cover set-based deletes.

To avoid unnecessary duplication the code in this article is an incremental update to the code in my previous article on the subject.

The original set-based update-statement implementation is here:
http://blog.huagati.com/res/index.php/2008/11/05/architecture-linq-to-sql-and-set-based-operations-update-statements/

On a semi-related note: a couple of days ago, Microsoft’s Bart de Smet (author of the B# blog) also recognized the need for language integrated set-based operations in his “Dude, where’s my LINQ DML” blog entry. Although his article left out the actual implementation - he’s instead elaborating on a generic interface to front this functionality. (Although I have to say that I think my implementation results in more clean end-user code than his suggested interface. I know I’m abusing lambdas for the “set”-part but I think it is clear enough anyway…).

Set-based delete statements

I’m not going to elaborate on the rationale for set-based deletes rather than a row-based approach. The reasons for set-based/batch deletes are exactly the same as set-based/batched updates, already covered in the previous article.

Similar to the update statements I will have a couple of overloaded functions allowing for some flexibility in how it is used. The simple form takes a type and a criteria lambda:


dc.Delete<Employee>(
    (where => where.City.StartsWith("XYZ"))
    );

The SQL generated by this statement is:

delete from dbo.Employees  where EmployeeID in (SELECT [t0].[EmployeeID]
FROM [dbo].[Employees] AS [t0]
WHERE [t0].[City] LIKE @p0)

Just like the update-statement version, we can use navigation properties, CLR functions etc to set up more complex expressions in a single lambda:


dc.Delete<Customer>(
    where => where.Orders.Max(
        od => od.OrderDate
        ) < DateTime.Now.AddYears(-5)
    && where.Country == "Ukraine"
    );

 …and because we’re using Linq-to-SQL to generate the SQL, the where clause lambda is translated into nice SQL. After concatenation with the delete part of the statement the line above translates into:

delete from dbo.Customers  where CustomerID in (SELECT [t0].[CustomerID]
FROM [dbo].[Customers] AS [t0]
WHERE (((
    SELECT MAX([t1].[OrderDate])
    FROM [dbo].[Orders] AS [t1]
    WHERE [t1].[CustomerID] = [t0].[CustomerID]
    )) < @p0) AND ([t0].[Country] = @p1))

 If we need a more complex filter than what a lambda expression on the existing L2S classes allow, or if we want a more readable lookup query as the base there’s also an overload that accept an IQueryable query:



IQueryable<Customer> customersOfKievRep =
    from cust in dc.Customers
    join ord in dc.Orders on cust.CustomerID equals ord.CustomerID
    join emp in dc.Employees on ord.EmployeeID equals emp.EmployeeID
    where emp.Region == "Kiev"
    select cust;
dc.Delete<Customer>(customersOfKievRep, LockModeEnum.Row); 

The last example also shows the lock mode parameter in use, so the generated SQL will be:

delete from dbo.Customers with (rowlock) where CustomerID in (SELECT [t0].[CustomerID]
FROM [dbo].[Customers] AS [t0]
INNER JOIN [dbo].[Orders] AS [t1] ON [t0].[CustomerID] = [t1].[CustomerID]
INNER JOIN [dbo].[Employees] AS [t2] ON [t1].[EmployeeID] = ([t2].[EmployeeID])
WHERE [t2].[Region] = @p0)

 

So. Enough usage examples, on to the implementation.

Implementation

Here’s the implementation. Add it to the code sample from the previous article and you’re ready to use it. As with the previous example, feel free to use it in your own L2S-based software. Remember that you use it at your own risk, don’t blame me if it breaks anything. If you want to translate/reuse any part with LLBLGen, subsonic, NHibernate etc that’s fine too. The only thing not allowed is to convert it for use with Entity Framework, or to re-publish in blog articles without prior written permission. :)


public static void Delete<T>(this DataContext dc, Expression<Func<T, bool>> selectionCriteria)
{
    IQueryable<T> baseQuery = ((IQueryable<T>)dc.GetTable(typeof(T))).Where(selectionCriteria);
    Delete<T>(dc, baseQuery, LockModeEnum.NotSpecified);
}

public static void Delete<T>(this DataContext dc, IQueryable<T> baseQuery, LockModeEnum lockMode)
{
    //get the type representing the table we're dealing with
    Type baseType = typeof(T);

    //get primary key members for the type we're dealing with
    List<MetaDataMember> primaryKeyMembers = GetPKMembers(dc, baseType);
    if (primaryKeyMembers.Count == 0)
    {
        throw new InvalidOperationException(baseType.Name + " has no primary key members.");
    }

    //create a new class containing only the PK members of the table we're going to update
    MemberBinding[] memberBindings = null;
    ParameterExpression[] baseTypeParams = new ParameterExpression[] { Expression.Parameter(baseType, "") };
    Type pkType = CreatePKClass(baseType, primaryKeyMembers, baseTypeParams[0], out memberBindings);

    //change the query to only get the PK members
    IQueryable pkQuery = GetPKQuery(baseQuery, pkType, baseTypeParams, memberBindings);

    //extract the select query for the primary key members
    DbCommand pkSelectCommand = dc.GetCommand(pkQuery);
    string whereClause = null;
    if (primaryKeyMembers.Count == 1)
    {
        whereClause = "where " + primaryKeyMembers[0].Name + " in (" + pkSelectCommand.CommandText + ")";
    }
    else
    {
        string ct = pkSelectCommand.CommandText;
        whereClause = ct.Substring(ct.IndexOf("\r\nfrom ", StringComparison.InvariantCultureIgnoreCase));
    }

    //get the table name
    string qualifiedTableName = dc.Mapping.GetTable(baseType).TableName;

    //concatenate the delete statement
    string deleteStatement = ReplaceSQLParameters(
        "delete from " + qualifiedTableName + " " + GetLockHint(lockMode)
        + " " + whereClause);

    //execute the delete
    dc.ExecuteCommand(deleteStatement, MergeParameters(pkSelectCommand.Parameters));
}

5 November 2008

Architecture - Linq-to-SQL and set-based operations: Update statements

Filed under: Architecture — Tags: , , , , , , — Kristofer @ 15:59

Data manipulation with OR mappers usually ends up with records being to be pulled back from the database to the business logic tier or client, changes applied and record-based update statements generated by the OR mapper and sent back to the db server. This is fine when working with a small number of records or when the data is needed in a higher tier for the update to take place (e.g. a user editing a customer record).

However, there are situations where this is a less-than-ideal way of working with the data. When a large number of records need to be updated, e.g. all products in a category discontinued, seat “2A-2C” blocked on all future flights etc the data is usually not needed in the application layers. Instead, set based operations done directly in the database are the most efficient way to do these operations. Many ORMs, including Linq-to-SQL and Entity Framework do not support set-based operations so set-based operations instead result in the above mentioned record-based operations with ensuing unnecessary network traffic, extra work for the database server etc.

When using Linq-to-SQL out of the box there are two ways to achieve set-based operations where an update, delete or “insert into … select … from …” statement will be executed on the database without actually returning any data:

  • Stored procedures; the set based update/delete/insert is added to a stored procedure, the stored procedure is in turn added to the model and called as a method on the data context.
  • SQL-in-strings; the good old method of concatenating a SQL statement in a string and running it using ExecuteCommand.

Both methods work fine, but the drawback is that the language integration and LINQ flexibility is lost when using either of the methods.

Wouldn’t it be neat with an extension allowing set-based updates, deletes etc based on the L2S-generated entity classes?

I’m thinking something along the lines of this. I’m going to use Northwind as the example here instead of the res system db just to make the examples more simple to follow for everyone familiar with Northwind. Say we want to discontinue all products in a product category:

using (NorthwindDBML.DataClasses1DataContext dc = new DataClasses1DataContext())
{
    dc.Update<Product>(
        (where => where.Category.CategoryID == 1),
        (set => set.Discontinued == true)
        );
}

The first lambda expression passed to the Update method is the where clause for the update statement, and the second lambda defines the updates that will take place. The final result should be a SQL update statement along the lines of:

update dbo.Products  set [Discontinued] = 1 where ProductID in (SELECT [t0].[ProductID]
FROM [dbo].[Products] AS [t0]
LEFT OUTER JOIN [dbo].[Categories] AS [t1] ON [t1].[CategoryID] = [t0].[CategoryID]
WHERE [t1].[CategoryID] = @p0)

Another example, say we want to add the IDD prefix “+49″ in front of the phone number for all German customers if they don’t already have the it. That would be expressed as:

dc.Update<Customer>(
    (where => where.Country == "Germany" && !where.Phone.StartsWith("+49")),
    (set => set.Phone == "+49" + set.Phone),
    LockModeEnum.Row);

…and would result in a SQL update statement like this:

update dbo.Customers with (rowlock) set [Phone] = (@p0 + [Phone]) where CustomerID in (SELECT [t0].[CustomerID]
FROM [dbo].[Customers] AS [t0]
WHERE ([t0].[Country] = @p1) AND (NOT ([t0].[Phone] LIKE @p2)))

To make it play nice, it takes a parameter for locking hints too. Isn’t that neat?

As already established, this is not supported in Linq-to-SQL out of the box. (and not supported in entity framework at all). So do we have to wait for Microsoft to maybe add that in a future version? No, won’t work. They already said that entity framework is their “preferred” data access technology despite L2S being more popular among their user base than EF so I don’t think they will spend time and money on implementing something like this.

Instead, I played around a bit and came up with the following. This is just a first version so there are probably bugs in it. In addition, if you have tables using reserved keywords such as “from”, “where” etc as the table-name or in column names then it will break. It may also break when used with future versions of L2S (if there will be any). That said, here’s the first implementation of set-based language integrated update statements using Linq-to-SQL data contexts and Linq-to-SQL generated classes.  (delete, “insert into … select … from …” will come in a future version).

Enjoy:

//Linq-to-SQL set-based operations. Copyright (c) 2008, Huagati Systems Co.,Ltd. ( www.huagati.com )
//DISCLAIMER: Use at your own risk, this is a code sample provided as-is with no warranties. By using the code below you take responsibility for any damage that may occur as a result.
//You are not allowed re-publish this code sample (including but not limited to articles, blogs, source code libraries etc) without prior written consent.
//You may however include it in in your own software provided that this header remains intact in the source code,
//  and that the main purpose or usage area of that software is not to add functionality to Linq-to-SQL.
//You may not modify it for usage with ADO.NET Entity Framework without prior written permission. If you do, your license to use it is automatically revoked.
using System;
using System.Linq;
using System.Data.Linq;
using System.Data.Linq.Mapping;
using System.Linq.Expressions;
using System.Collections.Generic;
using System.Reflection;
using System.Reflection.Emit;
using System.Text.RegularExpressions;
namespace System.Data.Linq.SetBased
{
    public enum LockModeEnum
    {
        NotSpecified,
        Row,
        Page,
        Table,
        TableExclusive,
        NoWait
    }
    public static class LinqSetbasedExtensions
    {
        public static void Update<T>(this DataContext dc, Expression<Func<T, bool>> selectionCriteria, Expression<Func<T, bool>> updateOperation)
        {
            Update<T>(dc, selectionCriteria, updateOperation, LockModeEnum.NotSpecified);
        }
        public static void Update<T>(this DataContext dc, Expression<Func<T, bool>> selectionCriteria, Expression<Func<T, bool>> updateOperation, LockModeEnum lockMode)
        {
            IQueryable<T> baseQuery = ((IQueryable<T>)dc.GetTable(typeof(T))).Where(selectionCriteria);
            Update<T>(dc, baseQuery, updateOperation, lockMode);
        }
        public static void Update<T>(this DataContext dc, IQueryable<T> baseQuery, Expression<Func<T, bool>> updateOperation, LockModeEnum lockMode)
        {
            //get the type representing the table we're dealing with
            Type baseType = typeof(T);
            //get primary key members for the type we're dealing with
            List<MetaDataMember> primaryKeyMembers = GetPKMembers(dc, baseType);
            if (primaryKeyMembers.Count == 0)
            {
                throw new InvalidOperationException(baseType.Name + " has no primary key members.");
            }
            //create a new class containing only the PK members of the table we're going to update
            MemberBinding[] memberBindings = null;
            ParameterExpression[] baseTypeParams = new ParameterExpression[] { Expression.Parameter(typeof(T), "") };
            Type pkType = CreatePKClass(typeof(T), primaryKeyMembers, baseTypeParams[0], out memberBindings);
            //change the query to only get the PK members
            IQueryable pkQuery = GetPKQuery(baseQuery, pkType, baseTypeParams, memberBindings);
            //extract the select query for the primary key members
            System.Data.Common.DbCommand pkSelectCommand = dc.GetCommand(pkQuery);
            string whereClause = null;
            if (primaryKeyMembers.Count == 1)
            {
                whereClause = "where " + primaryKeyMembers[0].Name + " in (" + pkSelectCommand.CommandText + ")";
            }
            else
            {
                string ct = pkSelectCommand.CommandText;
                whereClause = ct.Substring(ct.IndexOf("\r\nfrom ", StringComparison.InvariantCultureIgnoreCase));
            }
            //generate a select query with the update expression as the where clause
            IQueryable<T> updateOPQuery = ((IQueryable<T>)dc.GetTable(typeof(T)).AsQueryable());
            updateOPQuery = updateOPQuery.Where(updateOperation);
            //get the table name
            string qualifiedTableName = dc.Mapping.GetTable(typeof(T)).TableName;
            //extract the where clause from the update operation query
            System.Data.Common.DbParameterCollection setClauseParameters;
            string updateClause = GetSetClauseFromWhere(dc.GetCommand(updateOPQuery), qualifiedTableName, out setClauseParameters);
            //concatenate the update statement
            string updateStatement = ReplaceSQLParameters(
                "update " + qualifiedTableName + " " + GetLockHint(lockMode)
                + " set " + updateClause
                + " " + whereClause);
            //merge parameter groups into a single array
            object[] parameters = MergeParameters(setClauseParameters, pkSelectCommand.Parameters);
            //excute the update
            dc.ExecuteCommand(updateStatement, parameters);
        }
        private static List<MetaDataMember> GetPKMembers(DataContext dc, Type baseType)
        {
            //get primary key members for a L2S class/type
            return
                (
                    from pkMember in dc.Mapping.GetTable(baseType).RowType.DataMembers
                    where pkMember.IsPrimaryKey == true
                    select pkMember
                ).ToList<MetaDataMember>();
        }
        private static IQueryable GetPKQuery(IQueryable baseQuery, Type pkType, ParameterExpression[] baseTypeParams, MemberBinding[] memberBindings)
        {
            //generate a lambda for creating a new instance of the PK type, mapped to the base type for the table
            LambdaExpression lambda
                = Expression.Lambda(
                    Expression.MemberInit(
                        Expression.New(pkType),
                        memberBindings),
                    baseTypeParams
                  );
            //create a new query returning only the PK members as instances of the new pk type
            return
                baseQuery.Provider.CreateQuery(
                    Expression.Call(typeof(Queryable),
                            "Select",
                            new Type[] { baseQuery.ElementType, lambda.Body.Type },
                            baseQuery.Expression,
                            Expression.Quote(lambda)
                    )
                );
        }
        private static Type CreatePKClass(Type baseType, List<MetaDataMember> primaryKeyMembers, ParameterExpression baseTypeParam, out MemberBinding[] memberBindings)
        {
            //generate a new assembly and class
            AssemblyBuilder assembly
                = AppDomain.CurrentDomain.DefineDynamicAssembly(
                    new AssemblyName("PKClasses"), AssemblyBuilderAccess.Run);
            ModuleBuilder moduleBuilder = assembly.DefineDynamicModule("PKModule");
            TypeBuilder typeBuilder = moduleBuilder.DefineType("PKClass_" + baseType.Name, TypeAttributes.Class | TypeAttributes.Public);
            //add the primary key members to the new type
            foreach (MetaDataMember pkMember in primaryKeyMembers)
            {
                string memberName = pkMember.Name;
                Type memberType = pkMember.Type;
                //create a private field for storage
                FieldBuilder storageField = typeBuilder.DefineField("_" + memberName, memberType, FieldAttributes.Private);
Â
                //add a get method, returning the storage field
                MethodBuilder propertyGet
                    = typeBuilder.DefineMethod("get_" + memberName,
                        MethodAttributes.Public | MethodAttributes.SpecialName | MethodAttributes.HideBySig,
                        memberType, Type.EmptyTypes);
                ILGenerator getIL = propertyGet.GetILGenerator();
                getIL.Emit(OpCodes.Ldarg_0);
                getIL.Emit(OpCodes.Ldfld, storageField);
                getIL.Emit(OpCodes.Ret);
Â
                //add a set method, setting the storage field
                MethodBuilder propertySet
                    = typeBuilder.DefineMethod("set_" + memberName,
                        MethodAttributes.Public | MethodAttributes.SpecialName | MethodAttributes.HideBySig,
                        null, new Type[] { memberType });
                ILGenerator setIL = propertySet.GetILGenerator();
                setIL.Emit(OpCodes.Ldarg_0);
                setIL.Emit(OpCodes.Ldarg_1);
                setIL.Emit(OpCodes.Stfld, storageField);
                setIL.Emit(OpCodes.Ret);
                //create a public property for the member
                PropertyBuilder memberProperty = typeBuilder.DefineProperty(memberName, System.Reflection.PropertyAttributes.HasDefault, memberType, null);
                //assign the get/set methods to the property
                memberProperty.SetGetMethod(propertyGet);
                memberProperty.SetSetMethod(propertySet);
            }
            //return the type of the generated class
            Type pkClass = typeBuilder.CreateType();
            //get member bindings mapping the members of the pk class to the corresponding members in the base class
            //(out param)
            memberBindings =
                (
                from pkMember in primaryKeyMembers
                select (MemberBinding)Expression.Bind(pkClass.GetProperty(pkMember.Name),
                            Expression.Property(baseTypeParam, baseType.GetProperty(pkMember.Name)))
                ).ToArray<MemberBinding>();
            //return the new class' type
            return pkClass;
        }
        private static string GetLockHint(LockModeEnum lockMode)
        {
            //return a lock hint based on the specified lock mode
            switch (lockMode)
            {
                case LockModeEnum.NotSpecified:
                    return "";
                case LockModeEnum.Row:
                    return "with (rowlock)";
                case LockModeEnum.Table:
                    return "with (tablock)";
                case LockModeEnum.TableExclusive:
                    return "with (tablockx)";
                case LockModeEnum.Page:
                    return "with (paglock)";
                case LockModeEnum.NoWait:
                    return "with (nowait)";
                default:
                    return "";
            }
        }
        private static string GetSetClauseFromWhere(System.Data.Common.DbCommand updateOperationCommand, string qualifiedTableName, out System.Data.Common.DbParameterCollection commandParameters)
        {
            //get just the criteria part of the query
            string commandText = updateOperationCommand.CommandText;
            commandText = commandText.Substring(commandText.IndexOf(qualifiedTableName) + qualifiedTableName.Length);
            commandText = commandText.Substring(commandText.IndexOf("where", StringComparison.InvariantCultureIgnoreCase) + 5);
            //new stringbuilder for generating the set clause
            System.Text.StringBuilder ctBuilder = new System.Text.StringBuilder();
            //replace 'and' and 'or' with comma separators
            string[] separators = { " and ", " or ", " AND ", " OR "};
            foreach (string statementPart in commandText.Split(separators, StringSplitOptions.None))
            {
                //remove unnecessary parenthesis wrappers
                string sp = statementPart.Trim();
                if (sp.StartsWith("(") && sp.EndsWith(")"))
                {
                    sp = sp.Substring(1, sp.Length - 2);
                }
                //remove "[t0]." table alias
                sp = sp.Replace("[t0].", "");
                //add to the string builder
                ctBuilder.Append(sp);
                ctBuilder.AppendLine(", ");
            }
            ctBuilder.Remove(ctBuilder.Length - 4, 4);
            //return the command parameters in the out param variable
            commandParameters = updateOperationCommand.Parameters;
            //return the where-clause turned into a set clause
            return ctBuilder.ToString();
        }
        private static string ReplaceSQLParameters(string sqlCommand)
        {
            //replace @0, @1, etc with {0}, {1} etc...
            Regex paramMatching = new Regex(@"@p\d", RegexOptions.Compiled);
            int paramNo = 0;
            while (paramMatching.IsMatch(sqlCommand))
            {
                sqlCommand = paramMatching.Replace(sqlCommand, "{" + paramNo.ToString() + "}", 1);
                paramNo++;
            }
            return sqlCommand;
        }
        private static object[] MergeParameters(params System.Data.Common.DbParameterCollection[] parameterGroups)
        {
            //merge parameter groups into a one-dimensional object array. (nested loops as the dbparametercollection don't implement IEnumerable)
            List<object> parameters = new List<object>();
            foreach (System.Data.Common.DbParameterCollection parameterGroup in parameterGroups)
            {
                foreach (System.Data.Common.DbParameter parameter in parameterGroup)
                {
                    parameters.Add(parameter.Value);
                }
            }
            return parameters.ToArray();
        }
    }
}

13 August 2008

Application Architecture - Part 5a - Data Access Layer - A closer look at ADO.NET Entity Framework

Filed under: Architecture — Tags: , , , , — Kristofer @ 16:57

In two previous articles in the architecture section - Part 1 and Part 4, I mentioned that I intended to take a closer look at Microsoft’s Entity Framework once it has been RTM’d (released) in order to decide if ADO.NET EF or Linq-to-SQL is a better base for my DAL for HuagatiRes. Well, now that EF has been released as part of [the hastily RTM'd] Visual Studio 2008 SP1 and .net 3.5 SP1 I guess it is time to start playing around with it.

After several re-tries downloading the 831Mb VS2008SP1 image (all failed thanks to my ISP CSLoxinfo’s ‘invisible’ caching proxies), and getting past the installation quirks I finally got the service pack installed late last night. I quickly proceeded importing the HuagatiRes DB into an entity data model (EDMX file) and noticed some nice improvements in the EDMX designer over the Linq2SQL designer.

Improvements over the Linq to SQL designer

Some of the improvements that I first noticed in the designer:

  • The designer clearly identifies what relationships/associations/FKs are one-to-zero-or-one, one-to-one, one-to-many etc.
  • The navigation properties related to associations are clearly visible in the designer.
  • There is a model browser that plugs right into the same panel as the solution explorer that allows both the store and classes to be explored and looked up in the model.
  • It can even update the model from the database without having to resort to writing an add-in:)
  • Foreign key constraints behind associations are identified by name in the model browser and properties dialog.
  • There is a documentation field in the properties dialog where entities, members, associations etc can be documented.

So yes, the EDMX designer at a first glance has some improvements compared to the Linq2SQL designer.

Designer shortcomings

Improvements aside -  there are some missing features that at least I think would have been nice to have in the EDMX designer. Maybe they’re coming in a future version (SP1a, SP1b, SP2, …?)

  • Although the model explorer, and association properties, shows both FK constraint details as well as primary keys on tables, it does not show any indexes other than the PK. Why not both show the indexes [and index members] in the model explorer, and import index info and at least have attributes storing information about available indexes? Surely a more complete overview [including indexes] would be useful for all database developers. It would also be nice to see index info in the intellisense bubble when typing where clauses and joins in LINQ expressions, but of course that would require that the model (edmx and generated classes) include that information first. Maybe the EF team need to include a handful of enterprise developers in their advisory council
  • Progress dialogs for lengthy operations.

Let me elaborate a bit on this point:
I decided to take EF for a test-drive on a enterprise-system-size data model. The database I decided to use for this test-drive is not overly large, but large enough to fall in the enterprise-system category with its’ 1000+ tables and 2000+ associations.

I started generating the ADO.NET Entity Data Model (EDMX) from the database at 4PM. At first VS kept SQL Server very busy. The only feedback I got from VS was the hour glass mouse pointer, along with “(Not Responding)” in the title bar. There is no [visible] progress bar, no status messages or anything else to indicate how far it got.

Visual Studio hanging while generating an EDMX file for a 1000+ table database

Maybe I misinterpreted something, but when reading some article about EF somewhere on the intarweb I got the impression that EF was aimed at enterprise developers so I thought this would be an interesting test. But then again, maybe it is for enterprise developers with small databases / small data models.

 

…on to the last “missing-but-nice-to-have” items in the EDMX designer:

  • Naming convention support…  Oh well, I’ll add edmx support to a future version of the naming convention thingie in the DBML Tools add-in.
  • A way to import the table/column/fk human-readable descriptions/notes/documentation from data model diagrams (e.g. from Erwin .erx, ERStudio .dm1, or Visio for Enterprise Architects .vsd  files) into the description fields in the EDMX file. Well, there’s an idea for an add-in for Visual Studio. Maybe someone who doesn’t like re-typing the descriptions of every table, field and foreign key from the data model (like me) will write one…
  • Multi page / tabbed diagrams. A single diagram designer surface for the entire db is not really useful for schemas larger than Northwind. It looks like someone thought about it because the edmx file would support it although the designer don’t:
    <edmx:Diagrams>
          <Diagram Name=”HuagatiRes”>
                …     

 

Conclusion

The edmx designer has some nice improvements compared to the Linq-to-SQL designer. However, it lacks some basic features that would make it useful when working with schemas larger than Northwind.

I hope that Microsoft will take a look at data modeling tools to get some feature ideas for the next version / next service pack. Although the edmx and dbml designers are not intended to be used for data modelling, bringing some of the basic features that all data modelling tools have into the designers would really make them more useful during development.   …or why not even take it a step further and add the features needed to use these designers to model both the object model and the database schema?  That would eliminate the need for separate data modelling tools and could allow both the database schema and object model to be created from inside Visual Studio.

 

Coming up….: The next article in this series (5b) will re-visit the performance comparison between EF and Linq to SQL that Roger Jennings performed on the beta. I am hoping that the beta version was slowed down by debug code and that the RTM version will be faster / have less overhead.

 

Update: An hour later (two hours after starting the EDMX generation) I noticed that VS is no longer beating up SQL Server. Visual Studio is still hanging, but it appears to be stuck doing some internal work and using a full CPU to do whatever it is that is keeping it busy.

Visual Studio is still hanging after two hours with no UI feedback on what it is doing.

Update 2 (6 hours later): Visual Studio is still hanging, and I have to admit that trying to import the schema of a large database into a single ADO.NET Entity Data Model (EDMX) was a stupid experiment. It has to be divided up into smaller chunks for a) EF to be able to handle it, b) for maintainability, and c) for source control due to the single-EDMX-single-cs-file-per-data-model-approach.

Unfortunately chunking it up into many DALs leads to duplicated entity definitions for common/shared entities but at this point it appears that EF is not yet up to the challenge of supporting larger schemas.

Hopefully a future version of EF will support using and merging multiple designer surfaces with support for shared entities. Maybe by then it will even be able to use the information already available in the data model diagrams (erx/dm1/vsd) to divide up the schema in functional areas / subject areas.

For now guess it would be a good idea for Microsoft to add a “top 200″ or “top 400″ [or whatever is the limit for the EF designer] to the SQL Server catalog queries that the EDMX designer thing uses to import tables. Just to avoid having people try this kind of stuff…

6 hours later... 

Update 3 (16 hours later): The next morning VS was still hanging with no indication of how far it got and how much longer it would take. I decided it was time to terminate the experiment…

16 hours after the experiment started - terminating...

A closer look at the files in my project folder revealed a 8Mb+ edmx file along with a 1.75kb .cs file. I opened the edmx file using a more reliable/stable development tool; MS Visual Notepad and noticed that all entities, members, associations etc had been defined in the edmx. However, the diagram section was empty so I am guessing that Visual Studio spent all night on trying to decide how to layout the 1000+ files in a single designer surface.

The generated edmx file is without layout information

After restarting Visual Studio I tried adding the generated edmx file to an existing project. It got added but trying to open the designer resulted [again] in a hung Visual Studio (using 100% of one CPU core).

Update 4: I tried to generate the entity model using the command line tool for EF - edmgen.exe - hoping that it is better suited for large schemas than the edmx designer. I started it at 9:35am using the FullGeneration option. Unlike the designer/wizard in Visual Studio, EdmGen does output status messages.

 

C:\temp>edmgen /mode:fullgeneration /project:LargeDB /provider:System.Data.SqlClient
     /connectionstring:"server=(local);integrated security=true;database=largeenterprisedb"
Microsoft (R) EdmGen version 3.5.0.0
Copyright (C) 2008 Microsoft Corporation. All rights reserved.
Loading database information...
Writing ssdl file...
Creating conceptual layer from storage layer...
Writing msl file...
Writing csdl file...
Writing object layer file...
Writing views file...

Update 5: EdmGen crashed after 20-25 minutes.

edmgen crash

The console output simply stated “Process is terminated due to StackOverflowException.”

I rest my case. And I promise I will stick to small schema databases in my continued explorations of the ADO.NET Entity Framework.

 

Update 6: A Microsoft rep told me in the support forum that the EF designer is good for models with up to 120 tables. “Our response for the designer is that performance is typically reasonable up to about 120 tables, after which thinkgs begin slowing down.

http://forums.microsoft.com/Forums/ShowPost.aspx?PostID=3757588&SiteID=1

I should have asked before trying to use EF with a larger model… :)

14 July 2008

Application Architecture - Part 4 - Data Access Layer - LINQ and Linq2SQL Links

Filed under: Architecture — Kristofer @ 12:57

Linq2SQL, groupings and aggregates

In my first entry on Linq2SQL almost a month ago I listed a few of the “childhood diseases” in Linq2SQL. One of those is that grouped queries with multiple aggregates are less than optimal; the Linq2SQL provider translates such queries into one subquery for each aggregate resulting in poor I/O statistics. A few months back when I originally stumbled upon that problem while working on another LINQ2SQL based project, Hedgehog, I did some whining on the same subject in the msdn forum for the Linq project and I also submitted a suggestion on Microsoft Connect hoping for a better/more efficient Linq-to-SQL query translation in .net 3.5 SP1. The suggestion on Microsoft Connect remains untouched and I never heard anything more in the msdn forum so the workaround for now has been to keep that kind of queries in stored procedures.

That was until today when I came across the latest entry in Matt Warren’s blog. Matt has helped me out with Linq2SQL questions several times before, both in the Linq2SQL msdn forum and through his blog. In his latest blog entry, Matt covers the topic of groupings and aggregates, and he solves the problem by implementing a better IQueryable provider that supports multiple aggregates without resorting to sub queries. Very elegant, Matt!

Read more on Matt’s IQueryable provider in his blog:

Now I am really hoping that Matt’s solution will be included in .net 3.5 SP1 RTM or SP 1½ if it is too late to include any new stuff in SP1. Although I think fixes, improvements, and optimizations to already released functionality (Linq2SQL) should have higher priority in the service pack than brand new functionality such as EF that shouldn’t really be part of a service pack. But that’s Microsoft’s call and is probably more controlled by politics and the ongoing controversy surrounding Linq2SQL vs EF.

Comparison of Linq2SQL and Entity Framework

Roger Jennings of OakLeaf Systems just published a comparison of Linq2SQL and Entity Framework on his blog. He compares code size, memory footprint and more importantly initialization and roundtrip times against an empty database to see if there are any bottlenecks in the ‘plumbing’. For his tests he used my first draft datamodel as a base for generating entity classes, and he also covered some of the other content in this blog.

Roger is a consultant and author with 30+ books behind him. His books cover Microsoft technology from operating systems to databases, development etc and has been translated to 20+ languages. I’m of course flattered and humbled that Roger posted a reference to my blog, and I am happy to see that he is basing the Linq2SQL vs Entity Framework comparison on my data model. Thanks Roger!

The most interesting finding in Roger’s Linq2SQL vs EF comparison is that EF not only has a larger footprint (as expected) but it shows significantly slower roundtrips against an empty database. This is something that needs to be investigated in more detail to reveal the underlying reasons.

Read Roger’s article and his findings over at:

I haven’t yet looked into the details or potential reasons behind the huge difference between Linq2SQL and EF roundtrip times that Roger unveiled, as I have previously decided to stick with Linq2SQL until EF is at least RTMed. However, I will certainly try to repeat Roger’s comparison on EF once RTMed, in addition to a closer look at what EF throws at the database with regards to query composition and how that translates to usage of I/O capacity, CPU, locks and other database server resources.

Personally I’m quite happy with the one-to-one mapping between database tables and Linq2SQL entities, I know [by now] reasonably well how Linq2SQL interacts with SQL Server, and whenever I need further abstraction from the data model I accomplish that by writing new classes on top of the ones generated by Linq2SQL. If EF does not perform and scale at least as well as Linq2SQL then I will probably stick to [the more lightweight] Linq2SQL in combination with Matt Warren’s improved IQueryable provider.

25 June 2008

Application Architecture - Part 3 - Interfaces/APIs for interaction with external systems

A reservation system has to exchange information with a heap of external systems; accounting, revenue management, flight planning/scheduling, ops, GDSes and other distribution networks, third party booking systems (i.e. travel agencies and corporate customers interfacing directly with the airline) and so on.

For some of these there are somewhat standardized interfaces, others not. Some of the standardized interfaces are aged, inconvenient, error prone and uses communication techniques and formats that originated back in the age of dinosaurs, mainframes, EBCDIC etc. Although it is almost certainly necessary to support many of these ‘dinosaur’ formats I don’t think it is a good idea to build in limitations in the system based on limitations and boundaries that existed in computer systems in the 1960s, or even have native support for them in the core of the system. Rather, designing the system with today’s and tomorrow’s hardware and software capabilities in mind, exposing functionality using modern APIs, and then adding support for the ‘dinosaurs’ as add-on gateways built on the modern APIs leaves more room for flexibility and system features that are not compatible with the ‘dinosaurs’.

Examples of this is: passenger names in a modern system do not need to be in 7-bit uppercase ASCII unless a specific booking originates in a system only supporting this. For internal bookings, unicode is fine and it is probably appreciated by passengers whose names are not expressable in 7-bit uppercase ASCII. Likewise, fare rules don’t need to be expressed in 2 character ATPCO footnote codes. And so on.  </rant>

The diagram below shows the three main categories of the built in APIs and how other systems will interact with them.

 

 

The APIs are divided into three main categories;

  1. Native web service APIs.This is a set of web service / WCF services reflecting all the APIs available at the business logic level. The method signatures, parameters etc will reflect how this system is designed and will support the functionality used internally by the reservation system itself. Beyond being web-services, method signatures etc are system specific rather than standards based.
  2. OTA-based web services.Â

    Although OTA only supports a subset of the functionality in the system, the common booking related messages such as availability, pricing, booking, booking modification, cancellation, schedules etc are supported by this API category.

    This is a set of web services based on the OpenTravel Alliance message specifications.OTA is an effort by several companies in the travel industry to come up with an XML based standardized messaging format as a replacement for old/aged EDIFACT or other text-based ‘dinosaur-age’ message standards such as AIRIMP. This allows third parties to interface with the system using a standardized messaging format that is a bit easier to enforce/validate than the text based formats and that is less ambigous than airimp.

  3. Event Notification Service.Â

    Notifications will be subscription based, and the system will make an out-going web service call using a pre-defined event signature for each event type subscribed to, to the subscribers owning and/or authorized to receive events related to the specific booking/flight/passenger/customer/etc.

    This is a set of plug-in points for external systems to get notified about things happening inside the system.External systems may need to know when a booking is created/modified/cancelled, when inventory levels change, when fares change, when flight schedules change etc. This can be used for synchronization with other systems, for sending out notifications to people/passengers/agents/etc affected by such changes, and for additional validation against external systems such as no-fly/selectee validation where the methods and regulations differ significantly between different countries.

Built on top of these three, gateways supporting the old message formats can be built as needed whereas systems that are capable of interfacing with the native web service APIs or OTA based interfaces may do so.

A more detailed look into the specific classes and methods included in each of the API categories will follow…

23 June 2008

Application Architecture - Part 2 - Data Access Layer - Dynamic Linq

Filed under: Architecture — Kristofer @ 12:14

The previous entry on app architecture covered my choice of Linq2SQL as a base for the data access layer. As I mentioned in that post Linq2SQL auto generates much of the data access code, entity classes for all tables etc and Linq itself adds type safety, intellisense support, compiler syntax checking and other goodies to queries.

However, there are cases when parts or all of a query is not known at the time the query is written. With traditional SQL it is easy to dynamically build queries at runtime, but how about Linq?

Dynamic Linq

Fortunately, Microsoft thought of that. There are several ways to dynamically build Linq queries, but perhaps the easiest way is to use the System.Linq.Dynamic namespace that - although not a part of the .net framework - ships as a code sample that can be downloaded from Microsoft. This namespace is included in the file dynamic.cs embedded in the linq code sample collection.

Breaking the trend of their usually good documentation for development tools and technologies, the only documentation for System.Linq.Dynamic are two small and simple code samples; one included in the download zipfile itself and the other one in a blog post by Scott Guthrie. There is also an apparent lack of code comments inside the source file for System.Linq.Dynamic itself. There are only 12 comment lines embedded in the 2000+ lines of code in dynamic.cs. I’m sure they’ll come up with some better documentation for it some day, and maybe they’ll even include it as part of the .net framework in a future version.

There is also one bug that makes it throw an exception “Operator ‘=’ incompatible with operand types ‘Guid’ and ‘Guid’” when the where clause contain a criteria on a guid/uniqueidentifier member. This however is easy to fix since we have the source code by changing line 712 in dynamic.cs (class System.Linq.Dynamic.ExpressionParser, function ParseComparison) from:

if (isEquality && !left.Type.IsValueType && !right.Type.IsValueType) {
...to:
if (isEquality && ((!left.Type.IsValueType && !right.Type.IsValueType)
    || ((left.Type == typeof(Guid) || left.Type == typeof(Guid?))

      && (right.Type == typeof(Guid) || right.Type == typeof(Guid?))))) {

Microsoft’s new open source strategy is really nice in this way - it really makes it easier to find bugs, work around them or - as in this case - fix them.

Poor documentation of the dynamic library and a few bugs aside, the dynamic linq query library is a very useful addition to the stuff built into .net 3.5 as it allows Linq queries to be built on the fly at runtime when needed. This comes in handy for things like handling reference tables, custom built / user defined reports etc. These benefits outweighs the inconvenience of not having any detailed documentation as it allows consistent use of Linq2SQL throughout the entire application.

Reference Tables

Handling reference tables and the data contained in them is the first candidate for dynamically generated queries. You may recall the table reference_table from one of the data model articles. The purpose of this table is to store a list of all tables holding reference data so these tables and their data can be handled/retrieved/created/maintained in a relatively generic manner.


Using the list of reference tables stored in this table, we can write a function that uses the stored class name to get a type reference, then uses that type reference to get a reference to the Linq2SQL table and finally use dynamic linq to retrieve reference data for that table, all starting off by just using a pre-defined constant with the table code identifying the table.

using System;
using System.Linq;
using System.Linq.Dynamic;
using System.Collections.Generic;
using System.Collections;
using System.Linq.Expressions;
using System.Web.Caching;
public static class ReferenceData
{
    private static List<ReferenceTable> _referenceData = new List<ReferenceTable>();
    public static IList GetReferenceData(string tableCode)
    {
        return GetReferenceData(tableCode, null, null);
    }
    public static IList GetReferenceData(string tableCode, string filter, params object[] pval)
    {
        //load list of reftables if not already loaded
        LoadReftables();
        //get table info
        ReferenceTable rt = _referenceData.Single(rtb => rtb.Code == tableCode);
        Type dataClassType = Type.GetType(rt.ClassName);
        IList requestedData = null;
        //check if the data is in the cache
        bool cacheEnabled = (ConfigSettings.CacheExpirySeconds > 0);
        string cacheKey = GetRefdataCacheKey(tableCode, filter, pval);
        if (cacheEnabled == true)
        {
            Cache pageCache = System.Web.HttpContext.Current.Cache;
            requestedData = (IList)pageCache[cacheKey];
        }
        //no cache hit
        if (requestedData == null)
        {
            //get data context
            using (HuagatiResDataContext dc = HuagatiResDataContext.GetDC())
            {
                //query the table
                if (filter != null)
                {
                    requestedData
                        = dc.GetTable(dataClassType).
                           Where(filter, pval).
                           OrderBy("SortSeq, Name").
                           Select("it").ToList();
                }
                else
                {
                    requestedData
                        = dc.GetTable(dataClassType).
                           OrderBy("SortSeq, Name").
                           Select("it").ToList();
                }
            }
            //store in the cache
            if (cacheEnabled == true)
            {
                Cache pageCache = System.Web.HttpContext.Current.Cache;
                DateTime expiryDateTime = DateTime.Now.AddSeconds(ConfigSettings.CacheExpirySeconds);
                pageCache.Insert(cacheKey, requestedData, null, expiryDateTime, TimeSpan.Zero, CacheItemPriority.Normal, null);
            }
        }
        return requestedData;
    }
    private static void LoadReftables()
    {
        lock (_referenceData)
        {
            if (_referenceData.Count == 0)
            {
                //get data context
                using (HuagatiResDataContext dc = HuagatiResDataContext.GetDC())
                {
                    //load reftable info
                    IEnumerable<ReferenceTable> rts = from rd in dc.ReferenceTables orderby rd.SortSeq select rd;
                    _referenceData.AddRange(rts.ToList());
                }
            }
        }
    }
}

In addition to just retrieving the data, the code sample above also has some logic for using the asp.net page cache to store reference data, and for retrieving and storing the list of reference tables in a static list. As reference data is typically fairly static with infrequent changes it is usually safe (and a good idea) to cache it to avoid a database roundtrip every time a dropdown list is populated.

Retrieving reference data, for example a list of available passenger types is now as simple as:

IList passengerTypes = ReferenceData.GetReferenceData(RefTables.PassengerType);

Meta Data

Having a generic way to retrieve reference data we can also extend this class with functionality for creating and describing the reference tables; this allows generic maintenance screens that can be generated at runtime for all reference tables and takes away a lot of code-writing every time a new reference table is added or changed. The System.Data.Linq.Mapping namespace contains the functionality needed to retrieve meta data for Linq2SQL generated entity classes, the function below retrieves all members of a reference table using the mapping namespace:

    public static System.Collections.ObjectModel.ReadOnlyCollection<MetaDataMember> GetRefdataMembers(string tableCode)
    {
        //load list of reftables if not already loaded
        LoadReftables();
        //get table info
        ReferenceTable rt = _referenceData.Single(rtb => rtb.Code == tableCode);
        Type dataClassType = Type.GetType(rt.ClassName);
        using (HuagatiResDataContext dc = HuagatiResDataContext.GetDC())
        {
            MetaType metaType = dc.Mapping.GetTable(dataClassType).RowType;
            return metaType.DataMembers;
        }
    }

Insert/Delete/Update

Adding generic methods for inserting, updating and deleting records in reference tables completes the foundation for retrieving and maintaining reference data.

    public static void InsertMember(object data)
    {
        using (HuagatiResDataContext dc = HuagatiResDataContext.GetDC())
        {
            dc.GetTable(data.GetType()).InsertOnSubmit(data);
            dc.SubmitChanges();
        }
        ExpireFromCache(data.GetType().Name);
    }
    public static void DeleteMember(object data)
    {
        using (HuagatiResDataContext dc = HuagatiResDataContext.GetDC())
        {
            dc.GetTable(data.GetType()).Attach(data);
            dc.GetTable(data.GetType()).DeleteOnSubmit(data);
            dc.SubmitChanges();
        }
        ExpireFromCache(data.GetType().Name);
    }
    public static void UpdateMember(object data)
    {
        using (HuagatiResDataContext dc = HuagatiResDataContext.GetDC(false))
        {
            string whereClause = null;
            object[] pkValues = dc.GetPKMembers(data, out whereClause);
            object existingEntity = dc.GetTable(data.GetType()).Where(whereClause, pkValues).Select("it").ToList()[0];
            dc.CopyDataMembers(data, existingEntity);
            dc.SubmitChanges();
        }
        ExpireFromCache(data.GetType().Name);
    }
    private static void ExpireFromCache(string typeName)
    {
        //load list of reftables if not already loaded
        LoadReftables();
        //get the table code
        string tableCode = null;
        lock (_referenceData)
        {
            try
            {
                ReferenceTable refTable = _referenceData.Single(rtb => rtb.ClassName == typeName);
                tableCode = refTable.Code;
            }
            catch {}
        }
        //find any cache entries for this table and kick them out
        if (tableCode != null)
        {
            Cache pageCache = System.Web.HttpContext.Current.Cache;
            foreach (DictionaryEntry cacheItem in pageCache)
            {
                if (((string)cacheItem.Key).StartsWith(tableCode))
                {
                    pageCache.Remove((string)cacheItem.Key);
                }
            }
        }
    }
 

Synchronization

In order to synchronize lists of entities, or members values between two instances of an entity class I have written a couple of extension methods for handling synchronization. It is not really related to dynamic linq, but the UpdateMember method in the example above makes use of one of these synchronization functions; CopyDataMembers so I am including my linq synchronization extension functions here too.

using System;
using System.Data;
using System.Linq;
using System.Linq.Dynamic;
using System.Data.Linq;
using System.Data.Linq.Mapping;
using System.Collections;
using System.Collections.Generic;
public static class LinqSync
{
    public static void SynchronizeMembers(this HuagatiResDataContext dc, Type entityType, IList existingSet, IList updatedSet)
    {
        SynchronizeMembers(dc, entityType, existingSet, updatedSet, null);
    }
    public static void SynchronizeMembers(this HuagatiResDataContext dc, Type entityType, IList existingSet, IList updatedSet, string syncChildMembers)
    {
        foreach (var existingEntity in existingSet)
        {
            try
            {
                //find corresponding record in edited list
                string whereClause = null;
                object[] pkMembers = GetPKMembers(dc, existingEntity, out whereClause);
                IQueryable matchingEntity = updatedSet.AsQueryable().Where(whereClause, pkMembers).Select("it");
                var updatedEntity = matchingEntity.ToList()[0];
                //remove from edited list
                updatedSet.Remove(updatedEntity);
                //sync with existing item
                CopyDataMembers(dc, updatedEntity, existingEntity);
                if (syncChildMembers != null && syncChildMembers.Length > 0)
                {
                    //sync child tables
                    SyncChildTables(dc, updatedEntity, existingEntity, syncChildMembers);
                }
            }
            catch (ArgumentOutOfRangeException)
            {
                //item was removed in the updated collection
                dc.GetTable(entityType).DeleteOnSubmit(existingEntity);
            }
        }
        //remaining entries in edited list are new. add for insert
        dc.GetTable(entityType).InsertAllOnSubmit(updatedSet);
    }
    private static void SyncChildTables(HuagatiResDataContext dc, object updatedEntity, object existingEntity, string syncChildMembers)
    {
        string[] childMembers = syncChildMembers.Split(",".ToCharArray());
        foreach (MetaDataMember mem in GetAssociatedMembers(dc, existingEntity))
        {
            foreach (string childMemberName in childMembers)
            {
                if (childMemberName == mem.Name && mem.StorageAccessor.HasValue(updatedEntity))
                {
                    if (mem.Association.IsMany)
                    {
                        SynchronizeMembers(dc, mem.Type, (IList)HelperFunctions.GetPropertyValue(existingEntity, mem.Name), (IList)HelperFunctions.GetPropertyValue(updatedEntity, mem.Name));
                    }
                    else
                    {
                        CopyDataMembers(dc, HelperFunctions.GetPropertyValue(updatedEntity, mem.Name), HelperFunctions.GetPropertyValue(existingEntity, mem.Name));
                    }
                }
            }
        }
    }
    public static object[] GetPKMembers(this HuagatiResDataContext dc, object forEntity, out string whereClause)
    {
        //query for primary key members
        IEnumerable<MetaDataMember> pkMembers = from mem in dc.Mapping.GetTable(forEntity.GetType()).RowType.DataMembers
                                                where mem.IsPrimaryKey == true
                                                select mem;
        List<MetaDataMember> pkMems = pkMembers.ToList();
        //prepare return value
        object[] returnValue = new object[pkMems.Count];
        //loop through pk members
        int pkMemNo = 0;
        string whereCls = "";
        string wAnd = "";
        foreach (MetaDataMember mem in pkMems)
        {
            returnValue[pkMemNo] = mem.StorageAccessor.GetBoxedValue(forEntity);
            whereCls += wAnd + mem.Name + " == @" + pkMemNo.ToString() + " ";
            wAnd = " and ";
            pkMemNo++;
        }
        whereClause = whereCls;
        return returnValue;
    }
    private static IEnumerable<MetaDataMember> GetAssociatedMembers(HuagatiResDataContext dc, object entity)
    {
        //get list of all entity members that are associations
        return from mem in dc.Mapping.GetTable(entity.GetType()).RowType.DataMembers
               where mem.IsAssociation == true
               select mem;
    }
    public static void CopyDataMembers(this HuagatiResDataContext dc, object sourceEntity, object targetEntity)
    {
        //get entity members
        IEnumerable<MetaDataMember> dataMembers = from mem in dc.Mapping.GetTable(sourceEntity.GetType()).RowType.DataMembers
                                                  where mem.IsAssociation == false
                                                  select mem;
        //go through the list of members and compare values
        foreach (MetaDataMember mem in dataMembers)
        {
            object originalValue = mem.StorageAccessor.GetBoxedValue(targetEntity);
            object newValue = mem.StorageAccessor.GetBoxedValue(sourceEntity);
            //check if the value has changed
            if (newValue == null && originalValue != null || newValue != null && !newValue.Equals(originalValue))
            {
                //use reflection to update the target
                System.Reflection.PropertyInfo propInfo = targetEntity.GetType().GetProperty(mem.Name);
                propInfo.SetValue(targetEntity, propInfo.GetValue(sourceEntity, null), null);
                //setboxedvalue bypasses change tracking - otherwise mem.StorageAccessor.SetBoxedValue(ref targetEntity, newValue); could be used instead of reflection
            }
        }
    }
}

18 June 2008

Application Architecture - Part 1 - Data Access Layer - LINQ and Linq2SQL

Filed under: Architecture — Tags: , , — Kristofer @ 13:38

Although the data model is just in its earliest stages I will cover some application architecture topics in parallell. First off is the data access layer. As I am basing this on Microsoft technology using Microsoft tools there are some interesting new tools that Microsoft just made available in November 2007 when they released the .net framework 3.5, Visual Studio 2008 and C# 3.0.

LINQ, Language INtegrated Query

The first interesting part of .net 3.5 is LINQ, short for Language INtegrated Query. This is a combination of new .net framework classes and new language and compiler features. LINQ allows programmers to write queries directly in C# code (or other supported .net languages). I’ll stick to C# examples here. LINQ queries can target just about anything, objects, object collections, databases, xml etc. The great benefit of using LINQ is that it adds type safety, intellisense etc to the queries.

Linq2SQL

One of the extensions to LINQ that ships with .net 3.5 is LINQ2SQL. Linq2SQL includes an ORM (Object Relational Mapping) designer for SQL Server databases, and an entity and data context code generator. Basically it allows you to drag-and-drop tables, stored procedures etc from a SQL Server database onto the ORM designer inside Visual Studio, and it will automatically generate an object model diagram along with entity classes for all tables containing all fields and foreign keys as properties of matching data types.

The generated entities and members can be renamed using naming conventions more suitable in C# directly in the designer while maintaining a reference to the underlying names in the database, and additional properties, methods etc can be added to the entity classes as they are implemented as partial classes.

The Linq2SQL ORM DBML designer surface with entity classes corresponding to a few of the res system tables

The screenshot above shows the Linq2SQL ORM designer / DBML designer with a few of the entity classes corresponding to tables in the data model.

With the simple basic part of the data model created so far, the Linq2SQL ORM designer and code generator just saved me a lot of typing; the generated entity classes are 800kb+ and I now have an object model corresponding to my data model.

Once the data access layer object model is in place, LINQ queries can be written directly against the generated entity classes. Behind the scenes, Linq2SQL generates parameterized SQL queries, handles database connections, caching if needed, execute the queries and generates back objects matching the query results.

And the best part of it: intellisense support for all entities and entity members; one no longer have to memorize every detail of the data model or stare at a data model diagram while writing code. And as the query is written in C# against strongly typed objects, the visual studio syntax checker will catch syntax errors, type mismatches and other mistakes that does not show up until at runtime when using traditional SQL queries.

The query in the screenshot above,

    

  

 

(
from oca in dc.CityAirports
join rte in dc.Routes on oca.AirportCode equals rte.OriginAirportCode
join dca in dc.CityAirports on rte.DestinationAirportCode equals dca.AirportCode
where
oca.CityCode == originCityCode
  && dca.CityCode == destinationCityCode
 
&& oca.Airport.ActiveFlag == true
 
&& dca.Airport.ActiveFlag == true
 
&& oca.City.ActiveFlag == true
 
&& dca.City.ActiveFlag == true
select rte
).ToList<DAL.
Route
>();

…is automatically turned into parameterized SQL when the query is executed. Running SQL Profiler in parallell when testing the method above shows the following automagically generated SQL:

SELECT [t1].[route_id] AS [ID], [t1].[origin_airport_code] AS [OriginAirportCode], [t1].[destination_airport_code] AS [DestinationAirportCode]
FROM [dbo].[city_airport] AS [t0]
INNER JOIN [dbo].[route] AS [t1] ON [t0].[airport_code] = [t1].[origin_airport_code]
INNER JOIN [dbo].[city_airport] AS [t2] ON [t1].[destination_airport_code] = [t2].[airport_code]
INNER JOIN [dbo].[airport] AS [t3] ON [t3].[airport_code] = [t0].[airport_code]
INNER JOIN [dbo].[airport] AS [t4] ON [t4].[airport_code] = [t2].[airport_code]
INNER JOIN [dbo].[city] AS [t5] ON [t5].[city_code] = [t0].[city_code]
INNER JOIN [dbo].[city] AS [t6] ON [t6].[city_code] = [t2].[city_code]
WHERE ([t0].[city_code] = @p0) AND ([t2].[city_code] = @p1) AND ([t3].[active_flag] = @p2) AND ([t4].[active_flag] = @p3) AND ([t5].[active_flag] = @p4) AND ([t6].[active_flag] = @p5)

LINQ2SQL runs all queries using the sp_executesql stored procedure so they execution plans are all cached in SQL Server’s stored procedure execution plan cache for later reuse; something that saves cpu cycles for complex queries.

Cons?

The drawbacks of using LINQ2SQL? This is covered in hundreds of other blogs discussing the pros and cons of LINQ2SQL, commonly accusing it of ‘taking control away from DBAs’. In my experience (and I have used it in a medium sized project with some ~70 tables as the DAL; Hedgehog is 100% Linq2SQL based and has been running in production environments for months) there are just a few minor things that can all be worked around:

  1. First of all, it does not allow SQL optimizer hints such as join method hints, index hints, locking hints etc to be included in generated queries. However, in the rare cases when optimizer hints are required, those queries can be converted to stored procedures instead and Linq2SQL also supports stored procedures.
  2. Another ‘issue’ is locking/isolation mode. It only allows isolation mode to be set for queries that are run within a transaction context, so every little ‘dirty read’ / ‘read uncommited’ / ‘with (nolock)’ query needs to be run inside a transaction. No big deal really because the code for it can be hidden in extensions to the generated data context class but it would still be nice to be able to avoid the overhead of a database transaction for such queries.
  3. Queries that extensively use groupings and aggregates result in less than optimal generated SQL, so that kind of queries are also better to move out to stored procedures for now.
  4. Serialization support is poor; one has to go through hoops and tricks and avoid certain features in order to make the generated entity classes serializable using binary or XML serialization in .net.

Hopefully Microsoft will address those issue or at least some of them in .net 3.5 SP1 which is scheduled to be released sometime later this year.

The future

dotNet 3.5 SP1 will also contain ADO.NET entity framework, the next-generation version of Linq2SQL that allows for further abstraction between the object model and the database. EF also has better support for serialization which makes life easier for us n-tier developers. However, until SP1 RTM is out (a beta is already available for download from Microsoft) I will stick to Linq2SQL and possibly convert to EF once it has been officially released.

Powered by WordPress