Slow-Moving Targets

Tuesday, March 01, 2005

Defensive Generalization (Part 3: The Essence of MDA)

A Kubrick Moment

Many things drive us to generalize when we build software systems. We feel that strong desire to solve a problem once and keep it solved. We want to make sure that the experience we gained building yesterday's system shows itself in today's and tomorrow's systems. We look forward to new failures (cough) to overcome and new successes to build on top of.

In Part 2, we started a deliberate process of pulling apart the clumps of knowledge we had about how CRM was put together. Forgive me for the review, but I want to spend a bit more time on something I glossed over in the last post.

In setting up a homebrew model-centric code generation system the first efforts often yield something that is coupled to the structure of the application in question. Just so I actually have something to write about, let's say that our first round of factoring resulted in the following configuration.
  • CRM domain model (as an XML file of some sort)
  • CRM domain-to-relational model transformation
  • CRM domain-to-DAO model transformation
  • CRM domain-to-web-presentation model transformation
  • CRM relational data model (another XML file)
  • CRM Java DAO model (XML)
  • CRM web presentation model (more XML)
  • Code generators for the three CRM technology-specific models
  • The finished CRM application
Sitting back with a mug of coffee in my hand, two thoughts strike me. One, this coffee needs a bit more sugar, and two, I was really getting tired of typing CRM over and over. Being one that is mildly allergic to unnecessary repetition, I find it far more satisfying to parameterize that list above with "CRM" rather than repeat it.

In the story I told last time, we eventually generalized the infrastructure so that the transformations were expressed at a metamodel level. The crafting of those metamodels is one of the things that I want to talk about in this post.

Metamodels, or modeling languages, are yet another dimension of knowledge factoring. In the same way we build up jargon around something when we know it well enough, we build metamodels to express a stock understanding of the way things in a certain category of experience relate to each other. When I began writing about Data Access Objects and JavaServer Pages using their acronyms, you understood what I was writing about because you knew the expansion of those acronyms, and how those pieces fit together to make a business application.

Just as we compress knowledge into acronyms, we can push knowledge into metamodels to create Domain-Specific Languages (DSLs, not to be mistaken for Digital Subscriber Lines). When we use DSLs in system construction, we can speak in shorthand about specific aspects of our system. Assumptions and assertions about the rules of how those aspects relate to each other are built into the DSL. A given DSL is particularly powerful when it allows us to make correct assumptions about the implementation of the concepts expressed with it.

We need to create four modeling languages to continue the generalization of our model-driven framework. Our toolbox now includes metamodels for relational databases, Java business tiers (including DAO), Java web presentation tiers, and a generic component-based application language (our domain model). These metamodels allow us to decouple the transformations and code generators from the information that must change from project to project: the target application (CRM in this case).

In our CRM domain model we can say that Customers have Calls. In the domain model for our order entry system we specify that Orders have OrderLines. In either project, we can use the same transformation to produce a relational model where the two entities in question are transformed into two tables. This transformation definition is now orthogonal to CRM and Order Entry, but specific to the two metamodels the transformation must bridge. In the same way, we write code generation templates to depend on metamodel elements, and hence we generate DDL for Customer and Call, or Order and OrderLine, with equal ease.
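To make that concrete, here is a minimal sketch of the idea in Java. The Entity record stands in for a metamodel element, and toDdl plays the part of the transformation. The names and the DDL conventions (surrogate ID key, uppercased identifiers) are my own invention for illustration, not any particular tool's.

```java
import java.util.Map;

// Toy metamodel: a domain model is a set of entities with typed attributes.
// The transformation below depends only on these metamodel elements, never
// on CRM- or Order-Entry-specific names.
public class DomainToRelational {
    record Entity(String name, Map<String, String> attributes) {}

    // One transformation, written against the metamodel: every entity becomes
    // a table, every attribute a column, plus an illustrative surrogate key.
    static String toDdl(Entity e) {
        StringBuilder ddl = new StringBuilder("CREATE TABLE " + e.name().toUpperCase() + " (\n");
        ddl.append("  ID INTEGER PRIMARY KEY");
        e.attributes().forEach((attr, type) ->
            ddl.append(",\n  ").append(attr.toUpperCase()).append(" ").append(type));
        return ddl.append("\n);").toString();
    }

    public static void main(String[] args) {
        // An instance from the CRM model...
        Entity customer = new Entity("Customer", Map.of("name", "VARCHAR(80)"));
        // ...and one from Order Entry, handled by the very same transformation.
        Entity orderLine = new Entity("OrderLine", Map.of("quantity", "INTEGER"));
        System.out.println(toDdl(customer));
        System.out.println(toDdl(orderLine));
    }
}
```

Nothing in toDdl knows about Customer or OrderLine; swap in a different domain model and the same DDL discipline falls out.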

We've arrived at a new division of artifacts. On one side of this division we have what we can think of as class-level artifacts. Our metamodel of relational data says that columns cannot exist unless they are part of a table. Our templates for generating DAOs call for the creation of a value object. These metamodels and code generation templates are class-level artifacts. Just like Java classes have instances at run-time, our class-level artifacts have instances when we use them to build systems. You can think of our models of CRM and Order Entry as instance-level artifacts. The web presentation model of Order Entry, then, is an instance (or occurrence) of the web presentation metamodel. The CRM JSP code is an instance of the web presentation model-to-code templates. The "regions" separated by these divides are often referred to as metalevels.
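Here is a tiny illustration of a class-level artifact, sketched in Java under my own hypothetical names: a fragment of a relational metamodel whose "columns cannot exist unless they are part of a table" rule is enforced by construction, so every instance-level model built with it obeys the rule automatically.

```java
import java.util.ArrayList;
import java.util.List;

// Class-level artifact: a fragment of a relational metamodel. The rule that
// columns belong to tables is enforced structurally: the only way to obtain
// a Column is through its owning Table.
public class RelationalMetamodel {
    static final class Table {
        final String name;
        private final List<Column> columns = new ArrayList<>();
        Table(String name) { this.name = name; }
        Column addColumn(String name, String type) {
            Column c = new Column(this, name, type);
            columns.add(c);
            return c;
        }
        List<Column> columns() { return List.copyOf(columns); }
    }

    static final class Column {
        final Table owner; final String name; final String type;
        private Column(Table owner, String name, String type) { // private: no orphan columns
            this.owner = owner; this.name = name; this.type = type;
        }
    }

    public static void main(String[] args) {
        // Instance-level artifact: the CRM model populates the metamodel.
        Table customer = new Table("RM_CUST_MAST");
        customer.addColumn("CUST_NAME", "VARCHAR(80)");
        System.out.println(customer.name + " has " + customer.columns().size() + " column(s)");
    }
}
```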

The artifacts on the instance side of this divide can be modified without requiring modification to artifacts on the class side. The dependency only goes one way: from instance to class, from model to metamodel. If your modeling language changes, naturally you must change your model to fit. We expect this, just as we might expect to alter some code when going from one C compiler to another. When changes occur on the class side of this divide, sweeping changes occur on the instance side; this is precisely the effect we were looking for. When we choose to change our style of persistence from DAOs to Hibernate, we expect the resulting source code to be very different. The same notions of Customer or Order now have a Hibernate application architecture applied to them. We hope to see an improvement in response time in both applications by making a change in one spot. This ability to reflect new architectural know-how in one location was the reason we pursued separation of concerns in the first place.

Factoring: The Final Frontier

We now need to look beyond our fictional department to that new contract down the street. This new customer will have their own architecture, frameworks, and libraries. In this new environment, we find a need to generalize and factor knowledge in the tools we use to build such model-driven development facilities. We need tools and facilities for defining metamodels, model-to-model transformations, and code generation templates. Armed with these tools, we can set up our environment for model-driven development again.

Looking a little further, we see an industry stuck repeating itself. If we continue the process of driving repetition from our work, the character of this work becomes the elimination of reinvention from implementation to implementation, and vendor to vendor. In other words, for our model-driven approach, we factor the knowledge we've gained about how to factor knowledge into industry standards.

At its heart, the Model-Driven Architecture is a pattern of tools and standards that allow you to build systems using metamodels, models, model transformations, and code generation templates. The "A" in MDA isn't about the application's architecture. The "A" stands for the facilities and techniques we, as developers, use to build applications. It is an Architecture for building systems, using models.

The essence of MDA lies in its goals of separation of concerns (adhering to the DRY principle), and generalization of its facilities, even to the extent of industry standardization.

The OMG's MDA uses the Meta Object Facility (MOF) to define metamodels, or modeling languages. The forthcoming MOF QVT (Query/View/Transformation) specification defines a set of languages (more on this in a future post) for writing model transformation specifications between two or more MOF metamodels. The MOF Model to Text specification (still in its infancy) will define a standard template language for generating text from MOF-metamodel-based models.

You'll notice I didn't mention UML. The reason is that UML is not central to an MDA approach. Having spoken this heresy, let me soften it a bit by saying that most MDA implementations, including Compuware's OptimalJ, use pieces of UML (most especially the class metamodel and diagrammatic notation). This portion of UML is expressed as a MOF metamodel. In the case of our domain metamodel, UML is extended with features we need for a model-driven... umm... architecture. UML is a general purpose modeling language, not a special-purpose DSL.

To complete our survey of all things DRY, let's finish mapping our efforts to MDA jargon.

The code we don't write anew for each application is factored into frameworks and libraries we selected, like Hibernate, Struts, and the Xerces XML parser. This code also lives in the application servers, databases, and operating systems that we select. In MDA parlance, the combination of these things is the platform our system will run on. A model that contains information specific to the class of platforms that includes our particular platform is said to be a Platform-Specific Model (PSM). We create PSMs using a PSM language, or metamodel, defined in MOF. In our example, the CRM relational model is an example of a PSM.

In our example, we also have a domain model where Customer is defined. In the domain model we expressed the essence of Customer, absent any technological or implementation-specific decisions. This model is orthogonal to the class of platforms we choose to implement the system on; it is a Platform Independent Model (PIM). Like PSMs, we express PIMs in a PIM language, which, in turn, is defined in MOF.

That, in a nutshell, is what MDA is about. Granted, there's a lot missing from what I've written in these three posts, but you've experienced the main avenue of thought on the matter. In building any MDA implementation, you must go through the process of parceling knowledge into the platform, metamodels, transformations, templates, models, and code. Making all of these choices may not be easy, but it will prove worthwhile.

Going forward I hope to start examining the different points of view within the MDA community. I have my own (colored, naturally, by my experience with OptimalJ), but I intend to call out where my opinion and someone else's opinion diverge. In these three posts, I've voiced my opinion, yet shown a lot of the "fact" that is MDA as we know it today.

Tuesday, September 07, 2004

Kicking It Up A Notch (Part 2: The Essence of MDA)


We've applied transformation rules to a conceptual model of the entities in our system (see my previous post). These rules transform an abstract model into a more concrete model. Another set of rules translates this more concrete model into code in the form of Data Definition Language (DDL). Wonderful! But what about the rest of the system?

Many software methodologies or processes apply the same technique to a model of the software that will read and write customer records. They start by drawing the boxes and lines to represent the software in a model. Sometimes they leap from there to code, and sometimes they refine the model into a more detailed, more specific model. This specific model will naturally be filled with allowances for the target implementation platform. At this point, I should apologize. I've switched directions from my previous post and started to write of top-down construction. We were in the process of zooming out, so let's go back to our working system and look at the software side of things.

We're staring at hundreds of Java classes in this small Customer Relationship Management (CRM) system. The way our system was built, there's a small component for each table in the system. We notice that each component has the same three or four classes and implements some common interfaces. For each component, one of the classes makes JDBC calls, one translates between a JDBC record set and a Java bean, and another class provides a set of operations to the outside world for the table it embodies.

If this sounds familiar to you, you're absolutely right. This is a form of the Data Access Object (DAO) pattern for Java persistence. (Please read this post as orthogonal to the religion of persistence frameworks. Personally I'm partial to Hibernate, but DAOs provide an easy example.) As we look beyond the customer component we can spy a few other patterns. The Data Transfer Object (DTO) pattern ("orthogonal..." think "orthogonal") works in combination with the DAO pattern. We see a sort of wrapper class that implements caching while simplifying the programming interfaces to the DAOs. This is a combination of the Facade and Decorator patterns.
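For readers who haven't bumped into these patterns, here is a compressed sketch of the component shape described above. The class names are illustrative, and an in-memory map stands in for the JDBC calls so the example stays self-contained.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of one per-table component: a DTO that carries the data, a DAO that
// owns the persistence calls, and a facade that is the component's public
// face. A real DAO would use JDBC; a map stands in for the database here.
public class CustomerComponent {
    // Data Transfer Object: a dumb bean shuttled between tiers.
    record CustomerDto(int id, String name) {}

    // Data Access Object: the only class that "talks to the database".
    static class CustomerDao {
        private final Map<Integer, String> table = new HashMap<>(); // stand-in for the CUSTOMER table
        void insert(CustomerDto dto) { table.put(dto.id(), dto.name()); }
        CustomerDto find(int id) {
            String name = table.get(id);
            return name == null ? null : new CustomerDto(id, name);
        }
    }

    // Facade: the set of operations the rest of the system is allowed to see.
    static class CustomerFacade {
        private final CustomerDao dao = new CustomerDao();
        void register(int id, String name) { dao.insert(new CustomerDto(id, name)); }
        String nameOf(int id) {
            CustomerDto d = dao.find(id);
            return d == null ? null : d.name();
        }
    }

    public static void main(String[] args) {
        CustomerFacade customers = new CustomerFacade();
        customers.register(42, "ACME Corp");
        System.out.println(customers.nameOf(42));
    }
}
```

Multiply this shape by every table in the system and you have the repetition the rest of this post is about.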

Whoever built this system really knew what they were doing. It may not be the best or the coolest way to build a Java application, but this code is very consistent. It is easier to maintain a system when you can readily understand the code, and consistent code is much easier to understand. If the code is well-factored into applied design patterns, so much the better. These design patterns allow us to get a wider perspective on how the pieces of our system work together.

OK, so an experienced developer or an architect made some decisions that resulted in a consistent application of technology to the business domain. In this software, our domain concept of the customer lives across many classes.

  • Customer DAO
  • Customer DTO
  • Customer decorating facade
  • Customer UI classes and JSPs
  • and more...

I'm going to refer to this set of technology design decisions as the architecture of the application. This includes choices such as the use of relational databases over flat files or OODBs. Repeated application of the architectural design to the domain concepts resulted in the well-factored code we see in this CRM application.

This leads us to another example of repetition. Patterns in the code we're examining can be identified and factored out in a manner that allows us to specify the combination of a domain concept (Customer) with a set of collaborating patterns. The work of producing the code, however, requires that we repeatedly interpret the intent of the patterns. We should automate this somehow.

Driven to Abstraction

We resolved this sort of repetition when building the database by representing the concepts we cared about as models. From a fairly generic (yet well-formed) model of our domain concepts, we automated the creation of a more specific model. From that specific model, we generated working code in the form of DDL. We need to go through this abstraction process again for the software. We'll find, however, that the mappings for software are slightly more complex.

A set of design patterns and idioms collaborate with each other to implement the system. These patterns and idioms are parameterized with details specific to each persistent domain class. Let me rephrase that: these patterns and idioms are parameterized with a model. We can factor out the details of how a pattern collaboration is parameterized from how its members actually collaborate to achieve the solution to a generic problem. But where do we put the details of the pattern collaboration? In the code generation templates, of course.

We have a model that has enough of the details of our target implementation to control how the implemented code comes out. We see model elements like DataClass (corresponding to the DTO), DAOComponent, DAOFacade, and WebComponent. These model elements have rules that say how they fit together, and how the combination of properties on them may be set. But it is the code generation templates we have lashed up to these metamodel elements that assign meaning (or semantics) to them.

Now we have the knowledge of how to implement (pattern collaboration) and what to implement (abstract model) neatly factored out, each from the other. We now have a very easy way of applying an application design consistently.
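A toy version of this factoring, with the usual caveat that real template languages are far richer than a formatted string: the template below holds the "how" once, and a DataClass element (an invented stand-in for a metamodel element) holds the "what."

```java
// "How to implement" lives in the template; "what to implement" lives in the
// model. Both the template text and the DataClass element are invented here
// for illustration; real MDA tools use much richer template languages.
public class TemplateSketch {
    // Instance of a metamodel element: a DataClass named after a domain concept.
    record DataClass(String name) {}

    // The template encodes the pattern knowledge once; the model parameterizes it.
    static String generateDto(DataClass dc) {
        return """
            public class %sDto implements java.io.Serializable {
                // generated value object for %s
            }
            """.formatted(dc.name(), dc.name());
    }

    public static void main(String[] args) {
        // The same template serves CRM's Customer and Order Entry's Order alike.
        System.out.println(generateDto(new DataClass("Customer")));
        System.out.println(generateDto(new DataClass("Order")));
    }
}
```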

We're still aiming to drive out repetition, however, so we have to find a way to factor out the actual domain notion of Customer from these special-purpose models. While we need to have a CustomerDataClass and a CustomerDAO, we want some way to express that any time we encounter a persistent domain class in the system, it ought to be handled this way by default. We solved this problem with relational database design and implementation by having two kinds of models, a logical and a physical, and having rules that mapped out how one could be created from the other.

Naturally we want to do the same thing here, so we use a very generic class model where we represent the core characteristics of Customer. We have a set of rules that map out what model elements to create in our more concrete models. But why this extra step? Why not simply throw that conceptual (domain) model straight into the code generation templates?

The reason is that not every part of the pattern collaboration may apply for every occurrence of a particular model element. For example, perhaps we don't need a decorating facade for domain classes that are owned in a composite relationship. We could add this kind of knowledge to the pattern collaborations, but that would make them far more complex. It is much easier to indicate these choices with the presence or absence of model elements. We could add this information to the domain model, but then we would have knowledge in the domain model that doesn't really belong there. After all, the fact that CustomerCall doesn't get a decorating facade isn't really one of the core characteristics of CustomerCall, is it? It is a characteristic of the implementation. That decision belongs in a mapping, so we'll put it there.

We now have four different kinds of models (at least): a logical database model, a physical database model, a domain class model, and an implementation-oriented software model. Rule sets map some implementation decisions for us, and code generation templates produce the actual implementation (whether it be DDL or Java code). Our work isn't done, though. We have repetition of the Customer concept between the domain class model and the logical database model.

Solving this repetition is fairly easy. The trick is to have one of our highly abstract models serve as the source for mappings to many kinds of less abstract models. In this case, our domain class model probably has enough information to fill this role. The mappings between the domain model and the physical DBMS model become a little more complex, but object-relational mapping is a well-worn art that can be automated suitably.

We've done a fairly good job of factoring knowledge in our development framework. We have a domain model where we capture the information specific to the requirements of the domain. We have mappings that identify large-grained architectural choices, such as mappings to a relational database model, and mappings to a specific kind of Java software model. We have models that contain enough information to effectively parameterize a series of detailed design decisions captured as code generation templates.

But what about that Order Entry project coming next month? We'll have to go through this process of abstraction all over again. This means we have one more significant level of factoring to do so that we may reuse the approach. We'll discuss this in Part 3.

Sunday, June 06, 2004

Don't Repeat Yourself (Part 1: The Essence of MDA)

Keeping Your Code DRY

It's all about change. Software maintenance can become impossible if the software was written poorly. Well-designed, well-crafted software remains soft: it can be changed and extended to accommodate the changing needs of its users.

Perhaps the most important characteristic of software is how and where knowledge is represented. If you distribute knowledge throughout the software, then changing to a new "understanding" becomes difficult. For example, one classic maintenance problem in client/server application development is the "magic pushbutton."

All the code for a feature of the program lived in the event handler for that button push. When some of that functionality (knowledge) needed to be reused, the relevant bits of code were copied into the event for the menu item, or into the code for a dialog box. Any change to that feature--to the data structures or the rules involved in their manipulation--would ripple through each place where that knowledge was represented. Miss just one of those representations and your software would behave unpredictably, or worse, cause the user to lose work.

While it is not impossible to maintain software built this way, it becomes significantly easier to get the changes wrong or to miss something. In their wonderful book, The Pragmatic Programmer: From Journeyman to Master, Dave Thomas and Andrew Hunt spelled out the DRY principle: Don't Repeat Yourself. Have a single, authoritative representation of each piece of knowledge in your system.

Working on features of Compuware's OptimalJ, and working at the OMG on Model Driven Architecture, had me thinking about the DRY principle in the context of MDA. Adrian Colyer's recent explanation of Aspect-Oriented Programming only made the ideas knocking around in my head clearer.

Adrian's blog discussed the relationship between the different concepts a developer deals with when building a system. Specifically, he wrote about how Aspect-Oriented Programming addresses the problem of representing associated concepts when they relate to each other in a one-to-n fashion. By providing the capability of representing knowledge of the associations explicitly (through pointcuts), AOP allows representation of associated concerns once and only once. Usually this results in more concise, higher-quality code.

Model Driven Architecture provides a solution to this problem. But it does so while providing something even more important: higher abstraction. We'll get to that as we work through this code factoring problem.

Zooming In

I've actually written about this before, but that was sort of shooting from the hip, so I'll take a more considered attempt at it here.

Starting at the end of a long trail of debugging sessions, code-cranking fugues, design decisions, architectural expressions, and requirements, we see a working system, fully conceived. Examining it closely, we see that our notion of Customer, culled from the mental floss of our requirements documents, lives in many places in our working system. Customer, or rather the collaborating notions of Customer, must live in all these different places to allow users of the system to accomplish work.

Customer exists as RM_CUST_MAST in our relational database. It breathes as instances of the com.compuware.myapplication.Customer class, and attendant helper objects and interfaces. It is rendered in many formats across many UI screens (or pages, in this example). Its heartbeat plays out across the glue code and configuration files that hook these pieces together. The same is true for Call, and Product, and all of the myriad business notions that found their way into the system.

Since we have written all of this code by hand (an extreme case, but we'll go with it for the moment), we've violated the DRY principle on a number of levels.

  • Customer, as a notion, is repeated and distributed about our system's code, as are any of the other domain concepts we need our system to manipulate.
  • We repeatedly expressed the knowledge of transforming a domain concept into a relational database entity.
  • We repeatedly expressed the knowledge of how to render domain concepts as Java classes.
  • We repeatedly expressed the knowledge of how to handle associations between domain objects in code.
  • We repeatedly expressed the knowledge of how to expose domain objects to the user through the UI.
  • We repeatedly expressed the knowledge of how to map Java objects to database tables.
  • ...

How do we fix this? We start by finding alternative ways of representing knowledge, ways that put information in the right spot.

Zooming Out With Models

Let's attack the DBMS representation first. If you've built large or complex databases, you've probably already solved this particular problem. You represented the concept of an entity and its associations, and rendered the code that the DBMS operates on. Using a diagramming tool, you created a model of Customer, Call, and Product. This model could then be rendered as DDL specific to our database of choice. The model showed us just what information was important to create implementations in a specific database. The model allows us to target Oracle 9i, SQL Server, or MySQL. More importantly, it let us start at Oracle 7.3 and upgrade to subsequent versions when we needed to.

The diagramming tool likely used two representations of Customer: one in the logical model and another in the physical model. The logical model captured the general idea of the Customer entity, and the physical model captured the actual definitions of the database structure according to a specific database. Rules or mappings define how elements in the logical model should be interpreted in the context of a specific database.

Specific knowledge and design decisions are now parceled up into discrete packages. We defined the concept of a Customer data table in one place. Someone defined the mapping rules between this concept and how it will be implemented in another place. The model of the implementation, and the rules for turning this model into code (DDL), exist in two more distinct places. If the tool is especially flexible, or if I built the tool myself, I can control the DDL generation and refine it as my expertise with the software platform (Oracle) increases. Even better, if the tool permits it, I can control the mapping rules between models as my expertise with the class of platforms (RDBMS) increases.
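Here is a sketch of what those mapping rules might look like if you built the tool yourself. The logical-to-physical type mappings are illustrative choices, not authoritative ones, but they show how growing RDBMS expertise means editing one table of rules rather than every model.

```java
import java.util.Map;

// Mapping rules kept in their own place: the same logical type is rendered
// differently per target database, and refining the rules touches neither
// the logical model nor the rest of the generator. Type choices are
// illustrative only.
public class LogicalToPhysical {
    static final Map<String, String> ORACLE = Map.of(
        "string", "VARCHAR2(80)", "int", "NUMBER(10)");
    static final Map<String, String> MYSQL = Map.of(
        "string", "VARCHAR(80)", "int", "INT");

    // One rule application: logical attribute -> physical column definition.
    static String column(Map<String, String> dialect, String name, String logicalType) {
        return name.toUpperCase() + " " + dialect.get(logicalType);
    }

    public static void main(String[] args) {
        // The logical model says only "Customer has a string name"...
        System.out.println(column(ORACLE, "name", "string")); // Oracle rendering
        System.out.println(column(MYSQL, "name", "string"));  // MySQL rendering
    }
}
```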

In creating these stepping stones to implementation (a metaphor I'm borrowing from MDA Distilled), we've achieved two things. First, through abstraction, we've achieved a more powerful form of expression that allows us to say more. Second, we've apportioned knowledge in our system in a more fine-grained manner. This manner of parceling up different kinds of knowledge in different spots is exactly the technique we need to effectively practice the DRY principle.

We haven't yet solved the larger problem, though. More on this in my next entry; stay tuned.