Archive for the ‘Patterns’ category

Pattern for Floating Labels with Streams

September 1st, 2009 by dave

I’ve encountered a number of use cases in the field where people enjoy the use of  ‘floating’ labels.  Implemented correctly, floating labels are a good example of convention-over-configuration  usually eliminating repetitive, manual build system updates.   Implemented incorrectly, they can be a traceability curse.   In this writeup I’ll show two examples of how to implement a floating label pattern in AccuRev.

Deprecated Concept? If you’re used to using floating labels as ‘markers’ on a branch [in another CM system] for stable code or feature complete state, you’ll find that the AccuRev stream hierarchy and associated workflow is a much better solution – in some cases, eliminating the need for floating labels altogether.  In my experience, floating labels have been used because the traditional CM system didn’t support process – so you had to roll your own – ultimately leading to lots of scripting and a solution that doesn’t scale.

Stories of Misuse. Every now and then I hear stories about how an urgent quick fix or missing doc change -needed- to be included in the “release-4.0.1″ that just got labeled in source control a few days prior. In some traditional CM systems, you have the ability to commit new file changes, un-label the old version and re-label the new version. As easy as this sounds, this anti-pattern instantly breaks traceability [do your labels maintain history?] and can quickly lead to confusion as “release 4.0.1″ has multiple meanings depending on when the label was pulled, either before or after the re-label. Example. I once asked someone why they routinely re-label individual files and their answer was – “because creating a new full label takes about 30 minutes.”  Why so long? Because most CM systems label the individual files (or references) in a given configuration; the more files you have, the longer it will take to label the entire configuration.  I’ve even heard stories about creating labels (and branches!) that take upwards of an hour each run.  Unless you are using a time/txn-based CM system, labeling build events is required if you need to track and compare changes between builds over time.  If you are waiting 30-min per label, forget about the agile “10-minute” build – you haven’t even gotten to the compile step yet!

Valid Cases. In practice there are  cases where having a floating label makes sense, typically related to your build system.  Most cases have to do with marking transient build configurations.  For example, when doing nightly or continual integration builds, let the build script or CI server simply pull/poll a single label and let the developers or build engineer determine what content is bound to the label. This way, the build server can be “dumb” and just get whatever the developer or build engineer say it should have.  This eliminates the need to configure your build system with each-and-every new build label – especially painful for those build systems requiring manual edits of config xml files – ugh.

The Pattern.   Two examples are presented below showing how floating labels can be used to support continuous integration with streams.

Static Floating Label.  In this example, continuous builds are performed on the single stream App_Integration.  No need to change your CI configuration. New builds are kicked off whenever changes are  promoted [by team leads, etc] from the child project streams.

Static Floating Label
Static Floating Label

The CI/build tool is configured to monitor the single App_Integration stream.  As the configuration changes, the name of the build stream remains unchanged – hence, the App_Integration stream is the ‘floating label’.

When doing very-frequent builds (e.g. per-promote or even nightly) this pattern works well as long as you have the discipline to immediately fix any broken configurations.

Dynamic Floating Label.

In this example, individual build configurations for App_Integration are marked with snapshots.  New snapshots are created whenever changes are promoted [by team leads, etc] from child project  streams, or on schedule (e.g. Nightly).

Dynamic Floating Label
Dynamic Floating Label

The stream Int_Good_Build is dynamically reparented to either the latest snapshot (e.g. latest build) or reparented to an alternate (e.g. prior) snapshot representing the last known good build.  The ability to move this stream, on-demand and as needed, represents a dynamic ‘floating label’ pattern.  Simply reparent the stream, update the attached workspace and the files on disk will mirror the current release snapshot configuration.  The same pattern can be applied to production releases where internal snapshots capture detailed per-configuration changes, but the reparented floating label can represent the latest public release.

Summary. In all, I’m a big fan of creating snapshots for each build, and removing old[er] ones along the way.  There are two camps on the subject, and this will be left for another blog.  Thus, I like the visibility of knowning which snapshots have been created (implying a purpose), and having a ‘marker’ stream as a floating label (e.g. context) to identify the current build being tested and/or delivered to an Int or Test machine.   Lastly, having recent snapshots immediately available assist with debugging why the latest build is broken – diff by snapshot.  sweet.

/Happy Coding/ dave

Pattern for Versioning Generated Objects

May 5th, 2009 by dave

After building your software, do you check-in your generated binary  files? How about the output from test runs? If your software runs on multiple platforms or your test runs take hours/days to execute, you may want to consider storing the output — especially if binary reproducibility is critical.

Example. Consider shipping an application to a customer who 2 years later reports a defect. Can you reproduce their build “today”? Surely you have the exact versions of source files. But are you using the exact build file? Probably. How about the original version of the compiler? Maybe. But probably not. Don’t forget that your compilers get upgraded too — their optimization algorithms or bugfixes can change the binary execution format of your application. Thus, compiling source from 2 years ago may result in an equally functioning application at the user-level, but at the byte-level, things may have changed dramatically — and at a level where runtime defects (performance/memory) rear their ugly heads.

Myth #1: Committing generated  files results in longer checkout times. No developer wants to checkout source code and wait for or be inundated with megabytes of .o, .class, .jar, .war files that they are either never used or are going to be rebuilt anyway.  The AccuRev Truth: Use include/exclude rules on streams and workspaces to control which streams have access to generated objects and who will receive them during checkout.

Myth #2: Committing binary files slows down your CM system. Traditional SCM systems combine both meta data and content resulting in slower performance over time as the number of files increase (think labeling).  The AccuRev Truth: AccuRev stores meta-data separate from file contents and uses indexes to lookup and retrieve contents.  For example, transactions are labeled not files.  Using a card catalog (index lookup) to find your books is always faster than walking the isles (linear scan).

Myth #3: Storing generated artifacts will bloat the repository. Back in the day of wild-west coding, there was little rhyme or reason for where files were saved in the source tree.  The build system would simply compile the files it found, save the generated output right next to the source file, and as long as everything linked & compiled — it worked.  But in todays complex world of multi-layer software architectures, tiered deployments, mixed technologies, and sophisticated build tools, following a convention is almost a necessity (think  ruby on rails, maven, etc). The AccuRev Truth: Organizing the top-level source tree and configuring your build tool can make it very easy to carve out source vs. binary vs. tests vs. scripts, etc.  Using include/exclude rules, end-users can decide at the stream or workspace level what parts of the file tree need to be visible.

The Pattern. In this pattern for versioning generated artifacts, I’ll show how streams can be used to store generated files only in the appropriate stage of development and prevent unwanted exposure to developers.  Two options are present that can also be used in combination.

Option #1: sub-configurations

Option #1: sub-configurations

Option #1: Store and track generated artifacts as sub-configurations isolated from the mainline.   From a baseline snapshot such as a test build or release candidate, create a new child stream to store the generated artifacts.  Then create a second snapshot that represents both source code and generated artifacts. For a single “configuration” you now have two snapshots – one for source only and a second for source + binary.  Furthermore, you can diff these two snapshots to know exactly how the binary configuration is different from the source configuration.  You might also consider storing compiler files, debugging output, test output,the compilers themselves (!), etc.

Option #2: include/exclude rules

Option #2: include/exclude rules

Option #2: Store and track generated artifacts directly in mainline but exclude them from downstream access using stream-level exclude rules.   The top-most streams that need access to both source and binary will include the majority or entire filesystem footprint in their configurations.   The first stream that does not need access to generated objects will likely be the candidate to set an exclude rule on the folder(s) that contain those files.  The exclude rule is inherited to all children and grandchildren.

When using exclude rules, it is easiest to set a single rule on a top-level ‘./build’ or ‘./generated’ folder rather than creating a rule for each sub-folder in a large source tree.  Traditionally, make based build systems would generate the compiled files in-line with the source code.  Lately, ant based build systems would package all generated artifacts in a separate sub-tree off the root.  Regardless of your build tool, it’s best to have all generated artifacts in their own tree – it makes it easier to exclude as well as safer to clean!

In practice I see both patterns in use and both have equal merit depending simply on the situation at hand.  Option #1 is commonly used when generated artifacts are not to be included in the official release.  For example, transient or secondary artifacts such as test cases, debugging output, reports, etc.  These files are not promoted up to the release stream.  Option #2 is usually used when the generated artifacts are expected to be included in the official release snapshot.  Thus, they are promoted up through the test/build/release streams.  The build system for these types of ‘uber’ configurations may have multiple release targets creating different levels of release packages such as ‘minimal’, ‘app’ , ‘app-with-tests’ and ‘full’.  That is to say, the CM system may have all possible files but you can choose what actually gets deployed.  Ultimately, storing everything in the CM system may likely be the right choice for audit and reproducibility.

/Happy Coding/

Pattern for Private Prototype Development

November 13th, 2008 by dave

by Dave Thomas

Your team has been coding a feature for a few days, across dozens of files, and everyone is excited with progress.  Deep into development, you find an interesting 3rd party library that may simplify your work and possibly take the team’s feature to the next level.  But it will require adding new files, moving some directories, and refactoring some critical code.  However, you’re not ready to commit your current changes because they are unfinished and will surely break everyone else.  Your changes need to be saved before prototyping in case of a rollback but also be integrated with prototype development.  And what if the new library is a bust?  If you start co-mingling the library integration with unfinished code, the potential revert process will be a complete nightmare and waste of time.

Private Versioning. With AccuRev’s private workspace, you can always commit your changes early, often, and safely without sharing amongst your peers.  But in this case you have two logical development efforts, the 2nd effort depends on the first and both need commits to preserve evolving changes.   How do you keep them cleanly separated and continue to work on both in parallel?

Stream Inheritance.  AccuRev streams have an intriguing and very powerful feature called inheritance.   Similar to how an OO subclass implicitly inherits methods from parent and grandparent classes, a child stream implicitly inherits versions of files & directories from parent and grandparent streams.   Taking advantage of inheritance, we can use streams to independently manage logical changes and cleanly maintain change dependencies without physically co-mingling files.

The Pattern.  Prototyping changes without co-mingling files can be done by simply creating a series of ‘personal’ or ‘private’ development streams (though, they are just regular dynamic streams). This pattern will create a “Feature”, “MyDevelopment”, and “MyPrototype” stream sub-hierarchy.  See Picture.

click to enlarge

click to enlarge

From your Integration stream (or equivalent), start by creating a “Feature” stream that will collect all changes from your entire team.  [See related blog, Stream-per-Task Pattern].  The majority of team members may have their workspaces directly from here.  Next, create a child stream called “MyDevelopment” that will track your personal ongoing development activity.  Finally, create a grandchild stream called “MyPrototype” to track changes that will be discarded or retained depending on the level of success.

The prototype development activity is committed, shareable, integrated, and yet cleanly segregated from both active feature and private development.

The “Feature” stream is the collection point of all in-progress development activities for the given feature from the entire team.  Developers will promote here frequently possibly kicking off an automated build with success/fail notification.   The “MyDevelopment” stream provides a collection point for all of your personal development changes.  This stream may be considered a “private” development stream simply because no other developers will likely use it – set a stream lock to be sure.  A lightweight Continuous Integration build (i.e. compilation only) may be performed on “MyDevelopment” for sanity sake or just compile and promote as a practice.  The “MyPrototype” stream is a collection point for all prototype changes. Even as new changes are promoted to “Feature” and “MyDevelopment”, the “MyPrototype” stream will automatically incorporate those changes (via inheritance) and the prototype developers will merge changes as necessary. The prototype development activity is committed, shareable, integrated, and yet cleanly segregated from both active feature and personal development. Also, by using a stream for prototype work, multiple developers can contribute and collaborate. If the prototyping work is deemed successful, the files can be promoted to “MyDevelopment”. If the prototyping efforts don’t work out – no problem – just remove the “MyPrototype” stream and re-purpose the workspaces.

The beauty of this pattern is that it isolates development activity by purpose without co-mingling physical file changes.   It lets the prototype developers go nuts and shoot from the hip while the regular feature developers (with the deadline!) work unimpeded and without fear of rampant changes — and everyone stays up-to-date.   And with no limit to stream depth, teams can perform prototyping efforts in parallel and/or perform prototyping based on existing prototypes by adding another child stream!  Furthermore, the pattern works for any size development activity or team, even for us Team-of-One developers with tons of ideas, fast fingers, and a few green screen x-terms (2 space, 80-char wrapping of course – <chuckle>).

This is a perfect example of how AccuRev empowers the developer to take control of their own development.  Creating streams is extremely easy and the stream browser provides unprecedented visibility into the entire development process.  With the right amount of security in place (Locks, ACLs, Triggers), the critical release streams (left side of the stream structure) can be locked down by the CM Admins, but the development related streams (right side) can be fair game for developers to create an environment that suits their purpose, such as prototyping.

Does anyone have a good story to tell about how this pattern (or equivalent) helped with a major refactoring effort or library upgrade?

/happy prototyping/ – dave