Classic software engineering mistakes: To Greenfield or Refactor Legacy code?

Oh, the humanity!


If we were not in the software engineering game but, rather, in the civil engineering game then the equivalent of this article would be called 'Classic Civil Engineering Mistakes' and contain graphic videos of buildings and bridges collapsing and thousands of people running for their lives.

It's hard to get emotional about something you can't see

Unfortunately software is a rather intangible asset that doesn't lend itself to dramatic and emotional visualizations. When it comes down to it the life work of a software engineer, reduced to its lowest conceptual level, is the specific arrangement of sequences of 1's and 0's on one or more hard disks (or some form of solid state storage) residing on a server or desktop computer. How those 1's and 0's get there, and which sequences work and which don't, is not really a black art. It definitely requires some intelligence but, surprisingly, a lot more common sense.

Some arrangements of 1's and 0's form social networking services like Facebook and Twitter, earning their creators billions of dollars. Other arrangements are from lesser known companies that, while not making billionaires out of company owners and millions for employees, can provide sufficient income to keep families fed and mortgages paid.

Other arrangements of 1's and 0's never earn any money at all. Still others begin successfully for their creators but, through circumstance, a changing environment or poor decisions, the success peters out either abruptly or over an extended period.

Opportunity knocks

At certain times in a software company's life new market opportunities, constraints or internal political rumblings arise around one or more of the company's products and demand an analysis of the possible strategies going forward. Often the intended end result is an extension, enhancement or refinement of an existing product. Evaluating a way forward usually involves choosing one of the following basic strategies:
  1. Creating a new greenfield code base for either all functionality or a portion of it.
  2. Retaining a single code base for legacy and new products, refactoring, enhancing and refining the legacy code base where required to provide the extra functionality needed by the new product. The same code base can execute in multiple 'contexts' (i.e. products) via well-established context-dependent coding mechanisms (e.g. standard factory patterns, the template pattern etc.) where certain behaviors are abstracted and context/product-specific implementations are 'plugged in' at run time.
  3. Copying/pasting the legacy code base, creating a completely independent code base for the new product such that there is no requirement to support the legacy system in the duplicated code base - support for the legacy system will be provided via the original legacy code base.
  4. A hybrid of 1 plus either 2 or 3, where new functionality can be sufficiently decoupled that it can be developed as a greenfield project but integrates with the legacy system (2) or a copy of it (3) through well-defined interfaces - some of which may not exist yet but can be added, usually with some fairly straightforward refactoring and abstraction of certain functionality within the legacy system.

This article relates my experience during an early software engineering position I held 20+ years ago when I was bright eyed and bushy tailed and very naive (I like to believe that I'm still bright eyed and bushy tailed but no longer so naive!).

Some seemingly plausible but, in the end, damaging strategic decisions were made in regard to the way the software team arranged their 1's and 0's as you'll see...

The Company

The company I worked for produced software to run in sophisticated controllers for at least three different products. Let’s just call them ‘engines’ so as to not make the company identifiable. They all supported very similar functionality but varied in the hardware that they were deployed to, so the software for the three different engines was built from the same code base – in places where engine-specific code was required we used C’s pre-processor directives to customize the code for the different engines (this was in the infant days of OO and not many people had heard about C++ - which would have been a more elegant solution).

#ifdef ENGINE1
    /* code specific to engine 1 */
#elif defined(ENGINE2)
    /* code specific to engine 2 */
#else /* all other engines */
...
#endif

Example of an engine specific section of code

99% of the code didn’t need this as it was common to all products but some of the source files had it and, while inelegant, it meant we only had to maintain a single code base. Any bug fix or feature added to the common code for one product was automatically inherited, at no extra cost or effort, by the other products. It was very maintainable, highly productive and made a lot of sense.

But... one day a (insert adjective that starts with 'n') software team leader decided it would be much 'cleaner' if we separated the code into three separate code bases and removed all the pre-processor directives. So without any further discussion, and much to the amazement of some of the more astute software engineers, he split the code into 3 separate code bases even though 99% of it was common to all engines.

The theory was that by removing the ‘ugly’ engine specifics from the code we’d all be working on ‘cleaner’ code and therefore productivity would go through the roof.

That was the beginning of the end for the software department in a historically successful, medium sized enterprise. I won’t mention the name because I don’t want to offend anyone who was involved in the decisions that preceded the escalation in costs that contributed to the eventual closure of its software department.

[Aside: very few of the pre-processor directives were ever removed from the code bases so the perceived 'ugliness' was never removed - we just had 3 separate versions of it :) ]

Three separate code bases: What could possibly go wrong?


In the beginning the three code bases were still nearly identical so deltas and merges were almost manageable, but before too long subtle differences meant that simple copy/pastes from one source base to another were almost guaranteed to cause regression bugs. In sections of the code that were previously purely generic (i.e. no engine specifics) customers were raising bugs against one product that were not present in the others. It quickly became a dog’s breakfast that was very expensive to maintain.

Within months a hiring spree ensued because now when a bug was found it had to be fixed in three places instead of one, and when a new feature was added it had to be added to three files instead of one - the team needed more developers to perform this laborious 'duplication of effort'. As a developer I recall suffering the classic 'same but different' problem that occurs when you work in different instances of essentially the same thing - things look so similar that there is very little contextual difference so, a few weeks later, remembering which 'instances' you fixed and which ones you didn't becomes very challenging.

"Same but different" can be dangerous to your health

There was once a German car racing track that relied on this 'same but different' confusion to trick drivers. It had multiple bends that were deliberately designed to look 'exactly the same' and they were essentially the same except for the last one, the exit of which went in a completely different direction to the rest. You can guess what happened - they had to close the track because of too many driver deaths, as many drivers turned the wrong way on exiting the last bend, confusing it with another that went the opposite way. Due to the confusion caused by the 'same but different' phenomenon their brains lost track of which instance of the 'same thing' they were driving in.


Anyway, enough about real things like cars and racing tracks and back to 1's and 0's...

Many more developer hours were required to maintain three code bases than the previous single code base – which makes perfect sense to the average man in the street even though the people initially pushing for the ‘cleaner’ code base(s) found this outcome *surprising*. Nevertheless they arrogantly continued with this 'new' approach to how we arranged our 1's and 0's because to go back would mean an admission that they were wrong, and for 'n' type people that is very hard to do. One of the guys leading the team was one of those personality types that would rather run the ship into an iceberg than admit they had the ship pointing in the wrong direction.

Pretty soon the quality of all three code bases dropped dramatically as the maintenance effort overwhelmed even the larger team that had been established. The products with the least market share started being treated like second-class citizens with all effort going into the most popular products. Previously, updates in the common code (99% of it) reached the less popular products for free; now they weren’t happening very much at all as it became too expensive and time consuming to port new features and bug fixes across from one code base to another.

The company had effectively close to tripled the cost of maintaining and enhancing its cash cow’s software code base without a single extra dollar in revenue coming in for all that extra effort and cost. Customers don't really care if you've got pre-processor directives to implement conditional builds in your code so long as the code works.

Rather than realizing the expected, yet completely unquantified, productivity gains from establishing multiple, but ‘cleaner’, code bases the productivity dropped dramatically as did the quality of released products. The sales team were constantly asking “Why is bug X fixed in product Y but is still there in product Z? I promised the customers it would be fixed in the latest release and it's not. You have embarrassed me.” and customer satisfaction dropped accordingly.

This plane’s altitude was dropping quickly but it still had two of the four engines running and could have limped to the nearest airport if they were smart about it but only the bravest and humblest manager would be able to make the smart and responsible decision required at this stage in the game…evidently that type of person didn't make it to the second round interviews at this place ;)

New 'clean' greenfield project: What could possibly go wrong?


What happened next was a very exciting opportunity offered to many of the software engineers and therefore well accepted by all the software team (including me!) – a new greenfield project called the ‘re-engineering project’ (I’m sure thousands of companies have had projects with re-engineering in their title so I’m not giving away any company identification hints here) which, in recognition of the set of 3 separate unmanageable code bases, was to build a new ‘re-engineered’ single code base from which the software for all products could be produced. Common, reusable frameworks and libraries would be built. It was a very ‘back to the future’ experience – creating a new single code base – much like we had before the code split except that all the code would be shiny and new and therefore much, much better!

The ‘re-engineering project’ started with great enthusiasm and it was decided early on to avoid using any of that yucky, dirty existing code that had undergone years of development and field proven robustness.

Initially about 18 engineers, including myself, were put on the project. It would use the latest 'state of the art' CASE tools whose licensing cost thousands of dollars per engineer - not to mention the cost of training. Everyone was very positive and keen to "get into it". What could possibly go wrong?

Well there was one minor problem - the project would take 12-18 months and would not earn 1 cent in revenue until it had been completed, fully tested and customers started buying it. Meanwhile there was a "dog’s breakfast" of 3 different code bases that needed to be maintained and the ‘re-engineering project’ had just stolen about 18 engineers from the maintenance effort. Whoops!

As you may have guessed, in a company with limited financial resources, engineers started getting poached from the re-engineering project and put back on maintaining the old code for the cash cow products. When we were down to about 6 engineers I remember thinking, “Ok, this project’s ETA has just been extended by about another 24 months”. Unfortunately that wasn’t the end of the poaching.

The re-engineering project earned ZERO dollars and was of intangible benefit in management’s eyes. Maintenance and quality of the existing products, which WERE EARNING MONEY, were dropping as fewer engineers were available to maintain them. Over the next six months management gave directions to R&D to poach even more engineers back to maintenance to improve quality and turn around times for new features. This continued until there was only a single person left on the re-engineering project…me. While rather chuffed to be the last one standing it was pretty damn lonely and the ETA had now become a company joke: 18 engineers = 18 months so 1 engineer = 324 months!

So what can software managers learn from these types of experiences to make better decisions for how their team organize their 1's and 0's? Well it depends on a number of variables. What may be surprising to some is that a very important variable is a non technical one: the size and therefore financial resources of the company. Here's a list of typical outcomes based on company size:

The little guys

The little guys, small enterprises, rarely have a choice. Their decision is constrained by the economic reality of very limited financial resources and they usually don't have the financial resources for a full greenfield 'rewrite' of their code base. Often their only option is to 'get agile': continually improve, refactor and enhance their existing code base. Over time much ugliness in the code can be replaced by elegant design. The improvements are incremental over time instead of 'big bang' at the end of a 2 year project development - which a small company simply can't afford.

As turn around times for each iteration are usually from 1-4 weeks, the improvements to the existing code base are regular and financial benefits can be realized in the very short term. In other words, agile style, incremental improvement of the legacy code base produces a repetitive, financially viable sequence of deliverables with excellent 'return on investment' (ROI).

In my experience this is usually successful as it does not incur large financial commitments on behalf of the small enterprise. My second place of employment was a small but successful company. I wrote a sophisticated Windows application for them in the early 90s. It was ported from 16 to 32 bit when Windows 95 was released and they are still deploying it 20 years later and have continued enhancing and maintaining it over two decades.

My own company is relatively small yet we have developed many reusable frameworks over many years and we have incrementally enhanced and refactored these on a continuing basis. We have never hit any requirement that has caused us to even come close to being tempted to 'ditch it and start with a clean slate'. It certainly helps to design with flexibility built in - often abstracting behaviors into clearly defined interfaces to reduce coupling and increase re-usability - but even with an old code base that wasn't built with this in mind it is usually possible to refactor and repackage legacy code into an implementation of an interface. That allows the behavior to be reused by other modules or even exposed as a service to be reused by other applications. It also allows other modules or applications to provide their own implementation of that interface. The trick for knowing when an opportunity for such abstraction exists is to be ever vigilant against the temptation to copy/paste. If you ever feel the need to copy/paste a section of code then right there you probably have a candidate for some abstraction and re-usability.

The big guys

The big guys, large enterprises, typically do succeed with large, 'big bang', greenfield projects. If you’re a Microsoft or Google you typically have enough resources to do all the extra hiring you need, so there’s no need to pull staff off one project to start another and no need to poach people from the greenfield/re-engineering project back onto maintenance of the company's existing cash cow product(s) when the legacy code suffers from neglect. In short, this approach can work if you happen to be a Fortune 50 company with seemingly infinite funds to invest in product development.

The medium sized guys

Stuck between the little guys and the big guys are the medium sized enterprises. This is where things get very tricky. They are not like the little guys whose decision is forced upon them due to 'shoe string' budgets but equally they are not like the big guys whose seemingly infinite budgets mean they can fund even the largest of projects.

These guys often do have budgets that can run into multiple millions but let's be clear: millions are not the billions that the big guys deal with. Millions can sound like a lot of money but if an average software engineer is paid around $100k/year and you create a new project that needs 8 engineers you've just incurred an extra ~$1,000,000 of costs every year once you include rental space, furniture, benefits, pension contributions, training etc. Has revenue increased by that same amount over that year? Is it expected to increase once new customers use products built with the 'new and shiny' code base or won't most customers notice the difference?
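As a rough sketch of that arithmetic (in C, to match the snippet earlier in this article): the function name and the 25% overhead rate are my own assumptions, chosen only because 8 x $100k plus 25% lands on the ~$1,000,000 figure above. Your own overhead rate may be quite different.

```c
#include <assert.h>

/* Rough yearly cost of staffing a new project.
 * overhead_percent is an ASSUMED rate covering rental space, furniture,
 * benefits, pension contributions, training and so on. */
long annual_team_cost(long engineers, long salary, long overhead_percent)
{
    assert(engineers >= 0 && salary >= 0 && overhead_percent >= 0);
    return engineers * salary * (100 + overhead_percent) / 100;
}
```

With these assumptions `annual_team_cost(8, 100000, 25)` comes to 1,000,000 per year, and that cost recurs every year until the first customer pays for the new product.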

Some managers justify the greenfield approach by saying we're going to have no new hires, or only one or two, and so the overall cost to the company is not significant. If you need 8 engineers but hire only two extra engineers then you have to pull 6 engineers from an existing product, typically the company cash cow, and that has all the risks I outlined above. Poaching is the main risk - every poaching pushes out the ETA of the greenfield project and reduces its chance of success as the ETA approaches the end of the 'Greenfield Window'.

Greenfield Window

I use the term "Greenfield Window" to describe the very limited window in which a greenfield project has to succeed. Once that window closes it's game over for the greenfield project. All the code created is archived in the version control system and rarely ever touched again.

Greenfield projects take lots of developer and financial resources and they don't earn the company ANY revenue until the product is being used by paying customers. For even modest projects this can take 12 months but in larger products this can take many years.

During greenfield development there are all sorts of pressures placed on the project. Pressures from accountants, management, creditors and shareholders who ask questions like "how much is this costing us and how much has it earned us?" Until the project is completed and the first customer pays you something the answers to this question are always "A LOT & ZERO" which really tests the patience of these stakeholders. It only takes one of these groups to lose patience and the 'Greenfield Window' slams shut very abruptly - on your fingers because you were gazing out the window at the time, looking at the greenfield and wondering when your 'crop' would be ready for harvest.

This is a very painful experience both financially and emotionally. In addition to thousands or millions of dollars wasted, egos are hurt and egg lands firmly on faces. Morale among the team can be affected and office politics and power-plays take on a new vigor as the dust starts to settle under the mushroom cloud. It's the injuries sustained by egos and the subsequent political game playing that can have long lasting effects.

Greenfield Specification Vacuum

Often legacy code has evolved over such a long time and in a relatively undisciplined way that much of the functionality is not documented anywhere. The code has become the specification. Hundreds of boundary cases and error conditions have been 'handled' by the code without any documentation of such behavior. Process/workflows are 'embedded' deep into hard coded state machines that make decisions based on a set of business rules that are perhaps known only to the original developer or a business analyst who has long since left and often for situations that arose in real life scenarios that only they were aware of.

Throwing out legacy code and starting with a clean slate often throws the proverbial baby out with the bathwater.

Most greenfield projects that are designed to replace a legacy code base have no accurate functional specification to work with. Even if one were to be produced, its coverage of business rules, boundary conditions and exceptions that were implicitly, explicitly or even accidentally built into the legacy code would be severely lacking. A thorough examination of a vast quantity of code would be required to establish a functional specification with sufficient coverage before the greenfield project can even commence. This is another advantage that refactoring the legacy base has over a greenfield development: the legacy code contains years of valuable explicit, implicit and accidental functional implementations and, even though they may be subtle, undocumented and hard to find, they are working and you can reuse them as they are because you don't have to throw them away and reimplement them.

With incremental improvement/refactoring you can attack these a section at a time over an extended period, with each iteration producing better code that customers can start using after the usual QA process has completed. With a greenfield approach you will need to discover, document and then reimplement in the new code every undocumented, hidden, subtle business rule and specification. I believe that the effort involved in this task will be grossly underestimated by most software engineers overwhelmed with excitement at the prospect of starting work on a greenfield project.

In fact, if the engineer believes that their estimate of the effort to create a functional specification via thorough examination of legacy source code is so large that it will make the project too expensive and therefore jeopardize its chance of commencement, there may be a temptation to deliberately underestimate the effort and thus create a 'surprising' cost blowout down the track. I use the word "temptation" carefully here: whether an individual succumbs to this temptation or not will depend on their own moral code.

With the nature of typical full time employment arrangements there is rarely any direct financial bonus or penalty tied to the accuracy of an estimate, or to implementing the functionality within the estimated range. The company takes all the risk for decisions made based on estimates provided by people who have no direct exposure to the financial consequences of the accuracy of those estimates.

Greenfield Partial Functionality

One of the "promises" (given without any justification or quantification) made by promoters of greenfield projects is that the greenfield system will be "all singing and dancing", "infinitely cool and good" and of course be able to replace the existing system. Like politicians, software engineers are great at making promises but not always able to follow through - especially full time employees for whom there is usually no direct financial benefit or penalty for delivering on their promises.

Let's say that the software team, in the face of overwhelming odds against it, manage to somehow create a working new greenfield product that does something like the old system did. Let's just say that the first deployment of the new system was for a slightly different market that only needed a subset of the functionality in the legacy system. What you have now is 'greenfield partial functionality'.

The greenfield project has created a new source code base that needs a team of engineers to support it but guess what - it only represents a subset of the functionality so that it can't replace the legacy system. In supporting only a subset of the functionality it is hard to find new customers to use the new software. The legacy system and the new system both need continued development and maintenance. The yearly costs to the company of the software department have now doubled and because the greenfield project has only partially implemented the features of the old system there is no medium term reduction in developer costs. The company now has two different software code bases to maintain for the long term.

Let's be really generous and say that the new project does implement every feature of the legacy system. We're still not in software nirvana because the new system, while it may have been based on the original class model or database tables, has had quite a number of new attributes/columns added or has undergone refactors to fix problems experienced with the old model/database. In other words that 300GB database that represents your customers' data can't be easily migrated to work with the new software.

It's like a 20 ton anchor that isn't budging until someone can write the "migration script from hell" to bring the old data into the new system. By contrast, if the legacy system had been incrementally refactored then at each iteration the data migration (if required) would likely involve very small model/database changes, so only very small, manageable data migration scripts would be required to migrate the existing data and hence all the users associated with that data.
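To make the 'very small migration scripts' idea concrete, here is a minimal sketch in C. Everything in it is hypothetical: the customer record, its fields and the two migration steps are invented for illustration and the data lives in memory rather than in a real database. The point is only the shape: each iteration ships one small step that upgrades data by exactly one version, and old data catches up by running the steps it has missed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical customer record; schema_version says which release wrote it. */
struct customer {
    int  schema_version;
    long balance_cents;    /* introduced in version 2 */
    int  balance_dollars;  /* the only balance field in version 1 */
    int  active;           /* introduced in version 3 */
};

/* One small migration step per iteration. */
typedef void (*migration_fn)(struct customer *);

static void v1_to_v2(struct customer *c) { c->balance_cents = 100L * c->balance_dollars; }
static void v2_to_v3(struct customer *c) { c->active = 1; }

/* steps[i] upgrades a record from version i + 1 to version i + 2. */
static const migration_fn steps[] = { v1_to_v2, v2_to_v3 };

enum { LATEST_VERSION = 1 + (int)(sizeof steps / sizeof steps[0]) };

/* Apply only the steps a record still needs; returns how many were applied. */
int migrate_to_latest(struct customer *c)
{
    int applied = 0;
    assert(c != NULL && c->schema_version >= 1);
    while (c->schema_version < LATEST_VERSION) {
        steps[c->schema_version - 1](c);
        c->schema_version++;
        applied++;
    }
    return applied;
}
```

A record written by the very first release needs both steps, a record from the previous release needs one, and a current record needs none. No single step is ever large enough to earn the "from hell" label.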

Greenfield Paradox

What I have discussed here is the 'greenfield paradox'. It's a paradox because intuitively, nice clean, shiny things are much nicer to deal with and therefore easier to look after and maintain than old, dirty, ugly things. That might be true of cars, boats and houses but it's not true of software. Software doesn't rust or corrode. Unlike plastic and wooden decking it doesn't break down in the sun over time.

Software is not like the 'normal' things we touch and feel in everyday life. It can be almost 'immortal' unless the environment in which it exists becomes extinct (Goodbye DEC VAX, we knew you well!). Software written to run in some bank or insurance environments 40 years ago can still run exactly as it ran back then - and some still does! The 1's and 0's don't corrode or rust with time (ok, maybe metal 'core' based memory from the 60's might =]). Software is infinitely mutable. It can be bent, reshaped, refactored, re-engineered. It can be changed a little at a time and therefore it can be incrementally improved over time whilst maintaining the current user base and all of their data.

Sometimes there are paradigm shifts in the IT world which force a new greenfield development - for example moving an application from desktop based to cloud based - but if the legacy software was written with a decent model then, even though much of the user interface code will need to be rewritten, most of the 'core' of the application can be reused so long as it was written in a language that has wide adoption (e.g. Java, C++ etc.).

Hybrid Strategy

In some scenarios the extra functionality is sufficiently independent that only the new functionality needs to be developed as a greenfield project. A significant portion of the functionality can still be provided by the legacy code (strategy 2, preferable) or a copy of it (strategy 3, duplication of maintenance effort and cost). The integration between the greenfield component and the legacy software can be via intelligently crafted services/interfaces that are provided by the legacy code base. While the hybrid strategy is not the pure 'clean slate' approach that many engineers might prefer, it can result in massive savings and much faster time to market compared to a complete greenfield rebuild.

The time and effort involved in refactoring certain parts of the legacy code to abstract behaviors and expose them as services via generic interfaces is usually insignificant compared with the time and effort required to create a complete greenfield implementation of the legacy code. If a hybrid strategy of 1 + 2, above, is chosen over hybrid strategy 1 + 3, it means that the company's code base is not duplicated which avoids a doubling of the company's maintenance effort and cost.
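A minimal sketch of what such a refactoring can look like, in C. The legacy routine, its global 'unit mode' flag and the temperature example are all invented stand-ins, not code from any real system. The idea is that the greenfield component only ever sees the narrow `temp_service` interface, while the legacy routine stays exactly where it is and keeps serving the legacy product.

```c
#include <assert.h>
#include <stddef.h>

/* ---- Stand-in for existing legacy code (hypothetical) ---- */
static int g_unit_mode;              /* legacy global: 0 = Celsius, 1 = Fahrenheit */

static int LEGACY_read_temp(int sensor, int *raw_out)
{
    if (sensor < 0 || sensor > 3)
        return -1;                   /* legacy error convention */
    *raw_out = 200 + 10 * sensor;    /* tenths of a degree Celsius */
    if (g_unit_mode == 1)
        *raw_out = *raw_out * 9 / 5 + 320;
    return 0;
}

/* ---- Narrow interface exposed to the greenfield component ---- */
struct temp_service {
    /* Returns 0 on success and stores tenths of a degree Celsius in *out. */
    int (*read_celsius_tenths)(int sensor, int *out);
};

static int legacy_read_celsius_tenths(int sensor, int *out)
{
    int saved = g_unit_mode;
    int rc;
    assert(out != NULL);
    g_unit_mode = 0;                 /* hide the legacy global from new code */
    rc = LEGACY_read_temp(sensor, out);
    g_unit_mode = saved;             /* leave legacy state as we found it */
    return rc;
}

/* The only symbol the greenfield code needs to know about. */
const struct temp_service *legacy_temp_service(void)
{
    static const struct temp_service svc = { legacy_read_celsius_tenths };
    return &svc;
}
```

The wrapper is a few dozen lines, whereas a greenfield replacement would first have to rediscover every quirk that `LEGACY_read_temp` handles. A different implementation of `temp_service` can be swapped in later without the new code noticing.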

Refactoring existing functionality so that it can be exposed via a clean interface requires patience, intelligent thought, design and high coverage testing. Not every software engineer has the necessary skill set and, just as important, the personality and temperament to be up to this task, which is why many will push for a 'complete greenfield' strategy. However, if you have one or two engineers with the technical and personal skills needed to carry out this kind of intelligent refactoring, a highly efficient hybrid strategy could represent savings of thousands or even millions of dollars and massively speed up the time to market of your new product. As an added bonus: any improvements made to your legacy code base are instantly available to your legacy products with no further coding effort/expense.

Let the software engineers decide?

If you ask 100 software developers whether they would rather work on legacy code written by someone else or create brand new code from scratch then 100/100 software engineers would say 'brand new code, of course!'.

My 20+ years of industry observation tell me that it takes a very brave and very smart software manager to avoid the mistake of undervaluing the legacy code base. While they can see the potential for a major cost blowout in the construction of a new 'bright and shiny thing' they often find it hard to resist the overwhelming pressure from software engineers and others in the organization to 'restart with a clean slate' and the promises of 'future productivity improvements' that are 'guaranteed' by said software engineers. The guarantees are never in writing and the gains are never quantified.

Software engineers hate nothing more than OPC (Other People's Code). A greenfield project or a 'cut and paste' of the legacy code base with no requirements for backwards compatibility represents an extremely tempting though expensive opportunity for them to create an extra code base that software engineers that come after them can have the pleasure of hating for years to come :)

If ensuring popularity with or appeasing the software engineers is a manager's main goal then they will be tempted to decide to ditch or 'duplicate and change' a legacy code base rather than incrementally improve the legacy code base - and they would be among thousands of managers before them who have made that same decision. How many of those software teams are still around to tell the tale?

Strategies for refactoring legacy code

Incremental

The secret in refactoring existing code is that you want 'bang for buck'. In other words you want a great return for the smallest investment. This is the key to the success of the refactor strategy - you only change what you need to change now. The rest can wait till later. Over time the entire application may end up being refactored and enhanced. That's great but the secret is to make it awesome over time with incremental changes - they are much more manageable and testing their effect and fixing side effects is much easier and indeed more viable than a 'big bang' approach that touches lots of different parts of the code.

Database Access - Persistence

An immediate assumption by people who want to ditch the legacy code base and move onto a new or duplicated version of it is that it is impossible to refactor the old code base. It might use a database access layer that uses raw JDBC/ODBC or it may have no database layer at all, with SQL statements littered throughout the code. One of the assumptions is that moving to an Object Relational Mapper (ORM), to gain the productivity benefits that ORMs bring, requires a new database schema - this is clearly not the case. Most ORM implementations in Java like Hibernate (JPA), DataNucleus (JPA/JDO) and others allow you to map to existing legacy database schemas. Each time you refactor a class to be persisted by an ORM instead of using JDBC/ODBC and SQL directly you actually decrease the size and maintenance cost of your code base. Most ORMs will play nicely within a system that accesses objects both through the ORM and through direct JDBC. This means that it's possible to migrate your code base to a new ORM oriented version in a piecemeal way over time - you don't have to convert the entire application over to ORM access before a new production release can be deployed.
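
To make the piecemeal idea concrete, here is a minimal sketch in plain Java. All the names (Book, BookStore, LegacySqlBookStore, OrmBookStore) are hypothetical, and an in-memory map stands in for the legacy BOOK table so that the example runs without a database or an ORM library on the classpath. In a real code base the first implementation would wrap the existing JDBC/SQL code and the second would delegate to the ORM, with both pointing at the same legacy schema.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Plain model class; in a real migration this is the class the ORM would map.
final class Book {
    final long id;
    final String title;

    Book(long id, String title) {
        this.id = id;
        this.title = title;
    }
}

// Callers depend only on this interface, so storage can be switched one class at a time.
interface BookStore {
    Optional<Book> findById(long id);
    void save(Book book);
}

// Stand-in for the existing hand-written JDBC/SQL access code (hypothetical).
final class LegacySqlBookStore implements BookStore {
    private final Map<Long, String> bookTable; // pretend rows of the legacy table

    LegacySqlBookStore(Map<Long, String> bookTable) {
        this.bookTable = bookTable;
    }

    @Override
    public Optional<Book> findById(long id) {
        // Real code would run a SELECT through JDBC here.
        return Optional.ofNullable(bookTable.get(id)).map(title -> new Book(id, title));
    }

    @Override
    public void save(Book book) {
        // Real code would run an INSERT or UPDATE through JDBC here.
        bookTable.put(book.id, book.title);
    }
}

// Stand-in for the ORM-backed replacement, reading and writing the SAME table.
final class OrmBookStore implements BookStore {
    private final Map<Long, String> bookTable;

    OrmBookStore(Map<Long, String> bookTable) {
        this.bookTable = bookTable;
    }

    @Override
    public Optional<Book> findById(long id) {
        // Real code would ask the ORM for the object (assumption: e.g. a JPA EntityManager).
        return Optional.ofNullable(bookTable.get(id)).map(title -> new Book(id, title));
    }

    @Override
    public void save(Book book) {
        // Real code would hand the object to the ORM to persist.
        bookTable.put(book.id, book.title);
    }
}

class PersistenceMigrationDemo {
    public static void main(String[] args) {
        Map<Long, String> bookTable = new HashMap<>();
        BookStore notYetMigrated = new LegacySqlBookStore(bookTable);
        BookStore alreadyMigrated = new OrmBookStore(bookTable);

        // A row written by un-migrated code is visible to migrated code.
        notYetMigrated.save(new Book(1L, "Refactoring"));
        System.out.println(alreadyMigrated.findById(1L).map(b -> b.title).orElse("missing"));
    }
}
```

The two implementations are deliberately boring. The point is only that both kinds of access can live in the same release against the same table, so each class can be moved across when it is next touched.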

Multiple Products - single code base

We touched on this earlier: it is desirable to have a single code base for the generation of product variants that are essentially very similar in function but might differ in titles, terminology, internationalization, color schemes etc.
Clever use of patterns like the factory and template patterns allows code that used to include hard coded screen display or functional behavior to be refactored to deal instead with an abstracted interface. A product specific implementation of the interface is 'plugged in' to the application during configuration. New variants of the product can be created very easily by simply providing a new, product specific implementation of the required interfaces and setting up appropriate configuration files to 'plug them in'.
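
As a rough illustration of the 'plug in' idea, the sketch below uses a simple factory plus an interface (the template pattern is not shown). The interface, the two variants and the configuration keys are all made up for the example. In a real product the key would be read from a configuration file rather than from the command line.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Everything the shared code needs to know about a product variant (hypothetical).
interface ProductVariant {
    String title();
    String memberTerm(); // variant specific terminology
}

final class LibraryVariant implements ProductVariant {
    @Override public String title() { return "Acme Library Manager"; }
    @Override public String memberTerm() { return "borrower"; }
}

final class VideoStoreVariant implements ProductVariant {
    @Override public String title() { return "Acme Video Rentals"; }
    @Override public String memberTerm() { return "customer"; }
}

// Factory: maps a configuration value to the implementation to plug in.
final class ProductVariantFactory {
    private static final Map<String, Supplier<ProductVariant>> REGISTRY = new HashMap<>();

    static {
        REGISTRY.put("library", LibraryVariant::new);
        REGISTRY.put("video", VideoStoreVariant::new);
    }

    static ProductVariant forConfigValue(String key) {
        Supplier<ProductVariant> supplier = REGISTRY.get(key);
        if (supplier == null) {
            throw new IllegalArgumentException("Unknown product variant: " + key);
        }
        return supplier.get();
    }
}

// Shared code: written once against the interface, with no hard coded product text.
final class WelcomeScreen {
    private final ProductVariant variant;

    WelcomeScreen(ProductVariant variant) {
        this.variant = variant;
    }

    String render() {
        return "Welcome to " + variant.title() + ", dear " + variant.memberTerm() + "!";
    }
}

class VariantDemo {
    public static void main(String[] args) {
        // Stand-in for a value read from a configuration file.
        String configured = args.length > 0 ? args[0] : "library";
        ProductVariant variant = ProductVariantFactory.forConfigValue(configured);
        System.out.println(new WelcomeScreen(variant).render());
    }
}
```

Adding a third variant here means one new class and one new registry entry. WelcomeScreen, and any other shared code written against the interface, stays untouched.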

Changing the User Interface framework

A driving force for many a greenfield project is that the old user interface framework is technically 'extinct' like the dinosaurs or no longer 'cool' and much newer 'funkier' ones are calling. Ideally, regardless of the legacy UI framework there would have been a well disciplined MVC strategy applied or at least a good separation between the underlying model and the user interface code. I say ideally but I do understand that the architecture of some UI frameworks (eg., JSP) often encouraged the inclusion of model and business code right into the UI elements. If you fell into this trap then you need to slap yourself in the face repeatedly with a wet fish!

The evils of mixing UI and model or business code have been dealt with many times before and I know you have learned your lesson the hard way so I don't need to mention it again ... OK! What I will say is that if you are starting with legacy code that suffers from this, a good first step is to incrementally start a process of UI/Model separation. Create some separate model classes if you haven't already and start shifting model code, any JDBC/ODBC code and any SQL 'glue code' from your UI elements into those model classes. After some pure model classes have been established you can attempt to incorporate a more modern UI framework by targeting the UI elements that only require model objects that have already been 'purified'.
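
Here is a small sketch of what one 'purified' model class might look like after such a step. The Loan class, the fine rule and the LoanRowView class are invented for illustration. The idea is simply that the calculation that used to sit inside the UI element now lives in a class with no UI, JDBC or SQL imports, and the UI element is left with formatting only.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.Locale;

// Pure model class: business rules only, no UI or database imports.
final class Loan {
    private static final long FINE_CENTS_PER_DAY = 25; // hypothetical business rule

    private final String bookTitle;
    private final LocalDate dueDate;

    Loan(String bookTitle, LocalDate dueDate) {
        this.bookTitle = bookTitle;
        this.dueDate = dueDate;
    }

    String bookTitle() {
        return bookTitle;
    }

    // Number of whole days past the due date, never negative.
    long daysOverdue(LocalDate today) {
        return Math.max(0, ChronoUnit.DAYS.between(dueDate, today));
    }

    long fineInCents(LocalDate today) {
        return daysOverdue(today) * FINE_CENTS_PER_DAY;
    }
}

// UI element after the refactoring: formatting only, all rules come from the model.
final class LoanRowView {
    static String render(Loan loan, LocalDate today) {
        long fine = loan.fineInCents(today);
        String amount = String.format(Locale.ROOT, "%d.%02d", fine / 100, fine % 100);
        return loan.bookTitle() + " | fine: " + amount;
    }
}

class LoanDemo {
    public static void main(String[] args) {
        Loan loan = new Loan("Some Book", LocalDate.of(2024, 1, 10));
        System.out.println(LoanRowView.render(loan, LocalDate.of(2024, 1, 13)));
    }
}
```

Once a class like Loan exists, any UI element that needs only Loan objects becomes a candidate for being rebuilt in the newer UI framework.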

If on the other hand your legacy code already has Model/UI separation then your task is much easier. If not already the case, then move the model into its own package and move the UI code into a separate package.

eg.,

Package:
com.acme.myapp.onebigblob

becomes:
com.acme.myapp.ui.jsp  <- Any supporting JSP classes
com.acme.myapp.model   <- Pure POJO model classes

Once that is complete you can start creating UI elements using your more modern UI framework (Wicket, Tapestry, JSF etc.). In most cases these should be able to coexist in the one code base (we've done this with Echo and Wicket). The important thing is that while in transition the legacy UI code and the new UI code both use exactly the same model classes - this is why we need complete separation of UI and model code. The model code must have no references whatsoever to any classes from either the legacy or the new UI frameworks. Create a separate package for the new UI framework (eg., Wicket). All classes related to that UI framework go into that package.

eg.,

Package:
com.acme.myapp.ui.wicket  <- All Wicket page/panel classes

If your app is non trivial then you should create subpackages under wicket to organize individual logically separate areas of functionality into their own package.


eg.,

Package:
com.acme.myapp.ui.wicket
com.acme.myapp.ui.wicket.user  <- Login form, account preferences
com.acme.myapp.ui.wicket.borrow <- Book borrowing and related forms 
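
The package layout above can be sketched in code as follows. To keep the example runnable without JSP or Wicket on the classpath, the two UI classes are plain Java stand-ins and everything sits in one file. The comments show which package each class would live in. All class names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Would live in com.acme.myapp.model: no reference to any UI framework.
final class BorrowingModel {
    private final List<String> borrowedTitles = new ArrayList<>();

    void borrow(String title) {
        borrowedTitles.add(title);
    }

    List<String> borrowedTitles() {
        return new ArrayList<>(borrowedTitles); // defensive copy
    }
}

// Would live in com.acme.myapp.ui.jsp: stand-in for a legacy screen.
final class LegacyBorrowScreen {
    private final BorrowingModel model;

    LegacyBorrowScreen(BorrowingModel model) {
        this.model = model;
    }

    String render() {
        return "<td>" + String.join("</td><td>", model.borrowedTitles()) + "</td>";
    }
}

// Would live in com.acme.myapp.ui.wicket.borrow: stand-in for a new panel.
final class NewBorrowPanel {
    private final BorrowingModel model;

    NewBorrowPanel(BorrowingModel model) {
        this.model = model;
    }

    String render() {
        return "<li>" + String.join("</li><li>", model.borrowedTitles()) + "</li>";
    }
}

class SharedModelDemo {
    public static void main(String[] args) {
        BorrowingModel model = new BorrowingModel();
        model.borrow("Book A");
        model.borrow("Book B");

        // Both generations of UI render the same model object during the transition.
        System.out.println(new LegacyBorrowScreen(model).render());
        System.out.println(new NewBorrowPanel(model).render());
    }
}
```

Notice that the dependency arrows only point one way: both UI classes import the model, and the model imports neither of them. That is what lets the legacy screens be retired one at a time.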



Together we stand, divided we fall

Divided we greenfield

Choosing to perform a new greenfield implementation of the legacy code divides the developers into two separate teams. One consequence of creating a new greenfield project with a new software team is that it can create an 'us and them' scenario where the developers left on the legacy code team can feel like second class citizens as they have been relegated to the relatively unexciting role of maintaining old code that uses a bunch of old technologies that no longer have the developer 'street cred' that they used to.

The developers on the greenfield project get to experience all the 'thrill and excitement' of skilling up and using all the latest and greatest technologies and the benefits don't stop there - their resumes undergo a potentially very useful enrichment.

Splitting the team into legacy and greenfield can cause divisions that can affect morale of developers 'stuck' on the legacy team - especially the aspirational developers who want to improve their skill set and who want to help the company improve and enhance its software assets.

In some teams there are developers who are happy to merely clock in each day and aren't at all interested in the challenges of learning to use new, exciting and usually, much more productive, technologies. The consequence of having these types of developers on any team won't be covered here: it would require a completely separate article to discuss issues related to that.

Together we refactor


A decision to perform a separate greenfield development will definitely create two distinct teams and can expose the company to the issues discussed above. On the other hand, a decision to refactor and enhance the existing code base as a unified team can create positive outcomes for everyone involved. The 'buzz' created by the challenge of adding new features and incorporating newer tools and technologies affects everyone on the team. Eventually all developers will gradually be exposed to the new technologies so no one is being left at the station as the technology train departs - everyone goes along for the ride. When the 'together we refactor' strategy is taken we remove the problem of developers on the legacy software feeling rejected or 'left behind'.

Tough call

In this article I have related my two decades of real world experience of the consequences of choosing 'code base duplication' and 'greenfield development' over refactoring and improvement of an existing code base. Hopefully this has been informative enough for software managers to help them understand that making a decision that is popular with their software engineers is not the same as making decisions, however unpopular, that can drastically affect the financial viability of the software team in which those same engineers work.

Unless you're a Microsoft or Google with unlimited financial resources to spend on a strategy that costs a lot of money and does not give any return in the short or medium term then you really need to make decisions that avoid major increases in the size and/or number of your software code bases because creating and maintaining software is not at all cheap. Choosing the correct strategy for managing the production of your company's 1's and 0's is a tough call.

Share your experiences...

Please let us know of your experiences with software teams that had to make this crucial decision. You don't have to mention the company name but please let us know whether it was a small, medium or large enterprise and the decision they made and whether it was a success or not - i.e. did it make it before the greenfield window closed?

Comments

  1. Excellent article. Well written and just enough technical and layman terms to be readable and understandable.

  2. Thank you, Chris for this very informative and realistic article.

    There is just one part where I thought additional input would be beneficial. This is about the estimates you said are never put on paper. I have been both a Sr developer and a manager, and I can tell you from experience that realistic developer estimates are never approved. What your manager wants is NOT your realistic estimate but a ballpark figure that s/he will then reduce by about 30%. When the dust settles, the developer ends up with his manager's "desired" (I usually call it wishful thinking) estimates that themselves are based on the budget allocated to that manager. From what I have seen over 15 years of experience, this is the case in 9 out of 10 times.
    I have spoken to many Sr developers over the years and all of them, without exception, would estimate very conservatively - usually by doubling their initial estimate, allowing for fixing environment problems, etc. I asked them why would they go so far as to double an original estimate. The answer usually is, well, because in addition to my work on a greenfield project I have to:
    1. Mentor Jr developers - recent hires or people rising through the ranks
    2. Talk to architects, managers, and other stakeholders who by "virtue" of being on top of you decide they can just take as much of your time as they want, so that they can have you in their meetings and make sure all of their project documentation questions are answered.
    3. Give status updates every day - again, talk to manager, update Jira tickets, and tell BAs to update documentation.
    Because of the above, my actual time dedicated to development is cut by half.
    In all fairness such a situation is typical only of medium and large companies. In small companies there simply aren't that many managers and even the ones already there have to do part of the development work on a greenfield project. So, they are intimately aware of all the details and have neither the time nor the desire to hold 2-hour meetings.

    Replies
    1. It seems an all too common problem!

      You are right about it being limited to medium and large companies - they have tantalizing budgets that the small companies just don't have so the small guys, by a fluke of nature, MUST choose the most financially sane option every time. It's a lesson companies with bigger budgets could learn from.

  3. Excellent post. You just got yourself a brand new reader:) Will be viewing other articles of yours.

  4. A very good article. Thank you.

  5. Great article which comes very close to what I experienced in my first job about 20 years ago (when I was bright eyed and bushy tailed and very naive). Unfortunately it also comes close to what I experience in my current job again (with variofocal glasses and the forehead growing towards neck).
    The only bad thing I can say about your article is that it is too lengthy for my bosses to take their time and read it.
    Also they would consider my concerns biased, since I am the software project lead of the legacy cash cow that is being sacrificed on the greenfield altar. Hardly any chance to make my point ...

    Replies
    1. Yes, unfortunately too many cash cows are sacrificed on the greenfield altar.

      I will say though, it does take more intelligent types to refactor existing code. Average developers/cookie cutter types can hack out and copy/paste their way to a woolly mammoth sized greenfield code base in a couple of years. It will be a 'bowl of spaghetti' type application that is a maintenance nightmare and does not do half of what the cash cow legacy system does but there will be management types who will always argue for this horribly inferior solution because it's "new".
      In reality a much smaller team of smarter devs could take the legacy app to the moon with 1/20th the budget spent by the cookie cutter team.

  6. Well done! Thank you so much for sharing this with us, this is too much helpful for me to understand Microsoft Green field projects. Keep sharing this type of valuable stuff for us.
