Posted by shrikant.joshi on February 25th, 2006
Name: Distributed development disaster
Also known as: Geographical Gotcha
Most frequent scale: Enterprise
Refactored solution name: Game plan or Dahi-handi
Refactored solution type: Process, Management
Root causes: Development infrastructure, Time zone, Cultural difference, Unclear responsibilities
Unbalanced forces: Management of complexity, Management of IT resources
Anecdotal evidence: “Why on earth are they rewriting the same algorithm?”, “If we add a few more people to our group, we will be able to pull this through”, “I have no idea what Srini’s group is doing.”, “Damn, I have to wait till tomorrow to sync up the repository”, “cvs update is taking forever…I better go home and come back tomorrow.”, “How come nobody is in the office today?”, “What is Diwali?”
Background
Geographically distributed development is a reality in medium to large enterprises. With advances in communication technology and bandwidth, a large number of companies now run multisite groups. The biggest problems in this setup usually relate to people and process rather than technology. Many of them also show up in collocated teams that work in silos, but they take a front seat in distributed teams.
Imagine a soccer team without a game plan and without assigned responsibilities, where everybody plays offense and defense at the same time! When the whole team runs after the ball, it is bound to lose regardless of the talent on the team. A team with a well-thought-out game plan, on the other hand, makes the most of every player's skills. In that case, the team's potential is more than the sum of the individual players’ talents. The coach/manager should assign each player to the position that best suits him and the team. Each player should know what his responsibilities are and should be able to communicate well with teammates.
Dahi-handi is a fun festive sport where a team of people forms a high tower by balancing on each other’s shoulders to reach a large pot full of yogurt. If people are not synchronized, the human tower falls to the ground in no time. A well-rehearsed and coordinated team, on the other hand, can build a tower high enough to reach the pot and is a delight to watch. Well, I can’t imagine what a distributed dahi-handi would look like; maybe a PS2 with networked players!
General Form
Handling a multi-site group poses several challenges. First of all, the architecture and design of the system should be modular and properly partitioned. If the changes that one subgroup makes affect other modules, the groups involved need to interact more closely. If the interfaces are not clearly defined, the result is chaos and an unproductive environment. A comprehensive, automated suite of unit tests is invaluable. Complete and precise requirements and design documents, along with a review process at each stage, are extremely important. An important point to remember is that the development process should foster independent thinking and not sweat the small stuff. Controlling at the micro level often does not work.
Source code management is another important factor in this scenario. Several approaches have been employed here: mirroring, synchronization, master-slave configurations, module separation and so on. Choices such as the synchronization frequency and the master-slave setup need to be tuned to the degree of coupling between the multi-site groups. A fine-tuned build process for distributed teams should reduce the amount of data downloaded (typically common libraries) during each build.
The biggest problems have been reported on the ‘personal’ front rather than on the technology front. Teams located at different places need to feel that they are part of the same group. Water cooler gossip needs to flow back and forth. Teleconferencing, e-mail and periodic phone conversations help, but nothing builds the bond like ‘face time’. Co-workers should visit the other sites regularly to get acquainted with the team members there. These visits should not be taken to such extremes that they become a burden.
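As a minimal sketch of what a clearly defined interface plus an automated unit test can look like when two sites work in parallel: the names below (`RateProvider`, `invoice_total`, the currency values) are hypothetical and not taken from any real project. The idea is that one site codes and tests against the agreed interface using a mock object, without waiting for the other site's implementation.

```python
from abc import ABC, abstractmethod
from unittest.mock import Mock


class RateProvider(ABC):
    """Hypothetical interface agreed between two sites.

    One site implements it; the other site only depends on this contract.
    """

    @abstractmethod
    def rate(self, currency: str) -> float:
        """Return the conversion rate from USD to the given currency."""


def invoice_total(amount_usd: float, currency: str, provider: RateProvider) -> float:
    """Code owned by the consuming site; it knows only the interface."""
    return round(amount_usd * provider.rate(currency), 2)


def test_invoice_total_with_mock() -> None:
    # The mock stands in for the implementation the other site has not delivered yet.
    # spec=RateProvider makes the mock reject attributes the interface does not declare.
    provider = Mock(spec=RateProvider)
    provider.rate.return_value = 45.0

    assert invoice_total(10.0, "INR", provider) == 450.0
    provider.rate.assert_called_once_with("INR")


if __name__ == "__main__":
    test_invoice_total_with_mock()
    print("ok")
```

Because the test depends only on the interface, a change to the contract by either site shows up as a failing test rather than as a surprise at integration time.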
Symptoms and consequences
- No or bad communication between groups working at different sites
- No reuse at design or code level
- Incompatibility between modules
- Extensions in the system lead to multiple changes across all groups
- No clearly defined interfaces
- Build breaks which get fixed after a very long time
- Lost changes in source code repository
- Merge problems due to ineffective synchronization cycles
- Poor knowledge of modules that are owned by other sites
- Slow learning curve during new employee training
- Unacceptable network bandwidth
- Blame game
- Slow turnaround for customer issues
Refactored solution
There are at least the following dimensions to the solution:
- Architecture and design of the product: A truly modular and extensible architecture usually leads to a clear separation of modules and responsibilities. Proper use of encapsulation, interfaces, facades, inheritance hierarchies and design patterns in general is of immense value.
- Development process: Architectural responsibilities should be assigned so that decisions can be made locally. The process should also promote design and code reviews. Coding standards make life easier not only for developers but also for code maintainers. We should figure out what works for our situation, so that the product is managed with control and direction but without the overhead of red-tape processes. Ownership per deliverable is another way to handle this problem.
- Automated test suite: The value of an automated unit test suite increases multifold when a team is distributed. The team needs to make sure that all the tests pass against the public build at all times. Testing based on mock objects helps the development process when parallel teams are working at different velocities.
- Communication between diverse groups: Mailing lists, informal dialogues, group decision making, unity of purpose, absorbing and enjoying cultural diversity, on-site visits, briefings, design documents, teleconferencing are various tools for success in this area. An important point is to minimize unnecessary communication while maximizing understanding. This is similar to partitioning an application across network processors.
- Proper use of code management system: Organizations employ many different approaches, and we need to find out what works best for us. Mirroring and master-slave configurations are widely used, and each has its positive and negative sides. Source code control systems support distributed development in different ways. ClearCase has an add-on called ‘MultiSite’. CVS and Perforce have also been employed by a lot of companies. The important thing for us is to keep it simple and workable.
- Time overlap: It is very desirable for the sites' work hours to overlap by at least a few hours. Where the teams are separated by brutal time differences, like California and India, we need to come up with creative ways to create an overlap. For example, the team in California can work 6:30 am to 3:30 pm while the team in India works from noon to 9 pm. That way an overlap of at least an hour or two is available. Weekly meetings can clear up a lot of doubts and bring the team onto the same page.
- Hierarchy: In general, flatter teams work best. But, in certain situations, module leaders can contribute and ease the pain of day to day management. I will try and address this issue some other time.
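The time-overlap arithmetic from the list above can be checked with a short sketch. It assumes fixed UTC offsets (UTC-8 for California in winter, UTC-7 in summer, UTC+5:30 for India) instead of a time zone database, and it only compares work windows that fall on the same local calendar date. The function name `overlap_hours` and the tuple layout are illustrative choices.

```python
from datetime import date, datetime, timedelta, timezone

# Assumption: fixed offsets chosen by hand, no daylight-saving lookup.
PST = timezone(timedelta(hours=-8))             # California, winter
PDT = timezone(timedelta(hours=-7))             # California, summer
IST = timezone(timedelta(hours=5, minutes=30))  # India, all year


def overlap_hours(day, site_a, site_b):
    """Return the shared working time, in hours, of two sites on one date.

    Each site is (tzinfo, (start_hour, start_minute), (end_hour, end_minute)),
    with the times given in that site's local clock.
    """
    def window(site):
        tz, (sh, sm), (eh, em) = site
        start = datetime(day.year, day.month, day.day, sh, sm, tzinfo=tz)
        end = datetime(day.year, day.month, day.day, eh, em, tzinfo=tz)
        return start, end

    a_start, a_end = window(site_a)
    b_start, b_end = window(site_b)
    # Aware datetimes compare by absolute time, so min/max work across zones.
    shared = min(a_end, b_end) - max(a_start, b_start)
    return max(shared, timedelta(0)).total_seconds() / 3600


india = (IST, (12, 0), (21, 0))
print(overlap_hours(date(2006, 1, 16), (PST, (6, 30), (15, 30)), india))  # 1.0
print(overlap_hours(date(2006, 7, 17), (PDT, (6, 30), (15, 30)), india))  # 2.0
# With ordinary 9-to-6 hours at both sites there is no overlap at all.
print(overlap_hours(date(2006, 1, 16), (PST, (9, 0), (18, 0)), (IST, (9, 0), (18, 0))))  # 0.0
```

Under these assumptions the shifted schedules give one hour of overlap in winter and two in summer, while unshifted office hours give none.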
Example
A lot of products have been developed and managed over geographically distributed sites. One can learn a lot by knowing how various open source projects are developed. I have been part of and managed many projects where the groups were geographically and culturally distributed. With globalization on everyone's mind, I don't think there is any need for specific examples.
Related solutions
Check out the refactored solutions to other antipatterns like Project mismanagement.