
Data Access Object with Lucene

Posted by shrikant.joshi on April 29th, 2006

Data Access Object (DAO) is a popular J2EE pattern for accessing and manipulating data in a database-independent way. The DAO encapsulates the persistence mechanism, so an application component can use the DAO without worrying about the specifics of the persistence implementation. In addition, the application can use the abstract factory pattern to gain flexibility. This is achieved by introducing a DAOFactory that supplies DAOs for a given persistence mechanism. The DAOFactory in turn uses a factory method so that a concrete DAOFactory can be supplied.
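
To make the structure concrete, here is a minimal Java sketch of a DAO behind an abstract factory. The names ArticleDao, InMemoryArticleDao, DaoFactory, InMemoryDaoFactory and forMechanism are hypothetical and chosen only for illustration, and the in-memory store merely stands in for a real persistence mechanism; this is not the code from the download.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical DAO interface: the application codes against this only.
interface ArticleDao {
    void create(String id, String text);
    String read(String id);   // returns null when the id is unknown
    void delete(String id);
}

// One concrete persistence mechanism (an assumption for this sketch).
// A JDBC- or Lucene-backed class would implement the same interface.
final class InMemoryArticleDao implements ArticleDao {
    private final Map<String, String> store = new HashMap<>();

    public void create(String id, String text) { store.put(id, text); }
    public String read(String id) { return store.get(id); }
    public void delete(String id) { store.remove(id); }
}

// Abstract factory: each subclass supplies DAOs for one mechanism.
abstract class DaoFactory {
    abstract ArticleDao createArticleDao();

    // Factory method that selects the concrete factory by name.
    static DaoFactory forMechanism(String mechanism) {
        if ("memory".equals(mechanism)) {
            return new InMemoryDaoFactory();
        }
        throw new IllegalArgumentException("Unknown mechanism: " + mechanism);
    }
}

final class InMemoryDaoFactory extends DaoFactory {
    @Override
    ArticleDao createArticleDao() { return new InMemoryArticleDao(); }
}

final class DaoDemo {
    public static void main(String[] args) {
        // The caller never names the concrete DAO class.
        ArticleDao dao = DaoFactory.forMechanism("memory").createArticleDao();
        dao.create("1", "Data Access Object with Lucene");
        System.out.println(dao.read("1"));
    }
}
```

Supporting another store would mean adding one DAO class and one factory subclass; the calling code in DaoDemo would stay the same apart from the mechanism name.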

DAOs can also be useful when the application uses a text store and search tool like Lucene. Lucene is a popular text indexing and searching library.

I am using a TextDAO to handle the CRUD methods using Lucene. The application passes its data as a map of name/value pairs. One of the entries points to a document that contains the text to be searched. All of the names (fields) are used in the search through MultiFieldQueryParser, and a search returns a ValueList. Since there is no ‘update’ method in Lucene, I implemented it as a delete followed by a create. You can download the code as an Eclipse project.
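
The shape of such a DAO can be sketched in plain Java. The class below is a hypothetical in-memory stand-in, not the TextDAO from the download and not Lucene itself: it only illustrates passing fields as a map, searching across all fields, and implementing update as a delete followed by a create. The substring match is an assumption that replaces real indexed search.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a text-store DAO (no Lucene dependency).
final class InMemoryTextDao {
    // Document id -> field name/value pairs supplied by the application.
    private final Map<String, Map<String, String>> docs = new LinkedHashMap<>();

    void create(String id, Map<String, String> fields) {
        docs.put(id, new LinkedHashMap<>(fields));
    }

    void delete(String id) {
        docs.remove(id);
    }

    // No native update: remove the old document, then add the new one.
    void update(String id, Map<String, String> fields) {
        delete(id);
        create(id, fields);
    }

    // Returns the ids of documents where any field contains the term
    // (case-insensitive substring match, a simplification of real search).
    List<String> search(String term) {
        String needle = term.toLowerCase();
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> doc : docs.entrySet()) {
            for (String value : doc.getValue().values()) {
                if (value.toLowerCase().contains(needle)) {
                    hits.add(doc.getKey());
                    break; // one matching field is enough
                }
            }
        }
        return hits;
    }
}

final class TextDaoDemo {
    public static void main(String[] args) {
        InMemoryTextDao dao = new InMemoryTextDao();
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("title", "Lucene intro");
        fields.put("body", "indexing and searching text");
        dao.create("doc-1", fields);
        System.out.println(dao.search("indexing"));
    }
}
```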

The key to the solution is that the app does not know whether we use a text search library like Lucene or a database. If we later decide to store the data in a relational database, it is simply a matter of changing the DAO.

Conventional wisdom and MVC

Posted by shrikant.joshi on March 27th, 2006

Model-View-Controller, or MVC, is probably the most talked-about pattern, but it is also the least understood. A number of UI frameworks address and enforce a clear separation between these components. But when it comes to practice (and deadline pressure), we see a lot of implementations that do not use it correctly. This is despite the fact that proper use of the MVC pattern ultimately (and even in the short term) reduces implementation time while making the system flexible, extensible and more maintainable. I am not going to say much about how the MVC pattern works, but here is a short description.
A controller handles gestures coming from the view and calls the appropriate actions on the model. The model represents business objects and processes. The view is responsible for rendering the model and for handling the change events that originate from the model.
In a well-designed application, it is possible to replace any one of the components without impacting the others. The clear division of responsibility is the key here. It is this modularity that provides robustness and interchangeability, much as with hardware components.
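
The division of responsibility described above can be sketched in a few lines of Java. All names here (CounterModel, ModelListener, TextCounterView, CounterController and the "click" gesture) are hypothetical and exist only for this illustration; real frameworks wire these parts together differently.

```java
import java.util.ArrayList;
import java.util.List;

// Listener through which the model announces changes.
interface ModelListener {
    void modelChanged(CounterModel model);
}

// Model: holds state and business logic, knows nothing about rendering.
final class CounterModel {
    private final List<ModelListener> listeners = new ArrayList<>();
    private int count;

    void addListener(ModelListener listener) { listeners.add(listener); }
    int getCount() { return count; }

    void increment() {
        count++;
        for (ModelListener listener : listeners) {
            listener.modelChanged(this); // notify every registered view
        }
    }
}

// View: renders the model and reacts to its change events.
final class TextCounterView implements ModelListener {
    private String rendered = "";

    public void modelChanged(CounterModel model) {
        rendered = "Count: " + model.getCount();
    }

    String getRendered() { return rendered; }
}

// Controller: translates user gestures into actions on the model.
final class CounterController {
    private final CounterModel model;

    CounterController(CounterModel model) { this.model = model; }

    void handleGesture(String gesture) {
        if ("click".equals(gesture)) {
            model.increment();
        }
    }
}

final class MvcDemo {
    public static void main(String[] args) {
        CounterModel model = new CounterModel();
        TextCounterView view = new TextCounterView();
        model.addListener(view);
        new CounterController(model).handleGesture("click");
        System.out.println(view.getRendered());
    }
}
```

Because the view depends only on ModelListener and the controller only on the model, either one could be swapped for a different implementation without touching the other.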
RoR and ZendFramework have been creating a buzz lately with a claim to simplify software development while still enforcing a modular design. I decided to write my first PHP program using ZendFramework. I looked at the samples and realized that they have fallen through the cracks. They do not follow MVC even though the framework that they aim to demonstrate emphasizes the pattern. Didn't I tell you?
Well, I decided to take one of the REST client examples and refactor it to use MVC. It turned out to be a pretty simple exercise. That is exactly what led me to this blog entry.
ZendFramework relies on conventions to decide the controller and action, which makes it very easy to use. Many frameworks, especially in the C++ and Java worlds, do not use conventions in this way to simplify development. A simple naming convention does wonders here.
Refactored Yahoo search sample
You can check out the working of this sample here.
We first create a dispatcher which sets up the view and the controller directory and then dispatches the actions. Each request is broken down into a controller and an action. For example, /yahoo/search is sent to YahooController, which executes its searchAction method. Here is the code for the dispatcher, or front controller:

PHP:
  <?php
  // Add the Zend Framework library directory to the include path.
  set_include_path(get_include_path() . PATH_SEPARATOR . '/path/to/zend/library');

  include 'Zend.php';

  // Let the framework load classes on demand.
  function __autoload($class)
  {
      Zend::loadClass($class);
  }

  // Create the view and register it so that controllers can retrieve it.
  $view = new Zend_View;
  $view->setScriptPath('views');
  Zend::register('view', $view);

  // Dispatch the request to the matching controller and action.
  $controller = Zend_Controller_Front::getInstance();
  $controller->setControllerDirectory('controllers');
  $controller->dispatch();
  ?>
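
The naming convention itself is easy to illustrate outside the framework. The following Java sketch is not Zend's implementation; ConventionRouter and resolve are hypothetical names, and falling back to "index" for a missing controller or action is an assumption made for this example.

```java
// Hypothetical illustration of convention-based routing:
// "/yahoo/search" -> controller "YahooController", action "searchAction".
final class ConventionRouter {

    // Returns { controllerClassName, actionMethodName } for a request path.
    static String[] resolve(String path) {
        // Drop leading and trailing slashes, then split into segments.
        String[] parts = path.replaceAll("^/+|/+$", "").split("/");

        // Assumption: a missing segment falls back to "index".
        String controller = (parts.length > 0 && !parts[0].isEmpty()) ? parts[0] : "index";
        String action = (parts.length > 1 && !parts[1].isEmpty()) ? parts[1] : "index";

        return new String[] { capitalize(controller) + "Controller", action + "Action" };
    }

    private static String capitalize(String s) {
        return Character.toUpperCase(s.charAt(0)) + s.substring(1);
    }
}

final class RouterDemo {
    public static void main(String[] args) {
        String[] route = ConventionRouter.resolve("/yahoo/search");
        System.out.println(route[0] + "::" + route[1]);
    }
}
```

No routing table is needed: adding a new controller class with a suitably named method is enough to make a new URL reachable.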

I created a YahooController to handle all actions related to the web services provided by Yahoo. An IndexController handles all the requests that do not specify a controller. The YahooController first creates the model. In a more realistic case, this could be done through a delegate. The model is then made available to the view, and the controller simply asks the selected view to render itself. I created a simple theme mechanism which decides which view to render. Here is the code for YahooController:

PHP:
  <?php
  Zend::loadClass('Zend_Controller_Action');

  class YahooController extends Zend_Controller_Action
  {
      public function indexAction()
      {
          $this->_redirect('/yahoo/search');
      }

      public function searchAction()
      {
          // Filter the request input and fall back to the default theme.
          $filterReq = new Zend_InputFilter($_REQUEST);
          $keywords = $filterReq->noTags('keywords');
          $theme = $filterReq->noTags('theme');
          if (empty($theme)) {
              $theme = 'boy';
          }

          // The model: Yahoo web services client.
          $yahoo = new Zend_Service_Yahoo('zendtesting');

          try {
              $view = Zend::registry('view');
              $view->title = 'Yahoo Search';
              $view->keywords = $keywords;

              if (!empty($keywords)) {
                  $imageResults = $yahoo->imageSearch($keywords, array("results" => 5));
                  $webResults = $yahoo->webSearch($keywords);
                  $newsResults = $yahoo->newsSearch($keywords);

                  // Make the results available to the view.
                  $view->imageResults = $imageResults;
                  $view->webResults = $webResults;
                  $view->newsResults = $newsResults;
                  $view->searchPerformed = TRUE;
              } else {
                  $view->searchPerformed = FALSE;
              }

              // The theme selects which view script renders the model.
              echo $view->render($theme.'/view_yahoo_search.php');
          } catch (Zend_Service_Exception $e) {
              echo '<p style="color: red; font-weight: bold">An error occurred, please try again later.</p>';
              echo $e;
          }
      }

      public function noRouteAction()
      {
          $this->_redirect('/');
      }
  }
  ?>

The view is responsible for displaying the model. Each theme shows the content in a different way. Here is the code for a view in one theme:

HTML:
  <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
  <html>
  <head>
  <title><?php echo $this->title; ?></title>
  <style type="text/css">
  html, body {
      margin: 0px;
      padding: 0px;
      font-family: Tahoma, Verdana, sans-serif;
      font-size: 10px;
  }

  h1 {
      margin-top: 0px;
      background-color: darkblue;
      color: white;
      font-size: 16px;
  }

  form {
      text-align: center;
  }

  label {
      font-weight: bold;
      color: blue;
  }

  img {
      border: 0px;
      padding: 5px;
  }

  #web, #news {
      float: left;
      width: 48%;
      margin-left: 10px;
  }

  #image {
      margin: 10px;
      border: 1px dashed grey;
      background-color: lightgrey;
      text-align: center;
  }

  h2 {
      font-size: 14px;
      color: grey;
  }

  h3 {
      font-size: 12px;
  }

  p {
      color: blue;
  }

  #poweredby {
      clear: both;
  }
  </style>
  </head>
  <body>
  <h1>Yahoo! Multi-Search</h1>
  <form action="/yahoo/search" method="post">

  <label>Theme: <select name="theme"><option value="boy" selected>Boy</option><option value="girl">Girl</option></select></label>

  <label>Search For: <input type="text" name="keywords" value="<?php echo htmlspecialchars($this->keywords, ENT_QUOTES); ?>"></label>
  <input type="submit" value="Search!">

  </form>

  <?php
  // Image results
  if (($this->searchPerformed) && ($this->imageResults->totalResults() > 0)) {
      echo '<div id="image">';
      echo '<h2>Image Search Results</h2>';
      foreach ($this->imageResults as $result) {
          echo "<a href='{$result->ClickUrl}' title='{$result->Title}'><img src='{$result->Thumbnail->Url->getUri()}'></a>";
      }
      echo '</div>';
  }

  // Web results
  if (($this->searchPerformed) && ($this->webResults->totalResults() > 0)) {
      echo '<div id="web">';
      echo '<h2>Web Search Results</h2>';
      foreach ($this->webResults as $result) {
          echo "<h3><a href='{$result->ClickUrl}'>{$result->Title}</a></h3>";
          echo "<p>{$result->Summary}<br>[<a href='{$result->CacheUrl}'>Cached Version</a>]</p>";
      }
      echo '</div>';
  }

  // News results
  if (($this->searchPerformed) && ($this->newsResults->totalResults() > 0)) {
      echo '<div id="news">';
      echo '<h2>News Search Results</h2>';
      foreach ($this->newsResults as $result) {
          echo "<h3><a href='{$result->ClickUrl}'>{$result->Title}</a></h3>";
          echo "<p>{$result->Summary}</p>";
      }
      echo '</div>';
  }
  ?>
  <p id="poweredby" style="text-align: center; font-size: 9px;">Powered by the Zend Framework</p>

  </body>
  </html>

The REST of the story…

Posted by shrikant.joshi on March 12th, 2006

As KGO's Paul Harvey would say, you know what the news is. Here is the rest of the story... As it happens, the rest of the story involves REST. This rest will not put you to sleep, as it is a bit of fun work. Okay, enough of the puns...

I was looking into Axis as a means of easing the pain of using web services in Java. It is a nicely designed open source project based on SOAP that can be used to write servers or clients which implement web services. I will write about it in a couple of weeks. For now, it serves as a preamble to this week's blog.

REST stands for Representational State Transfer. It is an architectural style that specifies a way to access network resources in a stateless manner. The returned resource can be used for drilling down further, moving the client from one state to another. REST does not prescribe an implementation; it relies on existing standards such as HTTP and XML.
A lot of resources and applications are made available on the internet through REST web services. The simplicity of a REST system makes it very easy to write client applications in any language (including scripting languages). Yahoo and Amazon, among others, offer REST APIs for accessing their resources. I will provide a few more details on Yahoo's web search API as a case study here.

You can search the web for any given term or phrase, and also run image, news and local searches, using Yahoo's REST API. The search parameters are passed as a simple GET or POST query to Yahoo and the results are returned in XML format. Once we parse the results from the XML, we are ready to display the required attributes to the user. Here are a class diagram and an interaction diagram that I put together to depict the flow.

Class diagram:

[Figure: class diagram of YahooSearchClient]

Interaction diagram:

[Figure: interaction diagram]
Yahoo Maps, Flickr and del.icio.us are all accessible using the same fundamental REST concept. All one needs to do is get a (free) developer application id from Yahoo, pass it in the API calls to access the information, and then build interesting applications on top. Client libraries for the API are available in many different languages.
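
The request/response flow described above can be sketched with the Java standard library alone. Everything specific here is an assumption for illustration: the endpoint http://api.example.com/search, the parameter names appid and query, and the ResultSet/Result/Title element names in the sample XML are placeholders, not Yahoo's documented interface. The sketch builds the GET query string and parses a canned response instead of calling a live service.

```java
import java.io.ByteArrayInputStream;
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class RestSearchSketch {

    // Builds "endpoint?name=value&..." with URL-encoded names and values.
    static String buildQueryUrl(String endpoint, Map<String, String> params) {
        StringBuilder url = new StringBuilder(endpoint);
        char separator = '?';
        for (Map.Entry<String, String> p : params.entrySet()) {
            url.append(separator).append(encode(p.getKey()))
               .append('=').append(encode(p.getValue()));
            separator = '&';
        }
        return url.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Extracts the text of each <Title> inside each <Result> element.
    static List<String> parseTitles(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList results = doc.getElementsByTagName("Result");
            List<String> titles = new ArrayList<>();
            for (int i = 0; i < results.getLength(); i++) {
                Element result = (Element) results.item(i);
                titles.add(result.getElementsByTagName("Title").item(0).getTextContent());
            }
            return titles;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse response", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("appid", "my-app-id");        // placeholder application id
        params.put("query", "design patterns");
        System.out.println(buildQueryUrl("http://api.example.com/search", params));

        // Canned response standing in for what a service might return.
        String sample = "<ResultSet><Result><Title>First</Title></Result>"
                + "<Result><Title>Second</Title></Result></ResultSet>";
        System.out.println(parseTitles(sample));
    }
}
```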

Zend has published version 0.1.2 of its framework for PHP. It includes an API for accessing some of these popular web services through a REST client component. The framework looks a bit under-cooked at this time, but I am sure it will get better before long. I would like to check it out soon.

AntiPattern: Distributed Disaster

Posted by shrikant.joshi on February 25th, 2006

Name: Distributed development disaster
Also known as: Geographical Gotcha
Most frequent scale: Enterprise
Refactored solution name: Game plan or Dahi-handi
Refactored solution type: Process, Management
Root causes: Development infrastructure, Time zone, Cultural difference, Unclear responsibilities
Unbalanced forces: Management of complexity, Management of IT resources
Anecdotal evidence: “Why on earth are they rewriting the same algorithm?”, “If we add a few more people to our group, we will be able to pull this through”, “I have no idea what Srini’s group is doing?”, “Damn, I have to wait till tomorrow to sync up the repository”, “cvs update is taking forever…I better go home and come back tomorrow.”, “How come nobody is in the office today?”, "What is Diwali?".



Background

Geographically distributed development is a reality in medium to large enterprises. With advances in communication technology and bandwidth, a large number of companies are dealing with multisite groups. The biggest problems here are often related to people and process rather than technology. Many of these problems are also observed in collocated teams that work in silos, but they take a front seat in distributed teams.

Imagine a soccer team without a game plan and without assigned responsibilities, where everybody is playing offense and defense at the same time! When the whole team is running after the ball, it is bound to lose regardless of the talent on the team. On the other hand, a team with a well-thought-out game plan optimizes the skills of every player on the team. In this case, the team's potential is more than the sum of the individual players' talents. The coach/manager should assign each player to the position that best suits him and the team. Each player should know what his responsibilities are and should be able to communicate well with teammates.

Dahi-handi is a fun festive sport where a team of people forms a high tower by balancing on each other’s shoulders to reach a large pot full of yogurt. If people are not synchronized, the human tower falls to the ground in no time. On the other hand, a well-rehearsed and coordinated team can build a towering pyramid up to the pot and is a delight to watch. Well, I can’t imagine what a distributed dahi-handi would look like, maybe using a PS2 with networked players!

General Form

The challenge in handling a multi-site group is multi-fold. First of all, the architecture and design of the system should be modular and partitioned properly. If the changes that one of the subgroups is making affect other modules, there needs to be closer interaction. If the interfaces are not clearly defined, the result is chaos and an unproductive environment. Writing a comprehensive and automated suite of unit tests is invaluable. Complete and precise requirements documents, design documents and a review process at each stage are extremely important. An important observation to remember is that the development process should foster independent thinking and not sweat the small stuff. Controlling at the micro level often does not work.

Source code management is another important factor which affects this scenario. Multiple solutions have been employed on this front, such as mirroring, synchronizing, master-slave configurations and module separation. Issues like tuning the synchronization frequency and choosing the master need to be handled depending on the coupling between the multi-site groups. A fine-tuned build process for distributed teams should reduce the amount of downloading (typically of common libraries) during each build.

The biggest problems have been reported on the ‘personal’ front rather than on the technology front. Teams located at different places need to feel like they are part of the same group. Water cooler gossip needs to flow back and forth. Tele-conferencing, e-mails and periodic phone conversations help, but nothing builds the bond like ‘face time’. Co-workers should visit regularly to get acquainted with other team members. These visits should not be taken to such extremes that they become a burden.

Symptoms and consequences

  • No or bad communication between groups working at different sites
  • No reuse at design or code level
  • Incompatibility between modules
  • Extensions in the system lead to multiple changes across all groups
  • No clearly defined interfaces
  • Build breaks which get fixed after a very long time
  • Lost changes in source code repository
  • Merge problems due to ineffective sync. cycles
  • Poor knowledge of modules that are owned by other sites
  • Slow learning curve during new employee training
  • Unacceptable network bandwidth
  • Blame game
  • Slow turnaround for customer issues

Refactored solution

There are at least the following dimensions to the solution:

  1. Architecture and design of the product: A truly modular and extensible architecture usually leads to clear separation of modules and responsibilities. Proper use of encapsulations, interfaces, facades, inheritance hierarchies and usage of design patterns in general are of immense value.
  2. Development process: Architectural responsibilities should be assigned so decisions can be made locally. The process should also promote design and code reviews. Coding standards make life easier not only for developers but also for code maintainers. We should figure out what works for our situation so that there is no unnecessary overhead of red-tape processes while the product is still managed with control and direction. Ownership per deliverable is another way to handle this problem.
  3. Automated test suite: The value of an automated unit test suite increases multifold when a team is distributed. The team needs to make sure that all the tests pass at any time from a public build. Mock objects based testing helps the development process when parallel teams are working at different velocity.
  4. Communication between diverse groups: Mailing lists, informal dialogues, group decision making, unity of purpose, absorbing and enjoying cultural diversity, on-site visits, briefings, design documents, teleconferencing are various tools for success in this area. An important point is to minimize unnecessary communication while maximizing understanding. This is similar to partitioning an application across network processors.
  5. Proper use of code management system: There are a lot of different approaches employed by various organizations. We need to find out what works best for us. Mirroring and master-slave configurations are widely used. There are positive and negative sides for these approaches. Various source code control systems support distributed development in different ways. ClearCase has an add-on unit called ‘Multi-site’. CVS, Perforce have also been employed by a lot of companies. The important thing for us is to keep it simple and workable.
  6. Time overlap: It is very desirable to have at least a few hours of overlap in work hours. In cases where the teams are separated by brutal time differences, like California and India, we need to come up with creative ways to create an overlap. For example, the team in California can work from 6:30 am to 3:30 pm while the team in India can work from noon to 9 pm. That way an overlap of at least an hour or two is available. Weekly meetings can clear up a lot of doubts and bring the team onto the same page.
  7. Hierarchy: In general, flatter teams work best. But, in certain situations, module leaders can contribute and ease the pain of day to day management. I will try and address this issue some other time.
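
The overlap arithmetic in item 6 can be checked with a small Java sketch using java.time. The class and method names are hypothetical, and the two sample dates (one in winter, one in summer) are assumptions picked to show the effect of daylight saving time in California; India does not shift its clocks.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

final class WorkdayOverlap {

    // Overlap between two working windows that start on the same calendar
    // date in their own time zones. Returns Duration.ZERO if they never meet.
    static Duration overlap(LocalDate day,
                            ZoneId zoneA, LocalTime startA, LocalTime endA,
                            ZoneId zoneB, LocalTime startB, LocalTime endB) {
        Instant aStart = ZonedDateTime.of(day, startA, zoneA).toInstant();
        Instant aEnd = ZonedDateTime.of(day, endA, zoneA).toInstant();
        Instant bStart = ZonedDateTime.of(day, startB, zoneB).toInstant();
        Instant bEnd = ZonedDateTime.of(day, endB, zoneB).toInstant();

        // The shared window runs from the later start to the earlier end.
        Instant from = aStart.isAfter(bStart) ? aStart : bStart;
        Instant to = aEnd.isBefore(bEnd) ? aEnd : bEnd;
        return to.isAfter(from) ? Duration.between(from, to) : Duration.ZERO;
    }

    // The schedule from item 6: California 6:30 am to 3:30 pm, India noon to 9 pm.
    static Duration californiaIndiaOverlap(LocalDate day) {
        return overlap(day,
                ZoneId.of("America/Los_Angeles"), LocalTime.of(6, 30), LocalTime.of(15, 30),
                ZoneId.of("Asia/Kolkata"), LocalTime.of(12, 0), LocalTime.of(21, 0));
    }

    public static void main(String[] args) {
        System.out.println("Winter: " + californiaIndiaOverlap(LocalDate.of(2006, 1, 16)));
        System.out.println("Summer: " + californiaIndiaOverlap(LocalDate.of(2006, 7, 17)));
    }
}
```

With these hours the shared window is one hour while California is on standard time and two hours during daylight saving time, which matches the "hour or two" estimate.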

Example

A lot of products have been developed and managed over geographically distributed sites. One can learn a lot by knowing how various open source projects are developed. I have been part of and managed many projects where the groups were geographically and culturally distributed. With globalization on everyone's mind, I don't think there is any need for specific examples.

Related solutions

Check out the refactored solutions to other antipatterns like Project mismanagement.

Read-Write-Execute

Posted by shrikant.joshi on February 12th, 2006

I created the site for no reason other than to have fun creating it. I intend to write my thoughts about various subjects that interest me. I hope it will be useful or at least interesting.

I have been reading a lot of blogs and other material on the net. I thought this is the easiest way to change my file permissions from read-only to read-write! I hope I can execute the plan and keep up the enthusiasm.