A REST API in Clojure

Clojure is one of the most interesting new languages targeting the JVM. Initially JVM-only, it is now also available for JavaScript: you can write Clojure and run it either as a Java program or as a JavaScript program, though each flavor has its own unique features as well.

Clojure is a Lisp, so the syntax may look foreign at first, but it is genuinely easy to pick up since there are very few syntactic variations. Lisp as a language family is very lean and usually quickly learned.

In this post, we’re going to create a complete REST application from scratch. There are already some (very) good tutorials available, but some are not quite up to date (see Heroku’s Devcenter or Mark McGranaghan for good ones). Clojure itself is still a young language, though Lisp of course has a long history.

Our application should allow creating, listing, fetching, updating, and deleting of documents.

A document looks like this (JSON encoded):

    {
      "id" : "some id",
      "title" : "some title",
      "text" : "some text"
    }
  • A GET call to /documents should return a list of these documents.
  • A POST call to /documents with a document as body shall create a new document, assigning a new id (ignoring the posted one).
  • A GET to /documents/[ID] should return the document with the given id, or 404 if the document does not exist.
  • A PUT to /documents/[ID] should update the document with the given id and replace title and text with those from the document in the uploaded body.
  • A DELETE to /documents/[ID] should delete the document with the given id and return 204 (NO CONTENT) in any case.

Creating the project scaffolding

We’re going to use Leiningen, the defacto build system and dependency manager for Clojure projects. Download and install it, then execute:

    lein new compojure clojure-rest

We’re creating a new Compojure project called clojure-rest. Compojure is the library that maps URLs to functions in our program. Compojure (and our project) builds on Ring, the basic server API. To start the new project, run:

    lein ring server

This starts the server on localhost:3000 and automatically restarts the server if any of the project files change. Thus, you can leave it running while we develop our application.

The new command generates two very important files for us:

project.clj is the project configuration; it states dependencies, the entry point, etc. (read the full documentation on Leiningen.org). src/clojure_rest/handler.clj contains a starting point for our application.

Project configuration (project.clj)

    (defproject clojure-rest "0.1.0-SNAPSHOT"
      :description "FIXME: write description"
      :url "http://example.com/FIXME"
      :dependencies [[org.clojure/clojure "1.4.0"]
                     [compojure "1.1.1"]]
      :plugins [[lein-ring "0.7.3"]]
      :ring {:handler clojure-rest.handler/app}
      :profiles
      {:dev {:dependencies [[ring-mock "0.1.3"]]}})

Update the file to look like this:

    (defproject clojure-rest "0.1.0-SNAPSHOT"
      :description "REST service for documents"
      :url "http://blog.interlinked.org"
      :dependencies [[org.clojure/clojure "1.4.0"]
                     [compojure "1.1.1"]
                     [ring/ring-json "0.1.2"]
                     [c3p0/c3p0 "0.9.1.2"]
                     [org.clojure/java.jdbc "0.2.3"]
                     [com.h2database/h2 "1.3.168"]
                     [cheshire "4.0.3"]]
      :plugins [[lein-ring "0.7.3"]]
      :ring {:handler clojure-rest.handler/app}
      :profiles
      {:dev {:dependencies [[ring-mock "0.1.3"]]}})

Besides the JSON parsing library Cheshire, we added the C3P0 connection pool, the H2 database JDBC driver, and Clojure’s java.jdbc contrib library.

I also updated the :url and :description fields.

The request handler (handler.clj)

Next let’s have a look at the generated request handler src/clojure_rest/handler.clj:

    (ns clojure-rest.handler
      (:use compojure.core)
      (:require [compojure.handler :as handler]
                [compojure.route :as route]))

    (defroutes app-routes
      (GET "/" [] "Hello World")
      (route/not-found "Not Found"))

    (def app
      (handler/site app-routes))

The route GET "/" [] "Hello World" is responsible for the result we saw in the browser: it maps all GET requests to / without parameters to "Hello World". The (def app (handler/site app-routes)) part configures our application (registering the routes).

Our first step is to update the configuration. We’re going to work with JSON, so let’s include some Ring middlewares to set up response headers (wrap-json-response) and parse request bodies (wrap-json-body) for us. A middleware is just a wrapper around a handler, so it can pre- and post-process the whole request/response cycle.

    (def app
      (-> (handler/api app-routes)
        (middleware/wrap-json-body)
        (middleware/wrap-json-response)))

We also switched from the handler/site template to handler/api, which is more appropriate for REST APIs (see the documentation).

Next let’s define the routes for our application:

    (defroutes app-routes
      (context "/documents" [] (defroutes documents-routes
        (GET  "/" [] (get-all-documents))
        (POST "/" {body :body} (create-new-document body))
        (context "/:id" [id] (defroutes document-routes
          (GET    "/" [] (get-document id))
          (PUT    "/" {body :body} (update-document id body))
          (DELETE "/" [] (delete-document id))))))
      (route/not-found "Not Found"))

We define GET and POST for the context "/documents", and GET, PUT, and DELETE for the context "/:id" on top of that. :id is a placeholder and can be injected into our parameter vector. The POST and PUT requests have a special parameter body for the parsed body (this parameter is provided by the wrap-json-body middleware). For more on routes, take a look at Compojure’s documentation.

Before we define the functions to carry out the requests, let’s fix the imports and open a pool of database connections to work with.

The namespace declaration is used to define which namespaces shall be made available by Clojure.

    (ns clojure-rest.handler
      (:import com.mchange.v2.c3p0.ComboPooledDataSource)
      (:use compojure.core)
      (:use cheshire.core)
      (:use ring.util.response)
      (:require [compojure.handler :as handler]
                [ring.middleware.json :as middleware]
                [clojure.java.jdbc :as sql]
                [compojure.route :as route]))

We import C3P0’s ComboPooledDataSource, a plain Java class. Next, we pull the functions defined in compojure.core, cheshire.core, and ring.util.response into our namespace so they can be used without qualification. Finally, we require some more libraries, this time with a qualifier to prevent name clashes and to keep things nicely separated. I’m not sure when to make the cut between :use and :require yet, so the cut is arbitrary.

    (def db-config
      {:classname "org.h2.Driver"
       :subprotocol "h2"
       :subname "mem:documents"
       :user ""
       :password ""})

Note that we use an in-memory database. If you’d like to keep your database between restarts, you could use :subname "/tmp/documents", for example.

Next we open a pool of connections. C3P0 has no Clojure wrapper, so we deal with Java classes and objects directly (hence a bit more code).

    (defn pool
      [config]
      (let [cpds (doto (ComboPooledDataSource.)
                   (.setDriverClass (:classname config))
                   (.setJdbcUrl (str "jdbc:" (:subprotocol config) ":" (:subname config)))
                   (.setUser (:user config))
                   (.setPassword (:password config))
                   (.setMaxPoolSize 6)
                   (.setMinPoolSize 1)
                   (.setInitialPoolSize 1))]
        {:datasource cpds}))

    (def pooled-db (delay (pool db-config)))

    (defn db-connection [] @pooled-db)

Since we deal with an in-memory database, we need to create our table now.

    (sql/with-connection (db-connection)
      (sql/create-table :documents [:id "varchar(256)" "primary key"]
                                   [:title "varchar(1024)"]
                                   [:text :varchar]))

The intent should be easy to understand; for the details, take a look at the java.jdbc documentation. We create a table documents with :id, :title, and :text columns. Note that the database column is called id, not :id.

The only thing missing are the functions to actually perform the actions requested by our clients.

To return a single document with a given id, we could come up with this:

    (defn get-document [id]
      (sql/with-connection (db-connection)
        (sql/with-query-results results
          ["select * from documents where id = ?" id]
          (cond
            (empty? results) {:status 404}
            :else (response (first results))))))

It reads like this: when called with an id, open a database connection and perform select * from documents where id = ? with the given id as parameter. If the result is empty, return 404; otherwise return the first (and only) document as the response.

The response call will convert the document into JSON; this functionality is provided by wrap-json-response, which also sets the correct Content-Type header.

Another nice one is the creation of new documents:

    (defn uuid [] (str (java.util.UUID/randomUUID)))

    (defn create-new-document [doc]
      (let [id (uuid)]
        (sql/with-connection (db-connection)
          (let [document (assoc doc "id" id)]
            (sql/insert-record :documents document)))
        (get-document id)))

Here we use Java’s UUID generator (without import, hence the full package name) to generate a new id for each document created. The second let is responsible for replacing the user-provided id (if any) with our generated one. Remember that Clojure’s data structures are immutable, so we need to use the document variable thereafter instead of doc, which still contains the old (or no) id.

Returning the document is delegated to the get-document function.

The complete handler.clj

To round off the post, here is the whole program:

    (ns clojure-rest.handler
      (:import com.mchange.v2.c3p0.ComboPooledDataSource)
      (:use compojure.core)
      (:use cheshire.core)
      (:use ring.util.response)
      (:require [compojure.handler :as handler]
                [ring.middleware.json :as middleware]
                [clojure.java.jdbc :as sql]
                [compojure.route :as route]))

    (def db-config
      {:classname "org.h2.Driver"
       :subprotocol "h2"
       :subname "mem:documents"
       :user ""
       :password ""})

    (defn pool
      [config]
      (let [cpds (doto (ComboPooledDataSource.)
                   (.setDriverClass (:classname config))
                   (.setJdbcUrl (str "jdbc:" (:subprotocol config) ":" (:subname config)))
                   (.setUser (:user config))
                   (.setPassword (:password config))
                   (.setMaxPoolSize 1)
                   (.setMinPoolSize 1)
                   (.setInitialPoolSize 1))]
        {:datasource cpds}))

    (def pooled-db (delay (pool db-config)))

    (defn db-connection [] @pooled-db)

    (sql/with-connection (db-connection)
    ;  (sql/drop-table :documents) ; no need to do that for in-memory databases
      (sql/create-table :documents [:id "varchar(256)" "primary key"]
                                   [:title "varchar(1024)"]
                                   [:text :varchar]))

    (defn uuid [] (str (java.util.UUID/randomUUID)))

    (defn get-all-documents []
      (response
        (sql/with-connection (db-connection)
          (sql/with-query-results results
            ["select * from documents"]
            (into [] results)))))

    (defn get-document [id]
      (sql/with-connection (db-connection)
        (sql/with-query-results results
          ["select * from documents where id = ?" id]
          (cond
            (empty? results) {:status 404}
            :else (response (first results))))))

    (defn create-new-document [doc]
      (let [id (uuid)]
        (sql/with-connection (db-connection)
          (let [document (assoc doc "id" id)]
            (sql/insert-record :documents document)))
        (get-document id)))

    (defn update-document [id doc]
        (sql/with-connection (db-connection)
          (let [document (assoc doc "id" id)]
            (sql/update-values :documents ["id=?" id] document)))
        (get-document id))

    (defn delete-document [id]
      (sql/with-connection (db-connection)
        (sql/delete-rows :documents ["id=?" id]))
      {:status 204})

    (defroutes app-routes
      (context "/documents" [] (defroutes documents-routes
        (GET  "/" [] (get-all-documents))
        (POST "/" {body :body} (create-new-document body))
        (context "/:id" [id] (defroutes document-routes
          (GET    "/" [] (get-document id))
          (PUT    "/" {body :body} (update-document id body))
          (DELETE "/" [] (delete-document id))))))
      (route/not-found "Not Found"))

    (def app
        (-> (handler/api app-routes)
            (middleware/wrap-json-body)
            (middleware/wrap-json-response)))

That’s the whole program, with connection pooling and JSON encoding/decoding, in roughly 90 lines of (admittedly dense) code.

To sum it up: Clojure is fun, concise, and very powerful. Together with the excellent Java integration it ranks very high on my “languages I adore” list.

Programming is not about typing, it’s about thinking.
— Rich Hickey

REST Framework Survey (Java, Haskell, Go, Node.js)

Over the last few days I’ve experimented with various REST frameworks. The initial goal was to find The Language and framework to use for all future projects… Of course there is no clear winner.

These were the contestants:

The client was written in Go because I wanted it to be fast, wanted to make concurrent requests easily, and wanted to try Go.

The server spec is to provide a way to insert, query, update, list, and delete documents.

  • A POST to /documents should create a new document, generate a new UUID (v4) as ID and return the whole document.
  • A GET to /documents should return a list of all documents.
  • A GET to /documents/[ID] should return the document with the given ID, or 404 if it is not found.
  • A PUT to /documents/[ID] should update the document with the given ID and return it.
  • A DELETE to /documents/[ID] should delete the document with the given ID.

The documents are encoded in JSON:

{
    "id": "some id",
    "title": "some title",
    "text": "some text"
}

The client may or may not send the id field; the server ignores it and either generates a new one or uses the one from the URL.

The client’s steps:

  1. Insert a document
  2. Update that document
  3. Delete that document

I used goroutines to make that concurrent and checked the consistency via a final GET /documents asserting an empty list.
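The actual client makes these calls over HTTP using goroutines. As a rough sketch of the same insert/update/delete choreography and the final consistency check, here is a hypothetical Java analogue with the HTTP calls replaced by a concurrent in-memory map (all names here are invented for the sketch):

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ClientSketch {
    // stand-in for the REST server: a concurrent map instead of HTTP calls
    static final Map<String, String> documents = new ConcurrentHashMap<>();

    static void insertUpdateDelete() {
        String id = UUID.randomUUID().toString(); // step 1: insert a document
        documents.put(id, "some title");
        documents.put(id, "New Title");           // step 2: update that document
        documents.remove(id);                     // step 3: delete that document
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < 1000; i++) {
            pool.submit(ClientSketch::insertUpdateDelete);
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        // consistency check: the final listing must be empty
        System.out.println("documents remaining: " + documents.size());
    }
}
```

Since every task deletes the document it created, the final count is zero regardless of how the tasks interleave.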

Step 1

As database backend, I used an SQLite3 in-memory database connected via:

The implementation was easy with Java and Node.js. I had my fair share of trouble with Haskell because there is no clear REST framework to use; at least I couldn’t find the obvious choice. I went with Scotty because it’s simple and did what I needed, and my Haskell-fu has seen better days.

Here’s an inconclusive performance report1 (how long it took, in seconds, so longer is worse) processing 10000 documents with the same client on each of the backends.

Node.js is a bit slower, Java and Haskell are pretty much on par.

Code

Here’s the code used to update (PUT) the new version of a document on the server. I’ve chosen the update case because it shows how to deal with path-parameters, as well as how to decode and encode JSON.

Node.js Code to wire PUT requests to the DB.

server.put('/documents/:docId', update); // define which function to call for that URL+method
...
function update(req, res, next) {
  var doc = req.body; // directly available as object through the BodyParser plugin
  doc.id = req.params.docId; // taken from the URL
  db.run("update documents set title = ?, text = ? where id = ?", doc.title, doc.text, doc.id);
  return read(req, res, next); // read the document or return 404
}

The Jersey annotations used to declare updateDocument as the handler for PUT requests. JSON en/decoding is fully transparent (because of the @Consumes and @Produces annotations).

@PUT // handle PUT requests
@Path("/documents/{id}") // on that URL
@Consumes(MediaType.APPLICATION_JSON) // accepts only JSON
@Produces(MediaType.APPLICATION_JSON) // writes out JSON
// params can be referenced and injected, the body is automatically decoded into the correct object
public Document updateDocument(@PathParam("id") String id, Document document) throws SQLException {
    // perform the database stuff (in a data access object, as is customary in Java)
    Document updated = documentDao.update(id, document);
    if (updated == null) {
        throw new NotFoundException(); // produces the 404 status
    }
    // the returned object is also automatically encoded
    // and all headers are set correctly
    return updated;
}

The Haskell version is very concise, but with the various Monad-layerings a bit opaque (especially the DB code):

put "/documents/:id" $ do -- the URL this function is defined for
  id <- param "id" -- extract the parameter from the URL
  inputDocument <- jsonData -- parse the JSON body
  doc <- liftIO $ updateDocument conn id inputDocument -- write the stuff into the DB
  resultOr404 doc -- return the document or 404

-- for reference on how to deal with Maybe
resultOr404 :: Maybe Document -> ActionM ()
resultOr404 Nothing  = status status404 -- return 404 without a body
resultOr404 (Just a) = json a -- return JSON (also setting the content type)

Actually, the only server checking for the existence of the document is the Haskell variant. Simply because the type system enforces it!

Finally, here is the corresponding Go client code:

func updateDocument(id string) {
    doc := Document{id, "New Title", "New Text"} // set the new content
    jsonReader := getJsonReader(doc) // encode into JSON
    req, _ := http.NewRequest("PUT", base + "/" + id, jsonReader) // prepare the call
    req.Header.Add("Content-Type", jsonType) // set the correct content-type
    res, _ := client.Do(req) // execute the call
    if res != nil { // check for a response
        res.Body.Close() // close the response stream
    }
    // yeah, I ignore errors...
}

Step 2

After trying out the services with the in-memory database, I got curious and wanted to see how they’d perform using a PostgreSQL Database. So I switched the database layer to these:

The switch was pretty easy for Java and Haskell (a matter of exchanging the database driver and connect-string). For Node.js I had to rewrite more or less the whole app since there seems to be no standard interface for DB access.

The client does not need to change.

The performance of the REST services with Postgres instead of SQLite is shown below. This time the numbers show the time needed to process 1000 documents; I wasn’t patient enough to wait for 10000 documents to finish, so I took the average of three runs.

This time Node blew Java away, and Haskell was also significantly slower than Node. My guess is that the JDBC and HDBC abstractions take their fair share of overhead, but Java’s extreme case might have another cause (I haven’t investigated it).

Conclusion

Writing v1

In general, writing the server in Java was quite easy (I’m used to that), and Node.js was easy too (chaining was hard). Haskell took me more time than the other two servers and the Go client together. I’d love to add Clojure and a Go server to the mix and see how they perform.

All three contestants work well and are reasonably fast. I never noticed Java’s problem with Postgres as shown here, but I rarely use JDBC directly these days.

Update to v2

The switch from SQLite to Postgres was pretty painful in Node.js. For Java and Haskell the switch was easy and fast.

Deployment

To deploy a Java webapp, you need a servlet container (I used Jetty) and deploy the app within that; the WAR file usually includes all necessary dependencies. Every Node.js app starts its own server on its own port, so there isn’t much to it: install Node.js, install the required libraries with npm, profit! Haskell produces a single binary which can be copied to the destination; starting the binary also starts the webserver, so it can be used directly or via a proxy. There are still some dynamically linked libraries which have to be present on the target machine.

The nice thing about Java is that all necessary dependencies are bundled with the app itself. Haskell and its type system also check whether newer versions are still compatible; for Node.js it seems our only option is to perform adequate testing.

Summary

This sums up my endeavour: I had the most fun writing the Go client, but the Haskell service has the fewest possibilities for error. Haskell is really, really cool for writing REST services, and it performs very well. Node.js is very young and provides very good productivity and performance to get something up and running, but (to me) maintaining Node.js code seems like something I wouldn’t want to do. Java is a compromise; nobody ever got fired for using Java (except, maybe, someone on the Android team).

  1. What else to do with three similar REST services?

SQL, Lisp, and Haskell are the only programming languages that I’ve seen where one spends more time thinking than typing.
— Philip Greenspun

Programming Languages in Joy and Sorrow

Most of us (programmers) know, and need to know, many programming languages. Some aren’t even perceived as programming languages any more (shell scripts), some make us a living (Java, C#, etc.), some are hard to replace (JavaScript), and some are just fun to play with (make your choice).

What makes programming languages differ is not syntax; syntax is nothing more than a mechanical translation, much like a cipher: back and forth without gaining or losing information.

General-purpose languages don’t even differ in what they’re able to express. All of them are “Turing Complete”1 and hence equally potent.

What makes a difference is what programming in these languages feels like. Even though Assembler and JavaScript are equally potent (in theory), they are two very different beasts.

What appeals to me are just a handful of properties of a language:

  • simplicity,
  • expressiveness,
  • performance,
  • productivity.

Simplicity

If a programming language is complicated by itself, programs become brittle, programmers are distracted, there is no uniform structure, and so on. C++ is a wonderful and very expressive language, but it has so many ways of doing things, so many features to learn (and be distracted by), that I would not care to use it again.

Simplicity was one of C’s big appeals: the language is very lean and has a small set of features, carefully chosen to be general enough not to be too restrictive.

Expressiveness

How much code do I need to write to get a job done? How good are the means of abstraction (how often do I need to repeat myself)?

Assembler doesn’t offer much in the way of abstraction (mostly jumps), whereas functional languages offer powerful ways of separating, combining, and reusing code blocks.

A simple and expressive language is easy to understand, and has few, but powerful, means for abstraction.

Performance

Computers used to get faster every year; by now we’ve reached a plateau, and instead of scaling up, we’re scaling out by adding cores and machines.

Our programs don’t get faster by waiting for a better machine anymore. We need to actively take additional cores into consideration. The future (and probably the present) is distributed!

I think performance is still a major merit of today’s software. If your service can’t scale to “internet scale”2, you’ll lose. If your competitor offers the same set of features, but twice as fast, you’ll lose.

So a modern programming language does not need to be the fastest one on a single machine and a single core, but it should be reasonably fast and scale easily to threads and processes.

Productivity

Computers have become insanely fast, but programmer productivity has stayed the same. Sure, we can’t call Intel for a brain upgrade, but we can choose and provide the right set of abstractions to support us.

Besides the simplicity and expressiveness of a language, productivity involves the available libraries, the community and the culture of the community. Java, for example, has a very active Open Source culture, quite untypical for a business related language.

“Academic” (or, let’s call them “non-mainstream”) languages are often very beautiful and expressive, but their lack of practical libraries makes it hard to get up and running quickly. We’re in an age of quick fixes and easy gains: what’s the point of choosing the newest language du jour if we can’t deliver faster (in the long run)? What’s the point of “engineering” if, in the end, we need even more time than by hacking3?

Productivity, for me, heavily affects the fun in programming and mostly subsumes simplicity and expressiveness.

Language assessment

In a business setting you rarely have a free choice of weapons, but for personal or pet projects you do. Most people I know stick to the language from work because it is well known. Some are increasingly unsatisfied with their day-job language and set out in search of their “own” language. I can’t tell you what the best language is. Everyone is different, as Yukihiro Matsumoto said here:

No language can be perfect for everyone. I tried to make Ruby perfect for me, but maybe it’s not perfect for you. The perfect language for Guido van Rossum is probably Python.

You are different from everyone else. Embrace the difference. Use your brain and make your own choice.

Of course, given the sheer number of programming languages, making an educated guess is crucial. Taking only the 100 most popular languages into account is surely not too far off.

Go Philosophical

Languages are divided into a few categories concerning programming paradigm (object oriented, functional, procedural…), typing (strong, weak, static, dynamic) etc.

My personal preference is with statically typed functional languages because they provide very good abstractions and safety through the compiler.

Look for a language with good interop functionality.

We’ve invested a lot of time in building libraries, helpers, utilities etc. Of course starting from scratch is fun, but it’s rarely economical.

If you used to program in Java, start looking for JVM languages for example.

Read some code before diving in

Typing.io is a nice starting point for some languages, and of course, there is always github.

Do you feel comfortable with the layout, the language, does it feel natural? Do you even understand some of it?

Enough with the subjective assessment, show me something objective!

Here is an unbiased and objective survey of a few programming languages I’m interested in. It’s totally foolproof and you certainly should start a multimillion dollar company on it!

This chart shows the ratio of the number of search results for

"c programming" "sucks"

and

"c programming" "rocks"

as returned by Yahoo!.

That is, "c programming" "sucks" returns 20800 results and "c programming" "rocks" returns 222000 results, hence the ratio of 10.6731 in the chart. Everything below 1 means there are more “sucks” results than “rocks” results.
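The chart value is just the quotient of the two result counts; as a quick sanity check (class name invented for the sketch):

```java
public class Ratio {
    public static void main(String[] args) {
        double sucks = 20800;   // results for "c programming" "sucks"
        double rocks = 222000;  // results for "c programming" "rocks"
        double ratio = rocks / sucks;
        // round to four decimals, as in the chart
        System.out.println(Math.round(ratio * 10000) / 10000.0);
    }
}
```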

Since Haskell, Clojure, and Scala completely dominate the chart, here a version with the obvious winners removed:

The numbers were computed using Google Docs - Spreadsheets and its awesome ImportXML function:

=ImportXML("http://search.yahoo.com/search?p="&C4,"//span[@id='resultCount']")

Cell C4, referenced in the above formula, contained the URL-encoded search string, for example %22c programming%22 %22rocks%22. Try it with Prolog ;-).

  1. Even some configuration files are considered Turing Complete, like sendmail’s.

  2. Whatever internet scale is in your domain.

  3. You’ve got to measure the whole lifetime of a project, it doesn’t help to get something working very quickly but with huge amounts of bugs, hard to change etc. I guess that’s a topic for a whole book on its own.

Measuring programming progress by lines of code is like measuring aircraft building progress by weight.
— Bill Gates

In the beginning was the Test...

JUnit offers many features besides the standard assertTrue/assertEquals methods most programmers use. Let’s browse through the newer and more exotic features; they might come in handy at some point.

JUnit 4.9 Feature Roundup

Assuming you know something about unit testing and JUnit in particular, I won’t start at the very bottom, but talk a little about the features introduced during the last few versions:

  • Matchers
  • Assumptions
  • Categories
  • Theories
  • Rules

I hope there is something new in here for you. JUnit’s javadoc documentation is very good, but there is no single place describing these features. It’s not my goal to give a thorough treatment of them here, but this might be a good starting point.

Matchers

By including Hamcrest (core) in the default JUnit distribution, JUnit now allows the use of assertThat, leading to much more readable tests and better error messages:

@Test
public void testUsingAssertThat() {
  assertThat(42, is(greaterThan(43))); // note, this will fail
}

JUnit includes only the Hamcrest core matchers; if you want or need more, include hamcrest-all 1.1. The included matchers are documented here for Hamcrest and here for the JUnit additions.

Output:

java.lang.AssertionError: 
Expected: is a value greater than <43>
     got: <42>

Assumptions

Assumptions allow tests to be ignored if the assumed condition isn’t met (instead of failing).

This test will be ignored if it is run on a Windows OS (for example):

@Test
public void testUsingAssumeThat() {
  assumeThat(File.separator, is("/"));
  ...
}

It is also possible to use assumptions in @Before or @BeforeClass methods.

Output (for example):

Test 'org.interlinked.junit.assumption.BasicTest.testUsingAssumeThat' ignored
org.junit.internal.AssumptionViolatedException: got: "\", expected: is "/"

Categories

Using categories it is possible to run only a subset of the tests. For example slow tests, integration tests etc.

Here, TestCategoryA and TestCategoryB are empty interfaces used to mark the tests:

@Test
@Category(TestCategoryA.class)
public void testCatA() {
  System.out.println("Category A test");
}

@Test
@Category(TestCategoryB.class)
public void testCatB() {
  System.out.println("Category B test");
}

@Test
@Category({ TestCategoryA.class, TestCategoryB.class })
public void testCatAB() {
  System.out.println("Category A and B test");
}

Using the Categories suite, we can now execute only those tests that are in “Category A”, but not in “Category B”:

@RunWith(Categories.class)
@Categories.IncludeCategory(TestCategoryA.class) // this would run tests CatA and CatAB
@Categories.ExcludeCategory(TestCategoryB.class) // now test CatAB is excluded too
@Suite.SuiteClasses(BasicTest.class)
public class CategoryASuite { }

Output:

Category A test

Theories

With theories we can write parameterized tests. We define a few theories and some datapoints, and JUnit matches the datapoints to the theories’ parameters by type.

Again, we have to use a special runner, Theories:

@RunWith(Theories.class)
public class TheoryTest {
  @DataPoint public static final String POINT1 = "POINT1";
  @DataPoint public static final String POINT2 = "POINT2";

  // mind the plural!
  // uses only the items of the array, never the whole array!
  @DataPoints public static final String[] POINTS = new String[] {"abc", "cde", "efg", "ghi"};

  @DataPoint public static final String[] POINTS_ARRAY = POINTS;

  @Theory
  public void testTheory(String param) {
      System.out.println("Got: " + param);
  }

  @Theory
  public void testTheoryWithTwoParams(String param1, String param2) {
      System.out.println("Got " + param1 + " and " + param2);
  }

  @Theory
  public void testArray(String[] array) { // gets called with POINTS_ARRAY, nothing else
      System.out.println("Got called...");
      assertThat(array.length, is(equalTo(POINTS_ARRAY.length)));
  }
}

Output:

Got POINT1 and POINT1
Got POINT1 and POINT2
Got POINT1 and abc
Got POINT1 and cde
Got POINT1 and efg
Got POINT1 and ghi
Got POINT2 and POINT1
Got POINT2 and POINT2
Got POINT2 and abc
Got POINT2 and cde
Got POINT2 and efg
Got POINT2 and ghi
Got abc and POINT1
Got abc and POINT2
Got abc and abc
Got abc and cde
Got abc and efg
Got abc and ghi
Got cde and POINT1
Got cde and POINT2
Got cde and abc
Got cde and cde
Got cde and efg
Got cde and ghi
Got efg and POINT1
Got efg and POINT2
Got efg and abc
Got efg and cde
Got efg and efg
Got efg and ghi
Got ghi and POINT1
Got ghi and POINT2
Got ghi and abc
Got ghi and cde
Got ghi and efg
Got ghi and ghi
Got called...

Rules

Finally, rules allow us to add behaviour to tests. They can be thought of as a kind of AOP for JUnit. Using rules, we can often avoid class hierarchies and still reuse functionality through delegation.

JUnit includes some rules to start with, but it is very easy to write our own rules.

public class RuleTest {
  @Rule public TemporaryFolder temporaryFolder = new TemporaryFolder();
  @Rule public TestName testName = new TestName();
  private static boolean fileCreated = false;
  @Rule public LoggingRule loggingRule = new LoggingRule();

  @Before
  public void printTestName() {
    System.out.println(testName.getMethodName());
  }

  @Test
  public void testCreatingAFile() throws IOException {
    File newFile = temporaryFolder.newFile("test1");
    assertThat(newFile.isFile(), is(true));
    fileCreated = true;
  }

  @Test
  public void testCheckIfItExists() { // depends on testCreatingAFile...
    assumeTrue(fileCreated); // just to be sure ;)
    File file = new File(temporaryFolder.getRoot().getAbsolutePath() + "/test1");
    // the file should not exist (unless we use ClassRule for the TemporaryFolder, for example)
    assertThat(file.isFile(), is(false));
  }

  @Test(expected = NullPointerException.class)
  public void testThrowException() {
    throw new NullPointerException();
  }
}

Output:

Starting: testCreatingAFile
testCreatingAFile
Finished: testCreatingAFile
Starting: testCheckIfItExists
testCheckIfItExists
Finished: testCheckIfItExists
Starting: testThrowException
testThrowException
Finished: testThrowException

The TemporaryFolder and TestName rules are included in JUnit, the LoggingRule is a simple example:

public class LoggingRule extends TestWatcher {
  @Override
  protected void starting(Description description) {
    System.out.println("Starting: " + description.getMethodName());
  }

  @Override
  protected void finished(Description description) {
    System.out.println("Finished: " + description.getMethodName());
  }
}

Other rules included (see JUnit’s javadoc):

  • ErrorCollector: collect multiple errors in one test method
  • ExpectedException: make flexible assertions about thrown exceptions
  • ExternalResource: start and stop a server, for example
  • TemporaryFolder: create fresh files, and delete after test
  • TestName: remember the test name for use during the method
  • TestWatcher: add logic at events during method execution
  • Timeout: cause test to fail after a set time
  • Verifier: fail test if object state ends up incorrect

Unfortunately, rules seem to be local to the defining class, so you can’t put them into the suite class the way you can with @Before and @BeforeClass (which would be really nice for opening external resources once for all tests).

Misc additions

Infinitest

For each change you make, Infinitest runs all the dependent tests. It’s continuous testing for Eclipse and IDEA – free and open source (written by Improving Works)!

ClasspathSuite

Most IDEs have their own ways of finding test classes to run, but I usually like to be IDE independent. Using ClasspathSuite it is possible to have JUnit detect all test classes (or a subset of them) on the classpath (written by Johannes Link). There are efforts to include it in the standard distribution of JUnit.

Every programmer knows they should write tests for their code. Few do. The universal response to “Why not?” is “I’m in too much of a hurry.” This quickly becomes a vicious cycle – the more pressure you feel, the fewer tests you write. The fewer tests you write, the less productive you are and the less stable your code becomes. The less productive and accurate you are, the more pressure you feel.
— Kent Beck/Erich Gamma – JUnit Test Infected

JavaScript

Few languages are so clearly worth learning as JavaScript.
– It’s an interesting language that doesn’t restrict how you use it.
– It’s a language you get paid for developing in.
– It’s still cool (who cares about Java anymore?).
– It’s here to stay (for some time, though).

So, to sum it up, I believe investing in JavaScript pays off. Even if you’re an “enterprise developer” or something. JavaScript will get you, sooner or later, so get it first!

Continue to full post...

How to dive into Legacy Code

Diving into legacy code written some time ago can be a daunting task. It doesn’t even matter much whether we wrote it ourselves or somebody else did; code rots faster than we’d like to admit.

Currently faced with such a task, I tried to approach it in a systematic and repeatable manner.

My steps

First try to find the modules and their dependencies. I used IntelliJ IDEA for my current Java project. Since it also uses Maven, finding the dependencies was easy.

Create a graph of module interdependencies. Which modules depend on which? Find the “edges” of the system (modules that do not depend on other modules). I found them to be the best starting point for a more detailed analysis.

Find out what the purpose of each module is. Is it a layer in the system (like a DAO-module)? Is it a cross-cutting concern (model classes)?

The next step is to analyse each module by itself. For this step I recommend using doxygen. It can generate very good documentation of the software at hand, even if no doxygen (or any other type of markup) comments were used, by analysing the dependencies, class hierarchies and call graphs of the program. doxygen supports many languages; chances are high yours will be among them.

To get the most out of doxygen, I’ve used the following configuration file which enables many of the advanced analysis features (like call graphs etc): doxygen.config.

You have to edit the file and provide – at least – the input and output directories! After that, it’s simply a matter of running doxygen doxygen.config.

To generate this kind of documentation easily from Maven, here is a similar doxygen-maven-plugin configuration:

    <build>
        <plugins>
            <plugin>
                <groupId>com.soebes.maven.plugins.dmg</groupId>
                <artifactId>doxygen-maven-plugin</artifactId>
                <configuration>
                    <projectName>${project.artifactId}</projectName>
                    <projectNumber>${project.version}</projectNumber>
                    <optimizeOutputJava>true</optimizeOutputJava>
                    <extractAll>true</extractAll>
                    <extractStatic>true</extractStatic>
                    <recursive>true</recursive>
                    <exclude>.git</exclude>
                    <excludePatterns>*/test/*</excludePatterns>
                    <inlineSources>true</inlineSources>
                    <referencedByRelation>true</referencedByRelation>
                    <referencesRelation>true</referencesRelation>
                    <hideUndocRelations>false</hideUndocRelations>
                    <umlLook>true</umlLook>
                    <callGraph>true</callGraph>
                    <callerGraph>true</callerGraph>
                    <generateLatex>true</generateLatex>
                </configuration>
            </plugin>
        </plugins>
    </build>

The generateLatex option is nice if you wish to produce PDF files (for viewing on a Kindle for example).

With this plugin configured in your pom.xml, mvn doxygen:report is your workhorse.

If you’re unsure if the generated documentation is worth it, take a look at the doxygen documentation of JUnit 4.8.2 (zip file, 7.6MB).

Everybody writes legacy code.
— Eric Ries

Haskell Books and Tutorials

Haskell is really a language very much worth knowing. It does many things so differently from most other languages, which I really enjoy.

I’d like to use this post to mention a few really good resources for learning Haskell:

Learn you a Haskell

Learn you a Haskell is a very fun and entertaining tutorial for Haskell very much in the spirit of Why’s poignant guide to Ruby.

It isn’t finished yet, but it’s really good to start with. (Via Wadler’s Blog)

Real World Haskell

Real World Haskell is an upcoming book I really look forward to. It is freely available on its website, so check it out.

Wikibook

There is also a good collection of Haskell topics on Wikibooks: Programming/Haskell.

The Wikibook is more interesting if you already know a bit of Haskell and would like to understand it more in depth.

Others

I liked Yet another Haskell Tutorial very much; it aims at people with a bit of background, though.

Another more recent book on Haskell is Programming in Haskell, which is nice, but also quite basic IMO.

Of course there is always the Gentle Introduction to Haskell.

Enjoy.

Haskell is doomed to succeed.
— Sir Tony Hoare

ICFP Programming Contest

The ICFP Programming Contest is a little competition held just before the International Conference on Functional Programming. Anyone can sign up and take part. This year 364 teams took the challenge with their programming language of choice.

The contest is not tied to functional programming languages; you can choose whatever language you want. That’s the reason for this article: which language has proven itself in this contest?

I’ll include the first place, the second place and the Judges’ Prize here.

Year First Second Judge
1998 Cilk OCaml J http://www.ai.mit.edu/extra/icfp-contest/
1999 OCaml Haskell Haskell http://www.cs.virginia.edu/~jks6b/icfp/
2000 OCaml OCaml GML http://www.cs.cornell.edu/icfp/
2001 Haskell Dylan Erlang http://cristal.inria.fr/ICFP2001/prog-contest/
2002 OCaml C Python http://icfpcontest.cse.ogi.edu/
2003 C++ C++ Dylan http://www.dtek.chalmers.se/groups/icfpcontest/
2004 Haskell Haskell/C++ OCaml http://www.cis.upenn.edu/~plclub/contest/
2005 Haskell Dylan Dylan http://icfpc.plt-scheme.org/index.html

The results of the 2006 ICFP contest are not included since they weren’t published at the time of this writing.

To sum it up:

Language First Second Judge Sum
Haskell 3 2 1 6
OCaml 3 2 1 6
Dylan 0 2 2 4
C++ 1 2 0 3
Cilk 1 0 0 1
C 0 1 0 1
J 0 0 1 1
GML 0 0 1 1
Erlang 0 0 1 1
Python 0 0 1 1

Interesting, isn’t it? There are few well-known programming languages in this table. Haskell and OCaml each collected six placements! No Java, no Ruby, no Perl program made it to the top – what’s up with those languages (again, the contest is not limited to functional programming languages…)?

What are the possible interpretations of such a result?

  1. Functional programming is pretty good (at least in Haskell and OCaml)
  2. The tasks are “functional-friendly” (at least we know that there are differences and you should choose your tools to suit your task)
  3. Only teams using obscure functional programming languages submit to ICFP (sorry, not true)
  4. ICFP isn’t able to run programs written in other languages
  5. Libraries don’t help you that much
  6. Only smart people learn obscure programming languages

Do you think your language is better than the other ones? Prove it, and sign up for the ICFP 2007 :-).

After more than 45 years in the field, I am still convinced that in computing, elegance is not a dispensable luxury but a quality that decides between success and failure.
— Edsger Dijkstra

The Programmer's Bill of Rights

Jeff Atwood over at Coding Horror wrote The Programmer’s Bill of Rights. I like the idea, but some points aren’t how I’d like to have them.

So here is my own “Programmer’s Bill of Rights”.

Continue to full post...

Ten questions

Jaroslaw Rzeszótko sent a list of ten questions to some of the best-known programmers on the net. It was a great idea, I love it. The questions are very basic, but what else would you ask? Anyway, I started to answer them myself, so here they are…

Continue to full post...

Don't over-reuse

I’m no big fan of frameworks. This comes from bad experience with a framework drawn out of a specialized project. It had to be used in every project we had to implement, whether it was appropriate or not. The argument was something like “it provides features you’d have to implement yourself, so it saves you time”. Usually these features would take about one hour to half a day to implement anew – exactly fitting the problem. But we had to take all the clutter and deal with it. This was about six years ago, and as far as I know they still use this exact framework…

Reuse is a good thing, we all know that, and we all draw our benefits from it. But you can certainly overdo it, not every project fits a specific framework.

Ben had a link to recordings of the last RailsConf (and insisted that I watch them) where Martin Fowler and David Heinemeier Hansson talked about what the essence of Rails really is. It does fit a few particular cases very well, but it doesn’t solve all problems for Web developers. Fowler’s talk is a bit more general on Software development, and DHH’s talk presents new ideas for the upcoming version of Rails, but starts with the philosophy and the goal as well as the non-goals of Rails. Both talks are highly recommended.

The reason I write about it is Bruce Eckel’s article ‘When Reuse Goes Bad’.

This resulted in classic framework-itis. The contractor was no longer just trying to solve the customer’s problem, they were trying to solve all problems that looked like it. So for that reason alone the project would have required 10x the original time and money.

Read it, it’s a short one. I really felt affected by it.

Before software can be reusable it first has to be usable.
— Ralph Johnson

Why you should use a plain text editor for programming

I’m always impressed if someone codes like it’s all he or she has ever done, without an IDE and still n times faster than anyone with an IDE.

During the past few years I’ve done a lot of Java coding. I’ve used Eclipse most of the time, but until now I didn’t realize how little I really know about Java1 — my skills largely depended on my IDE of choice.

Most of the time I don’t have a clue where (in which package) a specific class lives, nor do I know (even roughly) which exceptions are thrown. For these tasks I asked my IDE to generate the code I needed (import, try/catch or throws statements).

The last few days I used ANTLR to generate a parser. I didn’t use an IDE (there is none available for free for ANTLR 2), which meant I had to refer to the Java API to check whether my imports were right, how this or that method is named exactly and which arguments it takes (things Eclipse had done for me so far – it is to blame for the fact that I only know roughly the first three characters of a method name2).

If we are coding for productivity – in our day job, or wherever time matters more than sharpening our skills – it’s fine (and sometimes crucial) to use an IDE. But if we want to learn something and get better (smarter) at our tasks it actually does harm: we’ll end up thinking “man, that’s easy, I must know pretty much everything about this or that”5 when in fact we only know how to use our IDE and don’t know s##t about the language itself3.

Another dangerous thing with code generators is that it seems pretty easy to get up and running with a mid-sized application. We write a bit, generate a lot and we’re ready to go, aren’t we? No, only at the beginning! If we (or someone else) need to change something in that code, maybe weeks or months later, we have to understand every bit of the generated code (Ben has written about that too); now we’re screwed. If the code looks messy, why should anyone bother to do his/her best to make clean changes?

If it’s easy to copy and generate lots of code, why should one bother to think about abstraction, generalization and the like? If I had to write getters and setters by hand, I would do anything to avoid it. Maybe (just maybe…) I’d start really designing my programs, making a real O-O design (or write a generator to write them for me, but that’s another story).

The previous problem can easily be avoided by hiding the code generation. In Lisp one writes macros to generate code at compile time, XDoclet generates code as its own build step, EJB 3 generates code at compile time too.

Now, what’s the point? Use a plain text editor4 for educational purposes and avoid generating too much code with your IDE in your day job.

1 Java in terms of packages structure, method names etc. Meta-concepts like algorithms, techniques, patterns etc. are out of question.

2 That’s a great example for a quote out of Donald Norman’s book The Design of Everyday Things (“Knowledge in the head, and in the world”): Precise behavior can emerge from imprecise knowledge, and, of course, his book Things that make us Smart. None of these things makes us an expert… merely a user.

3 You can be a pretty smart programmer and still not know a lot about the programming language you use. To be a true expert you have to dive into the language itself and not rely on an IDE.

4 There are many high quality editors out there. I like TextMate most, but Emacs, Vim and JEdit are quite good too (and free, but keep the learning curve in mind).

5 See Unskilled and Unaware of it (PDF, 500kb). Somewhat related: Being a specialist (PDF, 164kb).

The Brain: Pinky, are you pondering what I’m pondering?
Pinky: I think so Brain, but if you replace the P with an O, my name would be Oinky, wouldn’t it?

Ruby: future and delayed (lazy) variables

Future variables

Future variables are a popular technique for implementing dataflow execution. A future variable gets assigned a (potentially lengthy) operation which is executed in a separate thread. The assignment is therefore non-blocking.

As soon as the variable is read, the read blocks if the calculation isn’t finished; otherwise it returns the result immediately.

I’ve implemented that feature using some Ruby tricks, but the finest and most elegant solution would be in Lisp using its unique macros.

Anyway, here is a quick example:


x = RFuture.future do
   # some lengthy operation with an interesting result
end

Now, that assignment doesn’t block, and we could do some other things; later on we may want to know what the result of our operation is:


puts x

This code could block if the result isn’t ready.

As you can see, there is no special handling for reading the variable (meaning that adopting future variables later on is quite easy).

In the best-case scenario the saved execution time (compared to sequential processing) is considerable.
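To make the mechanics concrete, here is a minimal sketch of how such a future could be built in plain Ruby with a thread. SimpleFuture is an illustrative name, not the actual RFuture source, and unlike RFuture it uses an explicit value call instead of transparent reads:

```ruby
# Minimal future sketch: the block starts running immediately in its
# own thread, so the "assignment" does not block.
class SimpleFuture
  def initialize(&block)
    @thread = Thread.new { block.call }
  end

  # Reading blocks until the computation has finished, then returns
  # the result (Thread#value joins the thread and yields its result).
  def value
    @thread.value
  end
end

f = SimpleFuture.new { (1..100).reduce(:+) } # returns at once
# ... do other work here ...
puts f.value # blocks if necessary, then prints 5050
```

The real library hides the value call behind ordinary variable reads; the thread-plus-join core shown here is the same idea.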

Delayed (lazy) variables

Another concept introduced in this library are delayed or lazy variables. These are variables which get assigned an operation that isn’t executed until we really need the result. This seems pretty pointless, but it’s used in some functional programming languages (like Haskell).


x = RFuture.delay do
   # some operation with an interesting result we may need later on
end

Again, reading the variable doesn’t differ from usual variables:


puts x
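A delayed variable can be sketched the same way. SimpleDelay is again an illustrative name, not the library’s actual code: the block is stored, only run on the first read, and the result is cached afterwards.

```ruby
# Minimal lazy-variable sketch: nothing runs at "assignment" time;
# the block is evaluated once, on first access, and then memoized.
class SimpleDelay
  def initialize(&block)
    @block = block
    @done  = false
  end

  def value
    unless @done
      @result = @block.call # computed only now, on first access
      @done   = true
    end
    @result
  end
end

d = SimpleDelay.new { 6 * 7 } # no work happens here
puts d.value # forces the computation, prints 42
puts d.value # cached, prints 42 again without recomputing
```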

Download the file here.

From a programmer’s point of view, the user is a peripheral that types when you issue a read request.
— P. Williams

Productivity and programming languages

We all know that programming languages affect productivity. For example, writing a word processor in assembler is much harder than writing it in C++. It seems natural to have a table of all available programming languages with their corresponding productivity values.

The main problem is comparing these languages. A common method is counting the lines of code a programmer needs for a given program. But who is going to implement the same specification in hundreds of languages? So to make the comparison applicable we need a common baseline.

This concept is called “Function Point Analysis”. This approach was introduced by IBM for exactly this task. Here you define function points (there are a lot of rules for doing that) and measure how many function points a programmer implements per month. More information about function points: Function Point FAQ and Fundamentals of FPA.

For the productivity measurement of programming languages we count the lines of code per function point.

The languages are categorized into levels – Assembler for example has level 1 with 320 lines of code per function point. A Macro Assembler has level 1.5 with 213 lines of code per function point etc…

Here an excerpt of this list:

Language          Level  LoC
Assembly (Basic)   1.00  320
Assembly (Macro)   1.50  213
C                  2.50  128
C++                6.00   53
Haskell            8.50   38
JAVA               6.00   53
LISP               5.00   64
Objective-C       12.00   27
SMALLTALK 80      15.00   21
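The numbers in the table follow a simple inverse rule: lines of code per function point is roughly the basic-assembly baseline of 320 divided by the language level. A quick sketch of that relationship (the 320/level rule is an observation about this table, not an official FPA formula):

```ruby
# LoC per function point ~ 320 / language level, rounded.
BASELINE = 320.0
LEVELS = {
  "Assembly (Basic)" => 1.0,
  "C"                => 2.5,
  "Java"             => 6.0,
  "Haskell"          => 8.5,
  "Smalltalk 80"     => 15.0,
}

LEVELS.each do |language, level|
  loc = (BASELINE / level).round
  puts format("%-16s level %4.1f  ~%3d LoC/FP", language, level, loc)
end
# Reproduces the table rows: 320, 128, 53, 38 and 21 LoC per function point.
```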

A lot of languages are special languages – like Excel with level 57. My guess is that a general purpose language is not able to go far beyond level 20.

Another table from here shows this:

Language               LoC
Assembler              320
Macro Assembler        213
C                      150
Lisp                    64
Basic                   64
Objective-C             26
Smalltalk               21
Query Languages         16
Spreadsheet Languages    6

The values are the same, or at least nearly the same, for the languages (maybe they came from the same source?).

I would like to see JavaScript, Ruby and Python included in these tables, but unfortunately I couldn’t find anything for these languages. I think JavaScript is no better than level 10, Python and Ruby are, together with Perl, around 15, and maybe Ruby has the highest level of these languages (comments?).

The higher the language level, the easier it is to read and change such a program. There is, of course, a trade-off: the more code you write (for example in assembler), the more optimizations are possible. With increasing processor speed and advanced algorithms this argument is becoming more and more obsolete.

Programs must be written for people to read, and only incidentally for machines to execute.
— Abelson & Sussman, SICP, preface to the first edition

Lessons of game design

If you’re looking for UML, Design Patterns or suchlike, you’re wrong here – this time I’ll try to apply some principles of puzzle design to general software, from the user’s point of view.

Scott Kim, a well-known puzzle creator, has written a great paper on puzzle design called “The Puzzlemaker’s Survival Kit” (sorry, no link available).

He goes through principles, case studies and the design process of puzzle and game design. Some of these guidelines can (IMHO) be applied to software design in general.

How? I think puzzles in games and features in general software aren’t that different. Here is the first principle of puzzle design, in Scott Kim’s words:

A puzzle is a problem that is fun to solve!
What makes puzzles different from problems is that they are fun to solve. A good puzzle is guaranteed to have a solution that is neither too easy nor too hard to find.

If we map “puzzle” to a real-world problem the user wants (or has) to solve, we should design our software to make this as easy as possible and as powerful as needed. So our software should enable the user to solve a problem with joy!

I wouldn’t limit “fun” to games – I think we could build software with which every problem is fun to solve, because the software helps us and doesn’t get in the way. It depends only on our imagination and our willingness to invest a lot of time to build software that is fun to use. (Game puzzles may be more fun to solve, but are not as rewarding.)

…neither too easy nor too hard… is another interesting phrase. Doesn’t that map directly to the power of any feature included in a piece of software? Conclusion: don’t treat your users as complete idiots; design your features right – not too complex, but with enough power to perform a specific kind of task.

Scott Kim explains how to construct puzzles in games, let’s see if we can map these to software in general.


Think small.

Interesting, huh? Think small as advice! Isn’t that the same as Apple’s Think different – just look at the iPod Shuffle or the Mac Mini? A lot of people try to cram more and more features into a piece of software (think big), but those systems tend to be overly complex and unusable (at least for me).


See how far you can get with the smallest number of features.

This is one of the most thrilling and adorable pieces of advice I’ve ever read! Don’t build battalions of special cases. Extract, refine and rewrite until you’ve got a truly elegant solution for the problems you want to solve. The web, meaning HTTP, follows exactly this advice. They stripped out every special case and built a solid foundation for thousands of web applications without disturbing or limiting the possibilities. See Axioms of Web architecture for more.

Many, if not all, of the brightest minds of history gave this advice – starting with Da Vinci Simplicity is the ultimate sophistication., Hoare The price of reliability is the pursuit of the utmost simplicity. It is a price which the very rich find most hard to pay., and Dijkstra Simplicity and elegance are unpopular because they require hard work and discipline to achieve and education to be appreciated. or Simplicity is prerequisite for reliability…


Start with a tutorial.

Definitely a game-design advice, isn’t it? Yet Eclipse, GMail and many more start with a tutorial – maybe we should think of including a quick and simple introduction in any given piece of software?


Make puzzles easy to author.

Construction puzzles need to be as easy and fun to author as they are to solve.

This is a very underestimated piece of advice in general software. Why do we think that we are great masterminds and know every single thing our users want to do with the software? There is no way to anticipate that for any given real-world problem!

Open up your software and let users create the features they need through an API, scripting capabilities or whatever.
Design these interfaces so that no existing code has to be read in order to use them. Make it easy to build new functionality and don’t even try to lock in the user!

Even Microsoft allows users to build their own little programs within their office suite.

Another big advantage: you might use that feature yourself to build new features and customizations…


Simplification!

Do it! Any feature you find “hard to use” is simply unusable for the user – your users don’t invest months or years learning the software (like you did building it)!

User interfaces which you – as the creator – find “super easy” are usually “just right” for your users.

[I’ve written this in a rush, so please excuse some weird phrases and formulations.]

The imagination of nature is far, far greater than the imagination of man.
— Richard Feynman

Repetition is the root of all evil

Yes, repetition is the root of all evil. Knuth (or was it Hoare?) said “Premature optimization is the root of all evil”, but I think it’s repetition. In fact, repetition is often used to provide better performance (e.g. loop unrolling to suit your CPU’s pipeline, or the good old Duff’s device).

I’m talking about repetition in software – the typical copy and paste fault. Every programming book I’ve read so far deals in one or the other way with reducing repetition.

Think of patterns – each pattern tries to reduce the amount of code written by providing professional reuse mechanisms.
- Factory concentrates the code for creating objects in one place,
- Strategy encapsulates interchangeable algorithms behind a common interface,
- Facade lets an existing piece of code hide behind another interface.
- Singleton seems to be an exception, because it reduces not code replication but data replication.

In Test Driven Development the rules are simple:
- write a test (red bar)
- fake it to get the bar green (unit tests are ok when the bar is green)
- clean up and remove repetition
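The cycle above, sketched as one tiny round in Ruby, with plain raise assertions standing in for a test framework (leap_year? is just an example problem of my choosing):

```ruby
# Step 1 (red) + step 2 (fake it): the simplest thing that makes the
# first test pass is a hard-coded answer.
def leap_year?(year)
  true # faked - good enough for the only test we have so far
end
raise "test failed" unless leap_year?(2000)

# Step 3 (clean up): more tests force the real rule, replacing the fake.
def leap_year?(year)
  (year % 4 == 0 && year % 100 != 0) || year % 400 == 0
end
raise "test failed" unless leap_year?(2000)
raise "test failed" unless leap_year?(2004)
raise "test failed" if leap_year?(1900)
```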

Every programming language which doesn’t provide good facilities to reduce replication should be considered harmful! All functional languages have facilities to pass around functions, Python and Ruby introduced similar techniques, Java… well, no, but you can simulate such behavior with an enormous amount of code (I don’t know C#, but I guess Anders Hejlsberg introduced something similar).
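In Ruby, for example, passing around a function means the duplicated iterate-and-accumulate skeleton is written once and only the varying part is handed in as a block (a made-up order example):

```ruby
# The common skeleton lives in one place...
def sum_over(items)
  total = 0
  items.each { |item| total += yield(item) }
  total
end

orders = [{ qty: 2, price: 10 }, { qty: 1, price: 5 }]

# ...and only the varying computation is passed in, so no loop
# has to be copied and pasted per report.
puts sum_over(orders) { |o| o[:qty] * o[:price] } # prints 25
puts sum_over(orders) { |o| o[:qty] }             # prints 3
```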

Whenever you look at a program or you’re writing one and you discover some kind of repetition: Remove it! Kill it! Destroy it… whatever but get rid of it!

Each time I introduced some sort of repetition and had to fix a bug in there, I forgot to fix the replica…

Now, what should we do to reduce replication? Sometimes it’s a bit hard to remove the replication. In the eighties the keyword was subclassing. Today subclassing is considered a performance killer and is sometimes referred to as hardwired code.
Now we have patterns – those magic constructs made by object gurus. I doubt that too many people do some kind of “pattern driven” design – those systems would get utterly complex and even harder to maintain than systems with a few repetitions.
So, whenever you encounter a repetition just remove it as best as you can, don’t hack! Think of a clean lean design and write some tests before changing something (just to be sure you didn’t break something).

That’s one of Extreme Programming’s weaknesses: there is no master plan the whole team adheres to. Each pair of programmers implements their user stories and doesn’t know what the others do (unless the pairs are changed very (very) frequently). [Extreme Programming isn’t written in stone, every team can adapt it to its own needs – so maybe there are some who indeed have such a master plan.]

On the other hand repetition helps me to remember things, therefore:
Remove repetition in your software (and your friends software) as soon as you discover it!

If you optimize everything, you will always be unhappy.
— Donald Knuth