How not to use mocks

Test doubles

Before talking about mocks I want to define what a mock actually is. In everyday life we often use the term “mock” for any object replacing a real production object in a test. That is not correct and may cause confusion.

Let’s look at the different kinds of test doubles (those replacement objects) using a simple example. Imagine we have a phone book object that allows us to store and retrieve phone numbers.

public interface Phonebook {

  String setNumber(String name, String number);
  
  String getNumber(String name);

}

There is also something else that uses Phonebook. It’s that something else we are testing, not Phonebook itself.

The real Phonebook implementation would probably save and retrieve numbers from a file or a database. It may be impossible or impractical to use the real implementation of this object in unit tests. We need to introduce a replacement.
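To make “something else” concrete, here is a hypothetical SpeedDial class (my own invention for illustration, not part of the original design) that looks numbers up through a Phonebook. It is classes like this one that the test doubles below help us test:

```java
// The Phonebook interface from above, repeated so this snippet compiles on its own.
interface Phonebook {

  String setNumber(String name, String number);

  String getNumber(String name);
}

// Hypothetical class under test: it uses a Phonebook, it is not the Phonebook.
class SpeedDial {

  private final Phonebook phonebook;

  SpeedDial(Phonebook phonebook) {
    this.phonebook = phonebook;
  }

  // Looks up the number for the given name, falling back to a placeholder.
  String dial(String name) {
    String number = phonebook.getNumber(name);
    return number != null ? number : "<unknown>";
  }
}
```

Any of the test doubles described below can be passed into the SpeedDial constructor in place of the real Phonebook.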

Stubs

A stub always returns the same previously set value(s) and has no logic.

public class PhonebookStub implements Phonebook {

  private final String number;

  public PhonebookStub(String number) {
    this.number = number;
  }

  @Override
  public String setNumber(String name, String number) {
    return null; // do nothing
  }

  @Override
  public String getNumber(String name) {
    return number;
  }

}

As you can see, PhonebookStub returns the same number every time you ask. If you don’t ask at all, that’s fine too.
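The stub’s behaviour in a nutshell - a self-contained sketch (the stub repeated inline in runnable form):

```java
// The stub from above, in runnable form: one canned answer, no logic.
class PhonebookStub {

  private final String number;

  PhonebookStub(String number) {
    this.number = number;
  }

  String setNumber(String name, String number) {
    return null; // writes are ignored
  }

  String getNumber(String name) {
    return this.number; // the same canned answer for every name
  }
}
```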

Fakes

A fake contains a special implementation tailored for testing. Our fake is going to use an in-memory map to store and retrieve phone numbers.

public class PhonebookFake implements Phonebook {

  private final Map<String, String> numbers = new HashMap<>();

  @Override
  public String setNumber(String name, String number) {
    // Map.put returns the previously stored number (or null)
    return numbers.put(name, number);
  }

  @Override
  public String getNumber(String name) {
    return numbers.get(name);
  }

}

In this example the fake implementation is very simple, but sometimes creating a fake requires a significant amount of coding (e.g. an in-memory SQL database).
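A quick round-trip check shows the difference from the stub: the fake honours what was stored. A self-contained sketch (the fake repeated in runnable form; Map.put is used to satisfy the String return type by returning the previously stored number):

```java
import java.util.HashMap;
import java.util.Map;

// The fake from above, in runnable form: real in-memory behaviour.
class PhonebookFake {

  private final Map<String, String> numbers = new HashMap<>();

  String setNumber(String name, String number) {
    // Map.put returns the previously stored value (or null), which
    // conveniently matches the String return type of the interface.
    return numbers.put(name, number);
  }

  String getNumber(String name) {
    return numbers.get(name);
  }
}
```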

Mocks

A mock allows you to set expectations about how it should be called, together with the answers to return. Typically mocks are created using a mocking framework such as EasyMock or Mockito.

import static org.mockito.Mockito.*;

// create a mock
Phonebook phonebook = mock(Phonebook.class);
// set expectations
when(phonebook.getNumber("Alice")).thenReturn("1234567890");

...

// verify expectations (optional)
verify(phonebook).getNumber("Alice");

If you find yourself creating a mock class (e.g. PhonebookMock), most likely what you are really doing is creating a stub or a fake.

Why are mocks bad?

They are not. Mocking is a great tool when used appropriately. But too much mocking makes tests hard to read and maintain. Somehow the mere presence of a mocking framework on the classpath makes people go crazy and mock everything without much thought. Here are some typical consequences of overused mocking:

  • Hard to read, noisy tests. Modern mocking frameworks provide pretty nice DSLs, but they are still limited by the syntax of the host language and are not perfect. A lot of expectation statements and behavior verifications increase the size of test methods and distract from the testing logic. It gets much worse when advanced techniques such as ArgumentCaptor are used.
  • Tight coupling of the test code with the implementation being tested. Unless you do behavioural testing, you want your tests to stay the same as you change the implementation. That is one of the benefits of having tests: you refactor the implementation, and if the tests still pass you can be confident you didn’t break anything. With mocks you may end up almost repeating your implementation logic in your tests as you set up the expected calls one by one. It is almost certain that such tests will fail when you change the implementation. You lose the refactoring safety net and increase the maintenance cost.
  • Compromised implementation quality. This one is not quite obvious: how can the choice of testing tools impact your production code? It is well known that one side effect of testing is a better design of your production code. To make code testable you need to break it into separate modules (e.g. classes and functions), introduce reasonable abstraction layers, and organize the dependencies between components. You want to do this anyway, but testing reinforces it. Now, back to mocking: this tool is almost too powerful. It allows you to write tests for any kind of messy code, given that nowadays you can mock classes and not only interfaces, create partial mocks and even mock static methods. With mocks, testing no longer pushes you towards a better design.

Less mocking

Now I want to talk about some typical strategies for using less mocking. Again, I’m not saying you should completely stop using mocks. I just want to suggest strategies that make testing simpler (and simple tests do not need much mocking).

To illustrate these strategies I created an interface to fetch stock price quotes given a stock symbol.

public interface Ticker {

  double getQuote(String symbol) throws IOException;
  
}

We are going to have a simple implementation as well. It is absolutely not the way to implement this kind of functionality in production code; I just wanted to show a more or less real-life example.

public class SimpleTicker implements Ticker {

  private final HtmlCleaner htmlCleaner;

  public SimpleTicker(HtmlCleaner htmlCleaner) {
    this.htmlCleaner = htmlCleaner;
  }

  @Override
  public double getQuote(String symbol) throws IOException {
    URL url = new URL("http://finance.yahoo.com/q?s=" + symbol);
    String html = IOUtils.toString(url);
    TagNode rootNode = htmlCleaner.clean(html);
    TagNode tickerNode = rootNode.findElementByAttValue("class", "time_rtq_ticker", true, true);
    String text = tickerNode.getText().toString();
    return Double.parseDouble(text);
  }
}

The getQuote method is rather simple, but it has some logic to test. Let’s create a test for this method using the most direct approach with mocking. From my past experience, this is what a lot of software developers would actually do.

import static org.junit.Assert.assertEquals;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;
import static org.powermock.api.mockito.PowerMockito.mockStatic;

@RunWith(PowerMockRunner.class)
@PrepareForTest(IOUtils.class)
public class SimpleTickerTest {

  @Test
  public void getQuote() throws Exception {
    // create mocks
    mockStatic(IOUtils.class);
    HtmlCleaner htmlCleaner = mock(HtmlCleaner.class);
    TagNode rootNode = mock(TagNode.class);
    TagNode tickerNode = mock(TagNode.class);
    // set expectations
    URL url = new URL("http://finance.yahoo.com/q?s=LNKD");
    String html = "<html><span class='time_rtq_ticker'><span>116.25</span></span></html>";
    when(IOUtils.toString(url)).thenReturn(html);
    when(htmlCleaner.clean(html)).thenReturn(rootNode);
    when(rootNode.findElementByAttValue("class", "time_rtq_ticker", true, true))
        .thenReturn(tickerNode);
    when(tickerNode.getText()).thenReturn("116.25");
    // run actual code
    SimpleTicker ticker = new SimpleTicker(htmlCleaner);
    double quote = ticker.getQuote("LNKD");
    assertEquals(116.25, quote, 0.1);
  }
}

The resulting test code is rather bad and suffers from all the mocking issues described above. What can we do to make it better?

Separate logic and side effects

This one is my favorite. It goes back to pure functions and functional programming.

The basic idea is to break the code under test into pure functions and functions with side effects. This makes testing so much easier because pure functions do not need mocking. Let’s rewrite the ticker class.

public class GoodTicker implements Ticker {

  private final HtmlCleaner htmlCleaner;

  public GoodTicker(HtmlCleaner htmlCleaner) {
    this.htmlCleaner = htmlCleaner;
  }

  @Override
  public double getQuote(String symbol) throws IOException {
    URL url = constructUrl(symbol);
    String html = fetchUrl(url);
    return parseHtml(html);
  }

  @VisibleForTesting
  URL constructUrl(String symbol) throws MalformedURLException {
    return new URL("http://finance.yahoo.com/q?s=" + symbol);
  }

  @VisibleForTesting
  double parseHtml(String html) {
    TagNode rootNode = htmlCleaner.clean(html);
    TagNode tickerNode = rootNode.findElementByAttValue("class", "time_rtq_ticker", true, true);
    String text = tickerNode.getText().toString();
    return Double.parseDouble(text);
  }

  @VisibleForTesting
  String fetchUrl(URL url) throws IOException {
    return IOUtils.toString(url);
  }
}

As you can see above, it is exactly the same code, just broken into more fine-grained methods. That gives us the ability to test all these methods separately, which is a good thing because they represent different areas of logic. The resulting tests are not only simpler, they are better too.

Here is how a test for the quote page URL construction logic may look.

  @Test
  public void constructUrl() throws Exception {
    GoodTicker ticker = new GoodTicker(null);
    URL url = ticker.constructUrl("LNKD");
    assertEquals(new URL("http://finance.yahoo.com/q?s=LNKD"), url);
  }

As you can see the testing code is really simple and requires absolutely no mocking.

One may say that my production code is now shaped by the way it is tested. I think it is shaped in a good way. Good code is code that is easy to read, and this rewritten code is much easier to read and understand.

Another argument could be that I exposed some private methods to make the class more testable. This is true, but it should not be a real problem because all clients access this class through the Ticker interface. Additionally, this issue is mitigated by the @VisibleForTesting annotation: together with a FindBugs detector it guarantees that the newly exposed methods are not called outside of the testing scope.

Use real classes where appropriate

Instead of creating mocks, real objects can be used as long as they are fast and have no side effects. In our example it is totally fine to use a real instance of HtmlCleaner to test the HTML parsing logic.

  @Test
  public void parseHtml() throws Exception {
    GoodTicker ticker = new GoodTicker(new HtmlCleaner());
    String html = "<html><span class='time_rtq_ticker'><span>116.25</span></span></html>";
    double quote = ticker.parseHtml(html);
    assertEquals(116.25, quote, 0.1);
  }

Do not test everything

It turns out that after extracting all the logic into separate methods and testing them independently, there is often nothing left to test. Our main method getQuote now only contains a high-level sequence of operations. It is so simple that it requires no testing.

To decide what is simple enough to have no tests, we need to go back to the reasons we write unit tests at all. I think it boils down to catching bugs (present or future). What kind of bugs or mistakes do we expect to catch in a method as simple as the refactored getQuote? I’d say it’s unlikely to have problems there. It is simple enough to leave untested, and doing so barely decreases the overall test coverage.

Whether some code needs testing is a judgement call we have to make each time. Which is better: to create a complicated test, or to leave a piece of simple code untested? There is no universal answer.

Let’s say I did not convince you and you still want getQuote to be tested. There are two ways we can go about it:

  1. Create an anonymous class extending the class under test and override the side-effect method.
  2. Create a partial mock changing the behavior of the side-effect method.

Please see both methods implemented below.

  @Test
  public void getQuoteOverriding() throws Exception {
    URL url = new URL("http://finance.yahoo.com/q?s=LNKD");
    String html = "<html><span class='time_rtq_ticker'><span>116.25</span></span></html>";
    GoodTicker ticker = new GoodTicker(new HtmlCleaner()) {
      @Override
      String fetchUrl(URL actualUrl) throws IOException {
        assertEquals(url, actualUrl);
        return html;
      }
    };
    double quote = ticker.getQuote("LNKD");
    assertEquals(116.25, quote, 0.1);
  }

  @Test
  public void getQuoteMocking() throws Exception {
    // create a partial mock
    GoodTicker ticker = spy(new GoodTicker(new HtmlCleaner()));
    // set expectations (doReturn/when is required for spies:
    // when(ticker.fetchUrl(url)) would invoke the real method)
    URL url = new URL("http://finance.yahoo.com/q?s=LNKD");
    String html = "<html><span class='time_rtq_ticker'><span>116.25</span></span></html>";
    doReturn(html).when(ticker).fetchUrl(url);
    // run actual code
    double quote = ticker.getQuote("LNKD");
    assertEquals(116.25, quote, 0.1);
  }

Both approaches are fine and pretty much identical. Depending on how comfortable your team is with the magic of mocking frameworks, you can pick one or the other.

The important thing to notice is that these tests duplicate the existing tests for individual methods (same assertions). This is why I suggested earlier that they are redundant.

Create stubs/fakes for infrastructure components

It is beneficial to create stubs or fakes for infrastructure components that are reused a lot across the project. This saves you from duplicating the mock setup code in every test involving those components.

This strategy cannot be used to test the ticker class itself, but we can create a stub of the ticker for use elsewhere (assuming the ticker is used in many other parts of the project).

public class StubTicker implements Ticker {

  private final Map<String, Double> quotes;

  public StubTicker(String symbol, double quote) {
    this(ImmutableMap.of(symbol, quote));
  }

  public StubTicker(Map<String, Double> quotes) {
    this.quotes = quotes;
  }

  @Override
  public double getQuote(String symbol) throws IOException {
    Double quote = quotes.get(symbol);
    if (quote == null) {
      throw new IOException("No quote is defined for symbol " + symbol);
    }
    return quote;
  }
}

With this class testing other classes that use a ticker requires no mocking.

  @Test
  public void testSomething() throws Exception {
    Ticker ticker = new StubTicker("LNKD", 116.25);
    Something smthg = new Something(ticker);
    ...
  }

Outro

As a reminder, a quote from the Mockito website:

  • Do not mock types you don’t own
  • Don’t mock value objects
  • Don’t mock everything
  • Show love with your tests!

Writing faster functional tests for Play applications

Play Framework supports functional testing out of the box. There are helpers for both Scala (specs2, ScalaTest) and Java (JUnit). The basic idea is to run the code under test inside a “fake” application. To the code it looks like it is running in a normal Play application with access to plugins, configuration parameters and other parts of the runtime environment. A fake application can be started with or without an HTTP server.

Everything works very well, but there is one issue - instantiating a fake application takes a long time. The reason is that the fake application is in fact quite real and does a lot of what a real application would do (reads the configuration, loads the plugins etc). Fake application startup time matters because the helpers provided by Play assume that each test requires its own application. As the number of functional tests grows, the fake application startup overhead becomes significant. This concern is somewhat addressed in ScalaTest (multiple tests can share the same fake application), but not in the other testing frameworks.

In this post I will show a different approach to the functional testing of Play applications. Instead of using multiple fake applications we will run tests against a single instance of a real application.

Let’s start with a simple Play application that we can use for testing:

package net.yefremov.sample

import play.api.mvc._

object Application extends Controller {

  def foo = Action {
    Ok("foo")
  }

  def bar = Action {
    Ok("bar")
  }
}

And here is the corresponding routes file:

GET     /foo                    net.yefremov.sample.Application.foo
GET     /bar                    net.yefremov.sample.Application.bar

Now we have a very simple application that returns “foo” on /foo and “bar” on /bar. Let’s create a functional test for that.

Before we create a test we need to change the build to support functional testing. That is not strictly required, but I prefer to clearly separate unit and functional tests. In SBT this is typically done using the “it” configuration. To get it working, modify your build.sbt to contain the following items:

Defaults.itSettings

unmanagedSourceDirectories in IntegrationTest <<=
    (baseDirectory in IntegrationTest)(base =>  Seq(base / "it")),

libraryDependencies +=
    "com.typesafe.play" %% "play-test" % play.core.PlayVersion.current % "it",

lazy val root = project.in(file(".")).configs(IntegrationTest)

With the above changes applied, we can keep unit tests in the default test folder and functional tests in the it folder. To execute the functional tests, run play it:test.

Now let’s create a simple functional test.

@RunWith(classOf[JUnitRunner])
class IntegrationSpec extends Specification with FutureAwaits with DefaultAwaitTimeout {

  val baseUrl = "http://localhost:9000"

  "application" should {

    "return 'foo' from /foo" in {
      val response = await(WS.url(s"$baseUrl/foo").get())
      response.body must beEqualTo("foo")
    }

    "return 'bar' from /bar" in {
      val response = await(WS.url(s"$baseUrl/bar").get())
      response.body must beEqualTo("bar")
    }
  }
}

The test uses the WS API to hit the application via its HTTP interface. One important thing to notice is that it does not use WithServer to start a fake application; it requires you to start the application yourself before running the test. Thus the test will fail if executed by simply running play it:test.

To make the test pass we need to start our application before running the tests and shut it down after they finish. This could be done manually, but there is a better way: SBT provides very convenient hooks, sbt.Tests#Setup and sbt.Tests#Cleanup. We will use them to start and stop the application when running integration tests. We can see how they work by adding the following to build.sbt:

testOptions in IntegrationTest += Tests.Setup(() => println("setup"))

testOptions in IntegrationTest += Tests.Cleanup(() => println("cleanup"))

Now when you run play it:test you will see the messages above printed in the console. Next we need to replace these debug messages with real code that starts and stops the application. There is just one issue: we cannot directly run play run using ProcessBuilder, because that would block the execution of tests until the app shuts down. The easiest solution is to use the Unix screen command.

"screen -dmSL playFunctionalTest play run".run()

To stop the application under test we will find the corresponding screen and kill it.

"screen -S playFunctionalTest -X quit".run()

This is pretty much it. The last remaining part is to wait for the application to start up before executing the tests. This can be done by hitting an application URL in a loop until the application responds. Doing so also helps to warm up the application, because Play only compiles the code after the first request hits the application.

private def isAppRunning(appUrl: URL): Boolean = {
  try {
    val connection = appUrl.openConnection().asInstanceOf[HttpURLConnection]
    connection.setRequestMethod("GET")
    connection.connect()
    true
  } catch {
    case NonFatal(e) =>
      println(s"${e.getClass.getSimpleName}: ${e.getMessage}")
      false
  }
}

After putting everything together we can run play it:test and see our tests passing.

[play-functional-testing] $ it:test
Launching the app...
screen -dmSL playFuncTest play run -Dhttp.port=9000
Waiting for the app to start up...
ConnectException: Connection refused
Waiting for the app to start up...
ConnectException: Connection refused
Waiting for the app to start up...
ConnectException: Connection refused
Waiting for the app to start up...
The app is now ready
[info] IntegrationSpec
[info] application should
[info] + return 'foo' from /foo
[info] + return 'bar' from /bar
[info] Total for specification IntegrationSpec
[info] Finished in 18 ms
[info] 2 examples, 0 failure, 0 error
Killing the app...
screen -S playFuncTest -X quit
Waiting for the app to shutdown...
Waiting for the app to shutdown...
ConnectException: Connection refused
[info] Passed: Total 2, Failed 0, Errors 0, Passed 2
[success] Total time: 22 s, completed Dec 12, 2015 2:05:22 PM

Complete sample application code can be found here: https://github.com/dmitriy-yefremov/play-functional-testing

This approach may or may not be useful for a given project. Below are some key characteristics that will help you decide whether to use it:

  • Execution time does not depend on the test suite size. It stays near your application startup time. That is good for large test suites. For relatively small test suites the overhead of starting up an application instance may be too big.
  • A real application instance is started. That is good for verifying the integration of all components, including configuration files. It may be bad because test doubles cannot be easily injected; you may need test-only branches in your production code.
  • Tests only interact with the application through the HTTP interface. That makes the approach more suitable for black-box style acceptance testing. Tests do not need to change when the implementation changes, and a different team may be responsible for testing than for the implementation.

Please let me know what you think and how this solution can be improved!

Batch API for Play

While we are waiting for HTTP/2 to be widely adopted, there is a simple trick that can make our applications faster - a batch API. It allows clients to encode multiple API calls into one HTTP request. Here are some examples of different batch API implementations: Facebook, Google, Dropbox.

The idea is that instead of making multiple HTTP requests to get different pieces of data, the client makes just one. This one request encodes the different real endpoints requested, and the response from the server contains the individual responses combined. This approach can make the client faster because it reduces the overhead of multiple HTTP requests (TCP connection time, SSL handshake time, sequential execution due to the limit on the number of concurrent connections to the same host).
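From the client’s point of view, composing a batch call is just query-string construction. A hedged sketch of a helper that builds such a batch URL (BatchUrlBuilder is my own illustration, not part of any of the APIs linked above):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;

// Hypothetical client-side helper: builds a batch URL like /batch?f=%2Ffoo&b=%2Fbar
class BatchUrlBuilder {

  static String build(String batchPath, Map<String, String> calls) {
    StringBuilder sb = new StringBuilder(batchPath).append('?');
    boolean first = true;
    for (Map.Entry<String, String> call : calls.entrySet()) {
      if (!first) {
        sb.append('&');
      }
      // parameter name = section name, value = the URL-encoded endpoint path
      sb.append(call.getKey()).append('=').append(encode(call.getValue()));
      first = false;
    }
    return sb.toString();
  }

  private static String encode(String value) {
    try {
      return URLEncoder.encode(value, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      throw new IllegalStateException(e); // UTF-8 is always available
    }
  }
}
```

Building a batch of /foo and /bar named f and b yields /batch?f=%2Ffoo&b=%2Fbar, matching the protocol described below.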

Batching of API requests is very easy to implement on top of the Play Framework. The key feature that enables us to do it is that application code has access to the global router. That makes it possible to receive a batch request, extract individual calls encoded into it, create fake HTTP requests for them and ask the router to process these fake requests.

For this post I chose to implement a Facebook API inspired request batching protocol. Let’s say there are multiple endpoints returning JSON responses. The goal is to create a batch endpoint that takes a list of individual endpoints in the query parameters and returns JSON containing the responses from all of them. For example, given endpoints /foo and /bar, a call to /batch?f=/foo&b=/bar should return { "f": <foo response>, "b": <bar response> }. In the batch call, query parameter names are used to name the sections of the resulting JSON document.

Let’s start from the top level batch controller action. It defines the high-level algorithm: extract the batched calls, fetch them individually, and combine them into the response.

def batchGet(): Action[AnyContent] = Action.async { implicit request =>
  val resultFutures = request.queryString.map { case (name, values) =>
    fetch(values.head).map(name -> _)
  }
  Future.sequence(resultFutures).map(combineResults)
}

The next function is the most important part - fetching an individual request locally. It creates a fake request using the given URL, routes to the corresponding action and invokes the action to produce a response.

private def fetch(path: String)(implicit request: RequestHeader): Future[Result] = {
  val fetchRequest = request.copy(path = path, uri = path)
  val handler = Play.current.global.onRouteRequest(fetchRequest)
  handler.map {
    case action: EssentialAction => action(fetchRequest).run
    case _ => Future.failed(new IllegalArgumentException("Unexpected handler type"))
  } getOrElse {
    Future.failed(new IllegalArgumentException(s"No handler for path '$path'"))
  }
}

The last part is combining individual responses into the final JSON document. Responses are assumed to be valid JSON documents, so no validation is done.

private def combineResults(results: Iterable[(String, Result)]): Result = {

  def bytesEnumerator(s: String) = Enumerator(s.getBytes)
  def openBrace = bytesEnumerator("{")
  def closeBrace = bytesEnumerator("}")
  def comma = bytesEnumerator(",")
  def namedBlock(name: String) = bytesEnumerator(s""""$name":""")
  def isLast(index: Int) = index == results.size - 1

  val body = results.zipWithIndex.foldLeft(openBrace) { case (acc, ((name, result), index)) =>
    acc
      .andThen(namedBlock(name))
      .andThen(result.body)
      .andThen(
        if (isLast(index)) {
          closeBrace
        } else {
          comma
        }
      )
  }
  Result(ResponseHeader(OK), body)
}

Some improvements to the code above would be:

  • support for HTTP methods other than GET
  • better error handling (what if one of the batched requests fails, but the rest of them succeed)
  • better batching protocol (e.g. send not only the body, but also the headers and the status code for individual responses)

The full source code of the batch controller together with a sample application is available here. Please check it out and let me know what you think.

3 approaches to Scala code generation

There are two general ways to generate code: string templates and abstract syntax tree building. In the Scala world we have a couple of options to build an abstract syntax tree, so in this post I’m going to compare three different approaches:

  1. Generating code using string templates (Twirl).
  2. Generating code from an abstract syntax tree (treehugger).
  3. Building an abstract syntax tree and sending it directly to the compiler (Scala macros).

To do an apples-to-apples comparison I created a sample project implementing all three approaches. The project contains a primitive data type schema and three different code generators producing classes for the given data schemas. You can find the full source code in the GitHub repo.

Here is the data schema definition. It lets you define a type, give it a name and provide a list of fields.

case class TypeSchema(name: TypeName, comment: String, fields: Seq[Field])

case class TypeName(fullName: String) {

  // derived from fullName, e.g. "net.yefremov.sample.codegen.Foo"
  def packageName: String = fullName.take(fullName.lastIndexOf('.'))

  def shortName: String = fullName.drop(fullName.lastIndexOf('.') + 1)

}

case class Field(name: String, valueType: TypeName)

Below is a sample schema definition.

{
  "name": {
    "fullName": "net.yefremov.sample.codegen.Foo"
  },
  "comment": "Test schema to play with the generators",
  "fields": [
    {
      "name": "bar",
      "valueType": {
        "fullName": "String"
      }
    },
    {
      "name": "baz",
      "valueType": {
        "fullName": "Int"
      }
    }
  ]
}

And here is the corresponding class to be generated. Please note the extra method returning the source schema; it is added to have an example of method generation.

/**
 * Test schema to play with the generators
 */
case class Foo(bar: String, baz: Int) {

  def schema: String = """{"name":{"fullName":"net.yefremov.sample.codegen.Foo"},"comment":"Test schema to play with the generators","fields":[{"name":"bar","valueType":{"fullName":"String"}},{"name":"baz","valueType":{"fullName":"Int"}}]}"""

}

1. String templates

This is a very simple, low-tech approach to generating Scala (or any other language) source code. It comes down to just a bunch of printf statements. To make things a little cleaner we can use a more advanced template engine. For this project I chose Twirl, the Play Framework template engine. The main reason to prefer Twirl over any other template engine is that it plays very well with Scala: the syntax is Scala-ish and you can put pieces of Scala code directly into your templates (similar to Java blocks in JSP).

Below is the Twirl based code generator implementation.

@(schema: net.yefremov.sample.codegen.schema.TypeSchema)

@import _root_.net.yefremov.sample.codegen.template.TwirlGenerator
@import _root_.net.yefremov.sample.codegen.schema.TypeSchema

package @schema.name.packageName

/**
 * @schema.comment
 */
case class @(schema.name.shortName) (
    @for((field, index) <- schema.fields.zipWithIndex) {
        @field.name: @field.valueType.fullName @if(index < schema.fields.size - 1) { , }
    }
) {

  def schema: String = "@TypeSchema.toEscapedJson(schema)"

}

As you can see, it is quite straightforward. You first handcraft the desired output and then replace the configurable pieces with template placeholders. The placeholders are filled in with the actual data when the template is evaluated.

The template looks just like the output, and anyone can tell what is going on without any previous knowledge of Twirl. Probably the only cumbersome part is the field parameter loop: it takes too much effort to avoid a comma after the last parameter. It could be simplified with fields.map(...).mkString(", "), but I decided to show more of Twirl’s native syntax.

Pros

  • Does the job. There are successful projects using this approach.
  • Very easy to get started, almost no learning curve.

Cons

  • Can get quite messy if there is too much logic stuffed into one template.
  • No knowledge about the syntax of the generated code (can easily produce code that does not compile).

2. Abstract syntax tree to source code

An abstract syntax tree (AST) is a simplified syntactic representation of the source code. The AST is the output of the syntax analysis phase of a compiler and the input to the semantic analysis phase.

An AST is much more structured than source code, which may simplify the task of code generation. Instead of generating actual source code we can generate its AST, and then either convert it back into source code or proceed with compilation.

To generate source from an AST I used a library called treehugger. It is a fork of scalac code with some extensions.

Below is a treehugger based code generator implementation.

class TreehuggerGenerator {

  def generate(schema: TypeSchema): String = {
    // register new type
    val classSymbol = RootClass.newClass(schema.name.shortName)

    // generate list of constructor parameters
    val params = schema.fields.map { field =>
      val fieldName = field.name
      val fieldType = toType(field.valueType)
      PARAM(fieldName, fieldType): ValDef
    }

    // generate class definition
    val tree = BLOCK(
      CASECLASSDEF(classSymbol).withParams(params).tree.withDoc(schema.comment) := BLOCK(
        DEF("schema", StringClass) := LIT(TypeSchema.toJson(schema))
      )
    ).inPackage(schema.name.packageName)

    // pretty print the tree
    treeToString(tree)
  }

  private def toType(fieldType: TypeName): Type = {
    fieldType.fullName match {
      case "String" => StringClass
      case "Int" => IntClass
      case "Boolean" => BooleanClass
    }
  }

}

The code is somewhat complicated. Thanks to the great documentation on the project’s web site it didn’t take me too long to code this example. But after implementing a working generator I’m still not sure I fully understand the code I wrote. Every individual part of the code looks simple and reasonable, yet it is hard to understand bigger pieces of code written with treehugger.

Another thing I struggled with was IntelliJ not being able to compile the sample code, giving me various errors. This is definitely an IntelliJ issue, as SBT compiled everything just fine. But the development experience becomes much less pleasant without all the aids of a modern IDE.

Pros

  • The library takes care of Scala syntax (e.g. you don’t need to explicitly escape strings).
  • The generated code is supposed to compile (you can’t generate syntactically invalid code).
  • The code is written in Scala (that is beneficial for complex projects with a lot of logic, you can easily express any kind of conditions, extract reusable blocks, compose things and so on).

Cons

  • Difficult initial learning process.
  • The code is quite complicated.
  • Not much support from the IDE.

3. Abstract syntax tree to byte code

As mentioned earlier, an AST can be passed directly to the next compilation phase instead of generating source code out of it. Strictly speaking this is not a Scala code generation method, since no Scala source code is produced. But in many cases the source code itself is not important; the compiled output is what is really needed. In these cases Scala macros can be used.

Scala macros are functions that are called by the compiler during the compilation process. They have access to the AST and can manipulate it (including appending to it). There used to be two kinds of macros: def macros and type macros. The first kind is used to generate functions, the second to generate new types. As of August 2013 type macros are deprecated. The closest alternative that can be used for the purpose of this comparison is macro annotations.

Below is a macro annotations based code generator implementation.

import scala.annotation.StaticAnnotation
import scala.language.experimental.macros

class FromSchema(schemaFile: String) extends StaticAnnotation {

  def macroTransform(annottees: Any*) = macro QuasiquotesGenerator.generate

}


import scala.reflect.macros.Context

object QuasiquotesGenerator {

  def generate(c: Context)(annottees: c.Expr[Any]*) = {

    import c.universe._

    // retrieve the schema path
    val schemaPath = c.prefix.tree match {
      case Apply(_, List(Literal(Constant(x)))) => x.toString
      case _ => c.abort(c.enclosingPosition, "schema file path is not specified")
    }

    // retrieve the annotate class name
    val className = annottees.map(_.tree) match {
      case List(q"class $name") => name
      case _ => c.abort(c.enclosingPosition, "the annotation can only be used with classes")
    }

    // load the schema from JSON
    val schema = TypeSchema.fromJson(schemaPath)

    // produce the list of constructor parameters (note the "val ..." syntax)
    val params = schema.fields.map { field =>
      val fieldName = newTermName(field.name)
      val fieldType = newTypeName(field.valueType.fullName)
      q"val $fieldName: $fieldType"
    }

    val json = TypeSchema.toJson(schema)

    // rewrite the class definition
    c.Expr(
      q"""
        case class $className(..$params) {

          def schema = ${json}

        }
      """
    )
  }

}

The implementation consists of two parts: the annotation that is used as a marker in the source code to trigger macro expansion, and the code generator itself. You might expect the generator implementation here to look just like the one based on treehugger. But to make things even more interesting I used a relatively new Scala feature called quasiquotes. It lets you write snippets of code that get automatically converted into the corresponding syntax trees. You can also use $-placeholders that are spliced in by the compiler. So the code looks very much like a string template, but with all the goodies of type safety and syntax checking.
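To get a feel for quasiquotes outside of a macro, here is a minimal standalone sketch (the names `fieldName` and `fieldType` are made up): the `q` interpolator builds a syntax tree from template-like text, and the same syntax works as a pattern for taking trees apart.

```scala
import scala.reflect.runtime.universe._

// build a tree from what looks like a string template
val fieldName = TermName("answer")
val fieldType = TypeName("Int")
val valTree = q"val $fieldName: $fieldType = 42"

// quasiquotes also work as patterns for deconstructing trees
val q"val $name: $tpe = $rhs" = valTree
println(s"$name: $tpe")
```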

Here is an example of how to use the generator.

@FromSchema("sample/src/main/resources/Foo.json")
class Foo

Pros

  • Official Scala tool to generate code.
  • Quite readable AST code with quasiquotes.
  • Takes care of Scala syntax (e.g. you don’t need to explicitly escape strings).
  • The generated code is supposed to compile (you can’t generate syntactically invalid code).
  • The code is written in Scala (that is beneficial for complex projects with a lot of logic, you can easily express any kind of conditions, extract reusable blocks, compose things and so on).

Cons

  • Quasiquote expression syntax does not always match regular Scala syntax.
  • Macro annotations require a stub class to be rewritten (you have to have something to annotate).
  • Difficult initial learning process.
  • The code is quite complicated.
  • Not much support from the IDE.

Summary

After playing with all three code generator implementations I’d say that the string-template-based approach is my favorite. For projects that are not too complicated it should be enough. For a really complicated code generator with a lot of logic and code reuse I’d look at AST generation. Both treehugger and Scala macros are great projects. Treehugger may be a cleaner solution for generating completely new types. Scala macros are great at rewriting/augmenting existing code.

Please check out the sample project and let me know what you think.

Debugging Scala compiler's magic

Have you ever wondered how Scala is able to do some clever trick on top of the JVM? Or maybe you have asked yourself “WTF is wrong” while staring at a piece of code? That is what happened to me yesterday. I was trying to figure out some magic about case classes and pattern matching, and even worse, I wanted to replicate some of it myself. After spending hours googling I was ready to start decompiling classes generated by the Scala compiler to understand what exactly was going on there. But it turned out there is a simpler way to get into the internals.

The Scala compiler has multiple phases in the process of turning your beautiful source into byte code. Here are these phases:

$ scala -Xshow-phases
namer, typer, superaccessors, pickler, refchecks, liftcode, uncurry, tailcalls, explicitouter, erasure, lazyvals, lambdalift, constructors, flatten, mixin, cleanup, icode, inliner, closelim, dce, jvm, sample-phase

You can find more information about each of them on this wiki page.

What is interesting for us now is that we can make the compiler print intermediate results between the phases. So we can see how the code evolves from its initial look to the final result.

To make the compiler print the syntax trees after a certain phase, add -Xprint:<phase> to the scalac command line. For example, it would be -Xprint:namer for the namer phase.

I created a simple test to experiment with the case classes magic that was mentioned earlier. Here is the source code.

package net.yefremov.sample

case class Container(value: Any)

object MatchingTest extends App {

  def printType(container: Container): Unit = {
    container match {
      case Container(stringValue: String) => println(s"It is a string: $stringValue")
      case Container(intValue: Int) => println(s"It is an int: $intValue")
    }
  }

  printType(Container("Heya!"))
  printType(Container(42))

}

What if it is an SBT project? How do I get scalac to pick up this -Xprint parameter? One way is to set scalacOptions in Build.scala/build.sbt. That is not very convenient if you just want to check something quickly and move on. The other approach is to enable the flag only for the current SBT session:

$ sbt
> set scalacOptions ++=Seq("-Xprint:namer")

After that, when you run compile, you will get the syntax trees dumped into the console. Here is the output for my test after the namer phase.

[[syntax trees at end of                     namer]] // MatchingTest.scala
package net.yefremov.sample {
  case class Container extends scala.Product with scala.Serializable {
    <caseaccessor> <paramaccessor> val value: Any = _;
    def <init>(value: Any) = {
      super.<init>();
      ()
    }
  };
  object MatchingTest extends App {
    def <init>() = {
      super.<init>();
      ()
    };
    def printType(container: Container): Unit = container match {
      case Container((stringValue @ (_: String))) => println(StringContext("It is a string: ", "").s(stringValue))
      case Container((intValue @ (_: Int))) => println(StringContext("It is an int: ", "").s(intValue))
    };
    printType(Container("Heya!"));
    printType(Container(42))
  }
}

And here is the output after the lambdalift phase.

[[syntax trees at end of                lambdalift]] // MatchingTest.scala
package net.yefremov.sample {
  case class Container extends Object with Product with Serializable {
    <caseaccessor> <paramaccessor> private[this] val value: Object = _;
    <stable> <caseaccessor> <accessor> <paramaccessor> def value(): Object = Container.this.value;
    def <init>(value: Object): net.yefremov.sample.Container = {
      Container.super.<init>();
      Container.this.$asInstanceOf[Product$class]()./*Product$class*/$init$();
      ()
    };
    <synthetic> def copy(value: Object): net.yefremov.sample.Container = new net.yefremov.sample.Container(value);
    <synthetic> def copy$default$1(): Object = Container.this.value();
    override <synthetic> def productPrefix(): String = "Container";
    <synthetic> def productArity(): Int = 1;
    <synthetic> def productElement(x$1: Int): Object = {
      case <synthetic> val x1: Int = x$1;
      (x1: Int) match {
        case 0 => Container.this.value()
        case _ => throw new IndexOutOfBoundsException(scala.Int.box(x$1).toString())
      }
    };
    override <synthetic> def productIterator(): Iterator = runtime.this.ScalaRunTime.typedProductIterator(Container.this);
    <synthetic> def canEqual(x$1: Object): Boolean = x$1.$isInstanceOf[net.yefremov.sample.Container]();
    override <synthetic> def hashCode(): Int = ScalaRunTime.this._hashCode(Container.this);
    override <synthetic> def toString(): String = ScalaRunTime.this._toString(Container.this);
    override <synthetic> def equals(x$1: Object): Boolean = Container.this.eq(x$1).||({
  case <synthetic> val x1: Object = x$1;
  case5(){
    if (x1.$isInstanceOf[net.yefremov.sample.Container]())
      matchEnd4(true)
    else
      case6()
  };
  case6(){
    matchEnd4(false)
  };
  matchEnd4(x: Boolean){
    x
  }
}.&&({
      <synthetic> val Container$1: net.yefremov.sample.Container = x$1.$asInstanceOf[net.yefremov.sample.Container]();
      Container.this.value().==(Container$1.value()).&&(Container$1.canEqual(Container.this))
    }))
  };
  <synthetic> object Container extends runtime.AbstractFunction1 with Serializable {
    def <init>(): net.yefremov.sample.Container.type = {
      Container.super.<init>();
      ()
    };
    final override <synthetic> def toString(): String = "Container";
    case <synthetic> def apply(value: Object): net.yefremov.sample.Container = new net.yefremov.sample.Container(value);
    case <synthetic> def unapply(x$0: net.yefremov.sample.Container): Option = if (x$0.==(null))
      scala.this.None
    else
      new Some(x$0.value());
    <synthetic> private def readResolve(): Object = sample.this.Container;
    case <synthetic> <bridge> def apply(v1: Object): Object = Container.this.apply(v1)
  };
  object MatchingTest extends Object with App {
    def <init>(): net.yefremov.sample.MatchingTest.type = {
      MatchingTest.super.<init>();
      MatchingTest.this.$asInstanceOf[App$class]()./*App$class*/$init$();
      ()
    };
    def printType(container: net.yefremov.sample.Container): Unit = {
      case <synthetic> val x1: net.yefremov.sample.Container = container;
      case6(){
        if (x1.ne(null))
          {
            val stringValue: Object = x1.value();
            if (stringValue.$isInstanceOf[String]())
              {
                <synthetic> val x2: String = (stringValue.$asInstanceOf[String](): String);
                matchEnd5({
                  scala.this.Predef.println(new StringContext(scala.this.Predef.wrapRefArray(Array[String]{"It is a string: ", ""}.$asInstanceOf[Array[Object]]())).s(scala.this.Predef.genericWrapArray(Array[Object]{x2})));
                  scala.runtime.BoxedUnit.UNIT
                })
              }
            else
              case7()
          }
        else
          case7()
      };
      case7(){
        if (x1.ne(null))
          {
            val intValue: Object = x1.value();
            if (intValue.$isInstanceOf[Int]())
              {
                <synthetic> val x3: Int = (scala.Int.unbox(intValue): Int);
                matchEnd5({
                  scala.this.Predef.println(new StringContext(scala.this.Predef.wrapRefArray(Array[String]{"It is an int: ", ""}.$asInstanceOf[Array[Object]]())).s(scala.this.Predef.genericWrapArray(Array[Object]{scala.Int.box(x3)})));
                  scala.runtime.BoxedUnit.UNIT
                })
              }
            else
              case8()
          }
        else
          case8()
      };
      case8(){
        matchEnd5(throw new MatchError(x1))
      };
      matchEnd5(x: runtime.BoxedUnit){
        ()
      }
    };
    MatchingTest.this.printType(new net.yefremov.sample.Container("Heya!"));
    MatchingTest.this.printType(new net.yefremov.sample.Container(scala.Int.box(42)))
  }
}

There is a lot of stuff going on behind the scenes! A couple of things related to pattern matching can be learned from this output:

  1. There is an unapply method generated for case classes. Nothing surprising here.

    def unapply(x$0: net.yefremov.sample.Container): Option = if (x$0.==(null)) scala.this.None else new Some(x$0.value());

  2. The generated method is not used for pattern matching, though. The compiler generates more optimal code for case class pattern matching instead.
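To contrast, for a hand-written extractor (not a case class) the compiler has no special knowledge and really does call unapply. A minimal sketch with a made-up Box class and a call counter to observe it:

```scala
class Box(val value: Any)

object Box {
  var unapplyCalls = 0  // instrumentation to see whether unapply runs

  def unapply(b: Box): Option[Any] = {
    unapplyCalls += 1
    Some(b.value)
  }
}

new Box("Heya!") match {
  case Box(v) => println(s"extracted: $v")
}

println(Box.unapplyCalls)  // the extractor was invoked once
```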

I hope this will be helpful for someone exploring Scala magic.

Sharing reverse router across multiple Play projects

I was working on a couple of fairly complicated Play applications recently. In order to keep complexity at an acceptably low level we broke the apps into sub-modules. Every sub-module is a Play application on its own and can be worked on, launched and tested separately. Then there is an aggregating app that takes care of making everything play together. More about composing Play apps out of individual modules here and here.

This is what the project structure may look like:

main-app
  └ app
    └ controllers
      └ MainController.scala
  └ conf
    └ application.conf
    └ routes
module-foo
  └ app
    └ controllers
      └ FooController.scala    
  └ conf
    └ foo.routes    
module-bar
  └ app
    └ controllers
      └ BarController.scala    
  └ conf
    └ bar.routes            
project
  └ Build.scala 
 

There is one issue with this approach. Individual modules are fully isolated (and that’s what we are after), but sometimes they need to generate links to each other. Routes files are also split per module, so there is no way to access another module’s reverse router. An issue was opened for that quite some time ago, but at the time of writing it is not resolved yet.

When I started to search for potential solutions I found just one, suggested by @godenji. That solution introduces its own parser for Play routes files and also a code generator to create reverse router objects. I was concerned about having a custom parser, so I decided to try to find another solution myself.

Here is what I came up with. It’s not perfect, but may work well in some cases and only uses Play’s parsers/generators.

The idea is to extract all routing into a separate library project that is shared across all modules. It only contains routes files and has no dependencies (but every module depends on it). There are a few tricks to make it work.

First, the routes compiler needs to have access to the controllers in order to validate the routes and generate a router. Just create traits for controllers that need to be linked from other modules and put them in the routes project.
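A minimal sketch of this trick, with Play's types replaced by stand-ins so the shape is visible without a Play dependency (all names here are hypothetical):

```scala
// stand-in for Play's Action type, just for this sketch
case class Action(body: String)

// the shared "routes" project exposes only this trait
trait FooControllerApi {
  def helloFoo(name: String): Action
}

// module-foo provides the concrete implementation; the routes
// file refers only to the FooControllerApi trait
object FooController extends FooControllerApi {
  def helloFoo(name: String): Action = Action(s"Hello, $name!")
}

println(FooController.helloFoo("world").body)  // Hello, world!
```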

Here is how the project looks now.

main-app
  └ app
    └ controllers
      └ MainController.scala
  └ conf
    └ application.conf
    └ routes
module-foo
  └ app
    └ controllers
      └ FooController.scala    
module-bar
  └ app
    └ controllers
      └ BarController.scala       
routes
  └ app
    └ controllers
      └ FooControllerApi.scala
      └ BarControllerApi.scala             
  └ conf
    └ foo.routes    
    └ bar.routes    
project
  └ Build.scala 
 

Second, you cannot use traits in a routes file, only concrete objects. This can be solved by using Play’s managed controllers. The feature was developed to support dependency injection in Play, but works well for us.

In order to use a trait in a routes file, prefix the controller reference with @.

GET       /foo/hello        @FooControllerApi.helloFoo(name)
GET       /bar/hello        @BarControllerApi.helloBar(name)

When the compiler sees a route like that, it adds a call to play.api.GlobalSettings#getControllerInstance to obtain an instance implementing the trait and delegates the call to it. So we need to implement this method.

object Global extends GlobalSettings {

  private val controllerMapping = Map[Class[_], Controller](
    classOf[FooControllerApi] -> FooController,
    classOf[BarControllerApi] -> BarController
  )

  override def getControllerInstance[T](controllerClass: Class[T]): T = {
    controllerMapping(controllerClass).asInstanceOf[T]
  }

}

This is pretty much it. You can find a sample application here.

Again, this approach is not perfect. It requires some overhead: a separate project for routes, traits for controller interfaces, and a mapping of trait classes to the corresponding instance objects. The latter could be solved by a classpath scanner that finds the implementations at startup and generates the map automatically.
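Short of a full classpath scanner, a lighter-weight middle ground is to register the concrete controllers in one flat list and derive the trait-to-instance map reflectively from their interfaces. A hedged sketch with stand-in types (no Play dependency, names made up):

```scala
// stand-ins for controller traits and implementations
trait FooControllerApi
trait BarControllerApi
object FooController extends FooControllerApi
object BarController extends BarControllerApi

// one flat list to maintain instead of an explicit map
val controllers: Seq[AnyRef] = Seq(FooController, BarController)

// derive Class -> instance entries from each controller's interfaces
val controllerMapping: Map[Class[_], AnyRef] =
  controllers.flatMap { c =>
    c.getClass.getInterfaces.toList.map(iface => iface -> c)
  }.toMap

def getControllerInstance[T](controllerClass: Class[T]): T =
  controllerMapping(controllerClass).asInstanceOf[T]

println(getControllerInstance(classOf[FooControllerApi]) == FooController)  // true
```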

Please let me know what you think.

Retrieving Scala enumeration constants by name

In many cases you may need to get an enumeration constant by its name. It is very easy to do if you use the standard Scala Enumeration class to create your enumeration. Below is a typical enumeration implementation:

object FunninessLevel extends Enumeration {
  type FunninessLevel = Value
  val LOL, ROFL, LMAO = Value
}

To retrieve a constant by its name you simply call withName on the object:

val level = FunninessLevel.withName("LOL")

So far so good?
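One caveat worth noting: withName throws a NoSuchElementException for an unknown name, so when the input is untrusted it can be wrapped in a Try. A small sketch (the helper name withNameOption is made up):

```scala
import scala.util.Try

object FunninessLevel extends Enumeration {
  val LOL, ROFL, LMAO = Value
}

// withName throws NoSuchElementException for unknown names;
// this wrapper returns None instead
def withNameOption(name: String): Option[FunninessLevel.Value] =
  Try(FunninessLevel.withName(name)).toOption

println(withNameOption("LOL"))  // Some(LOL)
println(withNameOption("MEH"))  // None
```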

It gets more difficult if you need to retrieve a constant for any given enumeration type at run time (e.g. during JSON deserialization). Java has a very convenient way of doing it: Enum.valueOf. But I could find nothing like that for Scala, so I ended up building my own helpers.

import scala.reflect.runtime.universe._

/**
 * Scala [[Enumeration]] helpers implementing Scala versions of
 * Java's [[java.lang.Enum.valueOf(Class[Enum], String)]].
 * @author Dmitriy Yefremov
 */
object EnumReflector {

  private val mirror: Mirror = runtimeMirror(getClass.getClassLoader)

  /**
   * Returns a value of the specified enumeration with the given name.
   * @param name value name
   * @tparam T enumeration type
   * @return enumeration value, see [[scala.Enumeration.withName(String)]]
   */
  def withName[T <: Enumeration#Value: TypeTag](name: String): T = {
    typeOf[T] match {
      case valueType @ TypeRef(enumType, _, _) =>
        val methodSymbol = factoryMethodSymbol(enumType)
        val moduleSymbol = enumType.termSymbol.asModule
        reflect(moduleSymbol, methodSymbol)(name).asInstanceOf[T]
    }
  }

  /**
   * Returns a value of the specified enumeration with the given name.
   * @param clazz enumeration class
   * @param name value name
   * @return enumeration value, see [[scala.Enumeration#withName(String)]]
   */
  def withName(clazz: Class[_], name: String): Enumeration#Value = {
    val classSymbol = mirror.classSymbol(clazz)
    val methodSymbol = factoryMethodSymbol(classSymbol.toType)
    val moduleSymbol = classSymbol.companionSymbol.asModule
    reflect(moduleSymbol, methodSymbol)(name).asInstanceOf[Enumeration#Value]
  }

  private def factoryMethodSymbol(enumType: Type): MethodSymbol = {
    enumType.member(newTermName("withName")).asMethod
  }

  private def reflect(module: ModuleSymbol, method: MethodSymbol)(args: Any*): Any = {
    val moduleMirror = mirror.reflectModule(module)
    val instanceMirror = mirror.reflect(moduleMirror.instance)
    instanceMirror.reflectMethod(method)(args:_*)
  }

}

Here is a client code example using TypeTag:

val level = EnumReflector.withName[FunninessLevel.Value]("LOL")

And another example with a class instance:

val level = EnumReflector.withName(FunninessLevel.getClass, "ROFL")

I’m quite happy with the TypeTag based implementation. For the class based implementation I would prefer to use classOf[FunninessLevel.Value], but that seems to be impossible. There is no specific value class created for different enumeration types, so classOf[FunninessLevel.Value] returns a reference to the base Enumeration#Value class, and there is no way to get a reference to the actual enumeration object.

Do you have any suggestions on how to improve this code?

Blog like a hacker

A few days ago I came across an article describing how to create a blog using GitHub Pages and Jekyll. The idea of creating a blog post by committing it into your Git repo looked kind of fun to me, so I decided to give it a try. This post is where I will keep a journal of the experiment.

10/28/2014:

My first blog post and the first issues. The default Markdown parser does not understand GitHub style fenced blocks for code snippets. Fixed the issue using the Liquid highlighting tag {% highlight %}. This is not an ideal fix because it breaks GitHub’s own Markdown preview. Will try to find a better solution.

10/29/2014:

Jekyll automatically generates post excerpts. I want to display full posts on the main page. Found a way to control excerpt generation here.

10/30/2014:

Switched to Redcarpet for Markdown processing. This way I can get GitHub style fenced blocks with Pygments highlighting for code snippets.

01/02/2015:

Reinitialized the repo to remove Jekyll Now commits history.

04/16/2016:

Switched back to kramdown as a part of GitHub’s Jekyll 3.0 upgrade. New syntax highlights CSS from mojombo.