JSON Schema: first Java implementation available!

May 17, 2010 19 comments

Java source code available on gitorious

Last night, I published a first version of the source code on Gitorious. It is released under the Apache License, Version 2.0.

This implementation covers nearly all of the “Core Schema Definition”, which corresponds to paragraph 5 of the specification. The “missing” items (mainly 5.21, 5.22 and 5.25) concern points of the specification that need to be clarified before they can be implemented.

Concerning the implementation itself, the main design ideas are the following:

  • Each validator should be a small, stateless and easy-to-test object, implementing one and only one of the rules of the specification.
  • A schema object should be a “validating engine”, containing a graph of validator objects built on construction.
  • Once loaded, a schema object should be reusable in order to validate as many JSON instances as needed.
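
To make these three ideas concrete, here is a minimal sketch in plain Java. All the names in it (Validator, MinLengthValidator, MaxLengthValidator, Schema) are hypothetical and chosen for illustration only: they are not the classes of the published code, and the real implementation works on Jackson nodes rather than plain objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical names: an illustration of the design, not the published API.
class ValidatorGraphSketch {

    /** One rule of the specification. Implementations keep no per-call state. */
    interface Validator {
        List<String> validate(Object instance);
    }

    /** "minLength" rule: a string needs at least min characters. */
    static final class MinLengthValidator implements Validator {
        private final int min;

        MinLengthValidator(int min) {
            this.min = min;
        }

        @Override
        public List<String> validate(Object instance) {
            if (instance instanceof String && ((String) instance).length() < min) {
                return Collections.singletonList("string is shorter than " + min + " characters");
            }
            return Collections.emptyList();
        }
    }

    /** "maxLength" rule: a string may have at most max characters. */
    static final class MaxLengthValidator implements Validator {
        private final int max;

        MaxLengthValidator(int max) {
            this.max = max;
        }

        @Override
        public List<String> validate(Object instance) {
            if (instance instanceof String && ((String) instance).length() > max) {
                return Collections.singletonList("string is longer than " + max + " characters");
            }
            return Collections.emptyList();
        }
    }

    /** The "validating engine": its validators are fixed at construction time. */
    static final class Schema implements Validator {
        private final List<Validator> validators;

        Schema(List<? extends Validator> validators) {
            this.validators = Collections.unmodifiableList(new ArrayList<Validator>(validators));
        }

        @Override
        public List<String> validate(Object instance) {
            List<String> errors = new ArrayList<>();
            for (Validator validator : validators) {
                errors.addAll(validator.validate(instance));
            }
            return errors;
        }
    }

    public static void main(String[] args) {
        // Built once, then reused for as many instances as needed.
        Schema schema = new Schema(Arrays.<Validator>asList(
                new MinLengthValidator(2), new MaxLengthValidator(5)));
        System.out.println(schema.validate("Athens")); // [string is longer than 5 characters]
        System.out.println(schema.validate("GR"));     // []
    }
}
```

Because Schema is itself a Validator in this sketch, a schema can be placed inside another schema, which is one simple way to obtain a graph of validators.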

As you may already have guessed, a “wide” set of JUnit test cases is provided with the source code. Each test case exercises one of the “validators” separately using very simple JSON schemas and instances. There is also a more “complicated” and “complete” test case that exercises combinations of validators.

Finally, some more work has to be done on the Java documentation … I will take care of it over the next few days and push it to the central repository.

Usage

The following few lines of code show how you can use the implementation to validate a JSON instance against a JSON schema:

		// Jackson parsing API: the ObjectMapper can be provided
		// and configured differently depending on the application
		ObjectMapper mapper = new ObjectMapper();

		// Retrieves JSONSchema objects from the various sources
		// supported by the ObjectMapper provided
		JSONSchemaProvider schemaProvider = new JacksonSchemaProvider(mapper);

		// Retrieves a JSON Schema object based on a file
		InputStream schemaIS = new FileInputStream("schema.json");
		JSONSchema schema = schemaProvider.getSchema(schemaIS);

		// Validates a JSON Instance object stored in a file
		InputStream instanceIS = new FileInputStream("instance1.json");
		List<String> errors = schema.validate(instanceIS);

		// Display the errors, if any
		for ( String s : errors ) {
			System.out.println(s);
		}

The project should be easy to build with Maven: a “pom.xml” file is provided with the source code. A simple “mvn package” should be enough to build the code, run the tests, produce the javadoc and the jar file.

I have also made some JARs available for those who do not wish to build the JSON Schema validator from the source code:

  • The binary archive is available here
  • The javadoc archive is available here

Plans for the near future …

I will post on the Jackson project’s mailing lists in order to get some feedback from them: I would be very happy and proud to see this code tightly integrated inside the Jackson project!

I will also ask the people in charge of the specification to clarify the “missing” points of this implementation: I would love to have 100% of the specification implemented. More generally, I have some questions about the possibility of referencing / reusing existing JSON Schemas: the Core Schema Definition seems to allow only “anonymous” types. In a complex schema, the possibility to define and reuse “named” types (like in XML Schema) would be very handy.

At this very early stage, any help is welcome: testing, using, fixing, extending … there is still some work to be done before the first release. I plan to use this implementation as it is on a project in the very near future … I will of course publish any fix, extension or documentation. For example, I will write a Google Guice module in the context of this project in order to avoid all the “boilerplate” instantiation code that you can see in my example (Google Guice is my preferred choice when it comes to DI 😉 ).

Implementation Matrix: Paragraph 5 – Core Schema Definition

§     Title                 Status
5.1   type                  simple: OK / union: OK
5.2   properties            OK
5.3   items                 simple: OK / tuple: OK
5.4   optional              OK
5.5   additionalProperties  OK
5.6   requires              name: OK / schema: OK
5.7   minimum               OK
5.8   maximum               OK
5.9   minimumCanEqual       OK
5.10  maximumCanEqual       OK
5.11  minItems              OK
5.12  maxItems              OK
5.13  uniqueItems           OK
5.14  pattern               OK
5.15  maxLength             OK
5.16  minLength             OK
5.17  enum                  OK
5.18  title                 NOTHING
5.19  description           NOTHING
5.20  format                TODO (OPTIONAL)
5.21  contentEncoding       TODO
5.22  default               TODO
5.23  divisibleBy           OK
5.24  disallow              OK
5.25  extends               TODO

JSON Schema: specifying and validating JSON data structures

April 23, 2010 18 comments

Introduction

From my own experience, I can see 2 major reasons why someone would need a JSON Schema language:

  1. Specify JSON data structures: this is particularly useful when exposing JSON based web services to a wide audience and documenting them.
  2. Validate JSON data structures.

Recently, I started designing / writing JSON based web services on the Java platform for the Europass project. The purpose of these services was to allow external web applications to easily integrate the Europass CV online generation services into their own pages. I was really surprised to see that there was no standard way to describe the structure of the expected JSON objects and, as a consequence, no standard way to validate the incoming objects.

After some googling around, I rapidly landed on the draft JSON Schema specification.

This draft specification helped me a lot for my project as it allowed me to write my schemas in a mature and ready-to-use syntax … instead of inventing one of my own. However, I rapidly realised that there was no Java implementation of this specification. Even the very good Jackson JSON Processor project that I decided to use for the processing of the JSON streams had “only” implemented the generation of a JSON Schema starting from a JSON Object.

This is the reason why I decided to write an implementation of my own and share it with all the people potentially interested in it. Moreover, I decided to write it at home, in my free time in order to be able to distribute it under an open license … but because of this, I cannot guarantee that the development will go very fast 😉

The JSON Schema specification

Now, let’s go back to the specification itself. It has several parts but the first one I am going to work on is the “core” specification, which corresponds to paragraph 5 of the text. The first thing to say is that JSON Schema is to JSON what XSD is to XML:

  • JSON Schema is self descriptive and you can write a JSON Schema describing the syntax of JSON Schema.
  • JSON Schemas are written in JSON

The writer of the specification and owner of the corresponding Google Group (Kris Zyp) has written an implementation of the “core” specification in Javascript: it is a very good starting point for anyone willing to implement the specification or simply test it. I have written a simple HTML page that tests a JSON instance against a JSON schema using this Javascript implementation. I have used it myself to write this post and to start the Java implementation of the specification.

In the following paragraphs, I briefly present the core JSON Schema language through some examples. My objective is to show that it is quite easy to write and expressive enough for most cases.

The basic types

The core specification defines the following 8 types: Object, Array, String, Number, Integer, Boolean, Null and Any. Two of them are container types (Object and Array), five are atomic types and one (Any) is very convenient 🙂 Its existence is also tightly related to the dynamic nature of the Javascript language. By the way, the JSON home page, a good source for a lot of resources and information on JSON, is: http://www.json.org/.

In order to specify the “type” of a JSON object as described by the specification, you should write a JSON object with a property named “type” containing one of the following 8 strings: object, array, string, number, integer, boolean, null or any.

In this post, I use 2 examples based on the 2 container types, which help me cover a lot of the simple types too.
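
As a rough illustration of what checking the “type” property involves, here is a small sketch in plain Java. It assumes that the JSON value has already been parsed into ordinary Java objects (Map, List, String, Number, Boolean or null), and it assumes that an integer value also satisfies the “number” type, which is my reading of the specification. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper: maps parsed JSON values to the type names of the spec.
class JsonTypeSketch {

    /** Returns the most specific of the 7 concrete type names for a value. */
    static String typeOf(Object value) {
        if (value == null) return "null";
        if (value instanceof Map) return "object";
        if (value instanceof List) return "array";
        if (value instanceof String) return "string";
        if (value instanceof Boolean) return "boolean";
        if (value instanceof Integer || value instanceof Long) return "integer";
        if (value instanceof Number) return "number";
        throw new IllegalArgumentException("not a JSON value: " + value.getClass());
    }

    /** True when the value satisfies the "type" string found in a schema. */
    static boolean matches(String schemaType, Object value) {
        if (schemaType.equals("any")) {
            return true; // "any" accepts every value
        }
        String actual = typeOf(value);
        if (schemaType.equals("number")) {
            // Assumption: integers are accepted wherever a number is expected.
            return actual.equals("number") || actual.equals("integer");
        }
        return schemaType.equals(actual);
    }

    public static void main(String[] args) {
        System.out.println(typeOf("Athens"));         // string
        System.out.println(matches("number", 3));     // true
        System.out.println(matches("integer", 3.5));  // false
        System.out.println(matches("any", null));     // true
    }
}
```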

Specifying an object

First the schema …
{
  "description" : "Example Address JSON Schema",
  "type" : "object",
  "properties" : {
    "address" : {
      "title": "Street name and number",
      "type" : "string"
    },
    "city" : {
      "title" : "City name",
      "type" : "string"
    },
    "postalCode" : {
      "title" : "Zip Code: 2 letters dash five digits",
      "type" : "string",
      "pattern" : "^[A-Z]{2}-[0-9]{5}$"
    },
    "region" : {
      "title" : "Optional Region name",
      "type" : "string",
      "optional" : true
    },
    "country" : {
      "title" : "Country name",
      "type" : "string"
    }
  },
  "additionalProperties" : false
}
… and some explanations

As you can see, specifying an “object” in JSON Schema is as simple as having the 2 following properties in a JSON object:

  • a “type” property with a string value set to “object”
  • a “properties” property containing an object, whose properties are named after the properties of the object described and contain a JSON Schema describing them.

You may have noticed the “additionalProperties” property, which is set to false. This property is used to specify whether additional properties are allowed or not. If they are allowed, the “additionalProperties” property must contain a JSON Schema describing them; if they are not, it must be set to false. In our Address JSON schema, we do not allow any properties other than those described in the schema.

Note that the “title” and “description” properties are for general usage (and optional). They are used to document the JSON Schema.

If we dive a little deeper into the example schema, we can see that each property of the Address schema (address, city, postalCode, region, country) is in turn a JSON Schema with a “type” property taking one of the 8 types allowed by the core specification, plus some more properties that define the usage of the property in greater detail.

Some properties you can find in our Address schema are:

  • the “optional” property, which specifies whether a property is required or not for the object to be valid. In our example, an Address object is valid even if it does not contain information on the region.
  • the “pattern” property, which sets a regular expression on string properties. In our example, a valid postal code is composed of 2 capital letters followed by a dash and 5 digits.
Second the instance …

Here is a JSON Address instance complying with this JSON Schema:

{
  "address" : "Μέγαλου Σπηλαίου 4",
  "city" : "Athens",
  "postalCode" : "GR-15125",
  "country" : "Greece"
}
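
To show what these rules amount to, here is a hand-written check of an Address instance in plain Java. It is only a sketch: the rules of the Address schema (required and optional properties, “additionalProperties” set to false, the “pattern” on postalCode) are hard-coded instead of being read from the schema, the instance is assumed to be already parsed into a Map, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical class: the Address schema rules, written out by hand.
class AddressCheckSketch {

    private static final List<String> REQUIRED =
            Arrays.asList("address", "city", "postalCode", "country");
    private static final List<String> OPTIONAL = Arrays.asList("region");
    private static final Pattern POSTAL_CODE = Pattern.compile("^[A-Z]{2}-[0-9]{5}$");

    static List<String> validate(Map<String, Object> address) {
        List<String> errors = new ArrayList<>();

        // Properties without "optional" : true must be present.
        for (String name : REQUIRED) {
            if (!address.containsKey(name)) {
                errors.add("missing required property: " + name);
            }
        }

        for (Map.Entry<String, Object> entry : address.entrySet()) {
            String name = entry.getKey();
            if (!REQUIRED.contains(name) && !OPTIONAL.contains(name)) {
                // "additionalProperties" : false
                errors.add("additional property not allowed: " + name);
            } else if (!(entry.getValue() instanceof String)) {
                // every declared property has "type" : "string"
                errors.add("property is not a string: " + name);
            }
        }

        // "pattern" applies to postalCode only.
        Object postalCode = address.get("postalCode");
        if (postalCode instanceof String
                && !POSTAL_CODE.matcher((String) postalCode).find()) {
            errors.add("postalCode does not match the pattern");
        }
        return errors;
    }

    public static void main(String[] args) {
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("address", "Μέγαλου Σπηλαίου 4");
        address.put("city", "Athens");
        address.put("postalCode", "GR-15125");
        address.put("country", "Greece");
        System.out.println(validate(address)); // []

        address.put("postalCode", "GR-1512"); // only 4 digits
        address.put("planet", "Earth");       // not declared in the schema
        System.out.println(validate(address));
    }
}
```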

Specifying an array

First the schema …
{
  "description" : "Example Contact Information Array JSON Schema",
  "type" : "array",
  "items" : {
    "title" : "A Contact Information object",
    "type" : "object",
    "properties" : {
      "name" : {
        "type" : "string",
        "enum" : ["home", "work", "other"]
      },
      "phone" : {
        "type" : "string",
        "optional" : true,
        "format" : "phone"
      },
      "mobile" : {
        "type" : "string",
        "optional" : true,
        "format" : "phone"
      },
      "email" : {
        "type" : "string",
        "optional" : true,
        "format" : "email"
      }
    }
  },
  "minItems" : 1,
  "maxItems" : 5
}
… and some explanations

As you can see, specifying an “array” in JSON Schema is as simple as having the 2 following properties in a JSON object:

  • a “type” property with a string value set to “array”
  • an “items” property containing a JSON Schema used to validate each element of the array. Please note that the “items” property may instead contain an array of JSON Schemas in order to validate each element of the array against a different schema: this is called tuple validation.

You may have noticed that the array schema also contains, next to “items”, some properties specific to arrays:

  • a “minItems” property specifying the minimum number of elements the array should contain in order to be valid. In our example, a Contact Information array should contain at least 1 contact object in order to be valid.
  • a “maxItems” property specifying the maximum number of elements the array can contain in order to be valid. In our example, a Contact Information array can contain up to 5 contact objects in order to be valid.

Our Contact Information Array Schema is an array of objects. Each object is a Contact Information Object composed of 4 properties: “name”, “phone”, “mobile” and “email”. There are some things that are worth noting in the JSON Schemas defining each of these properties:

  • the “enum” property specifies a closed list of allowed values for a property. In our example, a Contact Information Object can be a “home”, “work” or “other” type of contact.
  • the “format” property specifies a valid format for a property, using a predefined (and extensible) set of supported formats. In our example, the “phone” and “mobile” properties have a “phone” format, whereas the “email” property has an “email” format. Please note that, according to the specification, implementations are not obliged to support all the formats it lists.
Second the instance …
[
  { "name" : "home", "phone" : "+302109349764", "email": "nico@vahlas.eu" },
  { "name" : "work", "phone" : "+302108029409", "email": "nvah@instore.gr" }
]
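
As with the Address example, the array rules can be written out by hand to see what a validator has to check. The sketch below, in plain Java with a hypothetical class name, covers “minItems”, “maxItems” and the “enum” on “name” only. It leaves the “format” checks out, since the specification does not oblige implementations to support them, and it assumes the instance is already parsed into a List of Maps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical class: the Contact Information Array rules, written out by hand.
class ContactArrayCheckSketch {

    private static final List<String> NAMES = Arrays.asList("home", "work", "other");

    static List<String> validate(List<Map<String, String>> contacts) {
        List<String> errors = new ArrayList<>();

        // "minItems" : 1 and "maxItems" : 5 constrain the array itself.
        if (contacts.size() < 1) {
            errors.add("minItems: at least 1 contact expected");
        }
        if (contacts.size() > 5) {
            errors.add("maxItems: at most 5 contacts expected");
        }

        // The "items" schema is applied to every element of the array.
        for (int i = 0; i < contacts.size(); i++) {
            String name = contacts.get(i).get("name");
            if (name == null) {
                errors.add("item " + i + ": missing required property name");
            } else if (!NAMES.contains(name)) {
                // "enum" : ["home", "work", "other"]
                errors.add("item " + i + ": name must be one of " + NAMES);
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        List<Map<String, String>> contacts = new ArrayList<>();
        System.out.println(validate(contacts)); // [minItems: at least 1 contact expected]

        contacts.add(java.util.Collections.singletonMap("name", "home"));
        contacts.add(java.util.Collections.singletonMap("name", "work"));
        System.out.println(validate(contacts)); // []
    }
}
```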

Implementing the JSON Schema specification in Java

As I mentioned in the introduction of this post, I have decided to write a Java based implementation of the specification that validates JSON strings against JSON Schemas. When I started, my initial intention was to “simply” port the Javascript implementation of the “core” specification … but when I dived into the code my opinion changed and I decided to write something more “Java-like”, if I may say so.

Infrastructure

Roughly, the infrastructure I use for this little project is the following:

  • Jackson as the JSON processor library: I have used it for the Europass project recently and liked it; it’s the most complete Java-based JSON processor I have found so far
  • Git as the code versioning system: I love the idea of distributed versioning systems and I have wanted to try Git for a long time.
  • Gitorious to host the project: JSON Schema Validation in Java, the public code repository, the wiki and project page.
  • Maven and Eclipse for the development tools and infrastructure with the EGit plugin

Status

Roughly, the status / progress of my work at the time of this writing is the following:

  • I have started the implementation but I am not far enough along to make it publicly available
  • I have started a discussion on the JSON Schema Google Group and have had feedback from the lead developer of the Jackson project: there seems to be a lot of interest in this work
  • I have created a project and a repository on Gitorious
  • I am writing this post 🙂

It’s a lot of work to do all this setup … I had not realised that it would take me so much time. I hope it will be worth it. Voilà !