JSON Schema: specifying and validating JSON data structures

April 23, 2010

Introduction

From my own experience, I see 2 main reasons why someone would need a JSON Schema language:

  1. Specify JSON data structures: this is particularly useful when exposing JSON based web services to a wide audience and documenting them.
  2. Validate JSON data structures.

Recently, I started designing and writing JSON-based web services on the Java platform for the Europass project. The purpose of these services was to allow external web applications to easily integrate the Europass CV online generation services. I was really surprised to see that there was no standard way to describe the structure of the expected JSON objects and, as a consequence, no standard way to validate the incoming objects.

After some googling around, I rapidly landed on the draft JSON Schema specification.

This draft specification helped me a lot for my project as it allowed me to write my schemas in a mature and ready-to-use syntax … instead of inventing one of my own. However, I rapidly realised that there was no Java implementation of this specification. Even the very good Jackson JSON Processor project that I decided to use for the processing of the JSON streams had “only” implemented the generation of a JSON Schema starting from a JSON Object.

This is the reason why I decided to write an implementation of my own and share it with all the people potentially interested in it. Moreover, I decided to write it at home, in my free time in order to be able to distribute it under an open license … but because of this, I cannot guarantee that the development will go very fast 😉

The JSON Schema specification

Now, let’s go back to the specification itself. It has several parts, but the first one I am going to work on is the “core” specification, which corresponds to paragraph 5 of the text. The first thing to say is that JSON Schema is to JSON what XSD is to XML:

  • JSON Schema is self-descriptive: you can write a JSON Schema describing the syntax of JSON Schema.
  • JSON Schemas are written in JSON.

The writer of the specification and owner of the corresponding Google Group (Kris Zyp) has written an implementation of the “core” specification in JavaScript: it is a very good starting point for anyone willing to implement the specification or simply test it. I have written a simple HTML page that lets you test a JSON instance against a JSON schema using this JavaScript implementation. I have used it myself to write this post and to start the Java implementation of the specification.

In the following paragraphs, I briefly present the core JSON Schema language through some examples. My objective is to show that it is quite easy to write and expressive enough for most cases.

The basic types

The core specification defines the following 8 types: Object, Array, String, Number, Integer, Boolean, Null and Any. Two of them are container types (Object and Array), five are atomic types and one (Any) is very convenient 🙂 Its existence is also tightly related to the dynamic nature of the JavaScript language. By the way, the JSON home page, a good source of resources and information on JSON, is: http://www.json.org/.

In order to specify the “type” of a JSON object as described by the specification, you should write a JSON object with a property named “type” containing one of the following 8 strings: object, array, string, number, integer, boolean, null or any.
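To make the 8 type names more concrete, here is a minimal Python sketch of how a validator might map each of them to a check on a decoded JSON value. The function name matches_type is my own invention for illustration, and treating a value such as 1.0 as a number but not an integer is a simplifying assumption, not something taken from the specification.

```python
def matches_type(value, type_name):
    """Return True if `value` (as decoded by json.loads) matches a core type name."""
    # bool is a subclass of int in Python, so it must be excluded from the numeric types.
    is_bool = isinstance(value, bool)
    checks = {
        "object": isinstance(value, dict),
        "array": isinstance(value, list),
        "string": isinstance(value, str),
        "number": isinstance(value, (int, float)) and not is_bool,
        "integer": isinstance(value, int) and not is_bool,
        "boolean": is_bool,
        "null": value is None,
        "any": True,  # "any" accepts every value
    }
    if type_name not in checks:
        raise ValueError("unknown type: " + type_name)
    return checks[type_name]
```

For instance, matches_type({"city": "Athens"}, "object") is true, while matches_type(True, "integer") is false.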

In this post, I use 2 examples, one for each of the 2 container types, which lets me cover many of the simple types too.

Specifying an object

First the schema …
{
  "description" : "Example Address JSON Schema",
  "type" : "object",
  "properties" : {
    "address" : {
      "title": "Street name and number",
      "type" : "string"
    },
    "city" : {
      "title" : "City name",
      "type" : "string"
    },
    "postalCode" : {
      "title" : "Zip Code: 2 letters dash five digits",
      "type" : "string",
      "pattern" : "^[A-Z]{2}-[0-9]{5}$"
    },
    "region" : {
      "title" : "Optional Region name",
      "type" : "string",
      "optional" : true
    },
    "country" : {
      "title" : "Country name",
      "type" : "string"
    }
  },
  "additionalProperties" : false
}

… and some explanations

As you can see, specifying an “object” in JSON Schema is as simple as having the following 2 properties in a JSON object:

  • a “type” property with a string value set to “object”
  • a “properties” property containing an object, whose properties are named after the properties of the object described and contain a JSON Schema describing them.

You may have noticed the “additionalProperties” property, which is set to false. This property is used to specify whether additional properties are allowed or not. In the former case, the “additionalProperties” property must contain a JSON Schema and in the latter, it must be set to false. In our Address JSON schema, we do not allow any properties other than those described in the schema.
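As a rough illustration of the “additionalProperties” rule, the following Python sketch lists the properties of an instance that are not declared in the schema when “additionalProperties” is false. The function name is hypothetical, and I assume that a schema without “additionalProperties” allows any extra property. Validating extra values against an “additionalProperties” schema is left out.

```python
def check_additional_properties(instance, schema):
    """Return a list of error messages for properties not declared in the schema."""
    declared = schema.get("properties", {})
    # Assumption: when the keyword is absent, any additional property is allowed.
    additional = schema.get("additionalProperties", {})
    errors = []
    for name in instance:
        if name in declared:
            continue
        if additional is False:
            errors.append("unexpected property: " + name)
        # If `additional` is a schema, the extra value would be validated
        # against it here; that step is omitted from this sketch.
    return errors
```

With the Address schema above, an instance carrying an extra “planet” property would be reported, while the same instance would pass if “additionalProperties” were removed from the schema.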

Note that the “title” and “description” properties are for general usage (and optional). They serve to document the JSON Schema.

If we dive a little deeper into the example schema, we can see that each property of the Address schema (address, city, postalCode, region, country) is in turn a JSON Schema with a “type” property taking one of the 8 types allowed by the core specification, plus some more properties that define the usage of the property in greater detail.

Some properties you can find in our Address schema are:

  • the “optional” property, which specifies whether a property is required or not for the object to be valid. In our example, an Address object is valid even if it does not contain information on the region.
  • the “pattern” property, which sets a regular expression on string properties. In our example, a valid postal code is composed of 2 capital letters followed by a dash and 5 digits.

Second the instance …

Here is a JSON Address instance complying with this JSON Schema:

{
  "address" : "Μέγαλου Σπηλαίου 4",
  "city" : "Athens",
  "postalCode" : "GR-15125",
  "country" : "Greece"
}
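To show how the “optional” and “pattern” properties could be applied to such an instance, here is a small Python sketch. It is not the Java implementation discussed in this post: the function name validate_properties and the error messages are made up for illustration, the schema is a trimmed copy of the Address schema (titles omitted), and only string properties are handled.

```python
import re

# Trimmed copy of the Address schema from this post (titles omitted).
ADDRESS_SCHEMA = {
    "type": "object",
    "properties": {
        "address": {"type": "string"},
        "city": {"type": "string"},
        "postalCode": {"type": "string", "pattern": r"^[A-Z]{2}-[0-9]{5}$"},
        "region": {"type": "string", "optional": True},
        "country": {"type": "string"},
    },
    "additionalProperties": False,
}

ADDRESS = {
    "address": "Μέγαλου Σπηλαίου 4",
    "city": "Athens",
    "postalCode": "GR-15125",
    "country": "Greece",
}


def validate_properties(instance, schema):
    """Check the "optional", "type": "string" and "pattern" rules of each declared property."""
    errors = []
    for name, prop in schema["properties"].items():
        if name not in instance:
            # A property is required unless it is explicitly marked optional.
            if not prop.get("optional", False):
                errors.append("missing required property: " + name)
            continue
        value = instance[name]
        if prop.get("type") == "string" and not isinstance(value, str):
            errors.append(name + " is not a string")
        elif "pattern" in prop and re.search(prop["pattern"], value) is None:
            errors.append(name + " does not match " + prop["pattern"])
    return errors


print(validate_properties(ADDRESS, ADDRESS_SCHEMA))  # prints []
```

Dropping “country” from the instance, or shortening the postal code to 4 digits, makes the function return one error message for the offending property.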

Specifying an array

First the schema …
{
  "description" : "Example Contact Information Array JSON Schema",
  "type" : "array",
  "items" : {
    "title" : "A Contact Information object",
    "type" : "object",
    "properties" : {
      "name" : {
        "type" : "string",
        "enum" : ["home", "work", "other"]
      },
      "phone" : {
        "type" : "string",
        "optional" : true,
        "format" : "phone"
      },
      "mobile" : {
        "type" : "string",
        "optional" : true,
        "format" : "phone"
      },
      "email" : {
        "type" : "string",
        "optional" : true,
        "format" : "email"
      }
    }
  },
  "minItems" : 1,
  "maxItems" : 5
}

… and some explanations

As you can see, specifying an “array” in JSON Schema is as simple as having the following 2 properties in a JSON object:

  • a “type” property with a string value set to “array”
  • an “items” property containing a JSON Schema used to validate each element of the array. Note that the “items” property may instead contain an array of JSON Schemas in order to validate each element of the array against a different schema: this is called tuple validation.

You may have noticed that the array schema also contains some properties specific to arrays:

  • a “minItems” property specifying the minimum number of elements the array should contain in order to be valid. In our example, a Contact Information array should contain at least 1 contact object in order to be valid.
  • a “maxItems” property specifying the maximum number of elements the array can contain in order to be valid. In our example, a Contact Information array can contain up to 5 contact objects.
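The array rules can be sketched in a few lines of Python. This is only an illustration under my own assumptions: validate_array and check_type are hypothetical names, “minItems” and “maxItems” are read from the array schema itself, and check_type is a deliberately tiny stand-in for a full element validator.

```python
def validate_array(instance, schema, validate_item):
    """Check array-level rules; `validate_item(value, schema)` returns a list of errors."""
    if not isinstance(instance, list):
        return ["not an array"]
    errors = []
    if "minItems" in schema and len(instance) < schema["minItems"]:
        errors.append("too few items")
    if "maxItems" in schema and len(instance) > schema["maxItems"]:
        errors.append("too many items")
    items = schema.get("items")
    if isinstance(items, dict):
        # Single schema: every element is validated against it.
        for value in instance:
            errors.extend(validate_item(value, items))
    elif isinstance(items, list):
        # Tuple validation: element i is validated against schema i.
        for value, item_schema in zip(instance, items):
            errors.extend(validate_item(value, item_schema))
    return errors


def check_type(value, schema):
    """Hypothetical element validator that only understands "object" and "string"."""
    expected = {"object": dict, "string": str}.get(schema.get("type"))
    if expected is not None and not isinstance(value, expected):
        return ["expected " + schema["type"]]
    return []
```

Passing the element validator as a parameter keeps the array logic independent from the rules applied to each element, which is one possible way to structure a validator, not the only one.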

Our Contact Information Array Schema is an array of objects. Each object is a Contact Information Object composed of 4 properties: “name”, “phone”, “mobile” and “email”. Some things are worth noting in the JSON Schemas defining each of these properties:

  • the “enum” property specifies a closed list of allowed values for a property. In our example, a Contact Information Object can be a “home”, “work” or “other” type of contact.
  • the “format” property specifies a valid format on properties using a predefined (and extensible) set of supported formats. In our example, the “phone” and “mobile” properties have a “phone” format, whereas the “email” property has an “email” format. Please note that, according to the specification, implementations are not obliged to support all the formats it lists.

Second the instance …
[
  { "name" : "home", "phone" : "+302109349764", "email": "nico@vahlas.eu" },
  { "name" : "work", "phone" : "+302108029409", "email": "nvah@instore.gr" }
]
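Here is a short Python sketch of how the “enum” and “format” properties might be checked against the values of such an instance. The regular expressions for “phone” and “email” are loose placeholders of my own, since the specification does not define them, and the function name is hypothetical. Unknown formats are simply skipped, in line with the remark that implementations need not support every format.

```python
import re

# Hypothetical, deliberately loose format checkers (not taken from the specification).
FORMAT_CHECKS = {
    "phone": re.compile(r"^\+?[0-9 ()-]{5,}$"),
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
}


def check_enum_and_format(value, schema):
    """Return a list of error messages for the "enum" and "format" rules of one property."""
    errors = []
    if "enum" in schema and value not in schema["enum"]:
        errors.append("value not in enum")
    checker = FORMAT_CHECKS.get(schema.get("format"))
    # Formats without a registered checker are ignored rather than rejected.
    if checker is not None and isinstance(value, str) and not checker.match(value):
        errors.append("value does not match format " + schema["format"])
    return errors
```

For example, check_enum_and_format("school", {"type": "string", "enum": ["home", "work", "other"]}) reports that the value is not in the enum, while the phone numbers and email addresses of the instance above pass these loose checks.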

Implementing the JSON Schema specification in Java

As I mentioned in the introduction of this post, I have decided to write a Java-based implementation of the specification that validates JSON strings against JSON Schemas. When I started, my initial intention was to “simply” port the JavaScript implementation of the “core” specification … but when I dived into the code my opinion changed and I decided to write something more “Java-like”, if I may say so.

Infrastructure

Roughly, the infrastructure I use for this little project is the following:

  • Jackson as the JSON processor library: I have used it for the Europass project recently and liked it; it’s the most complete Java-based JSON processor I have found so far
  • Git as the code versioning system: I love the idea of distributed versioning systems and I have wanted to try Git for a long time.
  • Gitorious to host the project: JSON Schema Validation in Java, the public code repository, the wiki and project page.
  • Maven and Eclipse for the development tools and infrastructure with the EGit plugin

Status

Roughly, the status / progress of my work at the time of this writing is the following:

  • I have started the implementation but I am not far enough to make it publicly available
  • I have started a discussion on the JSON Schema Google Group and have had feedback from the lead developer of the Jackson project: there seems to be a lot of interest in it
  • I have created a project and a repository on Gitorious
  • I am writing this post 🙂

It’s a lot of work to do all this setup … I had not realised that it would take me so much time. I hope it will be worth it. Voilà !
