JSON Schema: specifying and validating JSON data structures

Introduction

From my own experience, I see two main reasons why someone would need a JSON Schema language:

  1. Specifying JSON data structures: this is particularly useful for documenting JSON-based web services exposed to a wide audience.
  2. Validating JSON data structures.

Recently, I started designing and writing JSON-based web services on the Java platform for the Europass project. The purpose of these services was to let external web applications easily integrate the Europass online CV generation services. I was really surprised to find that there was no standard way to describe the structure of the expected JSON objects and, as a consequence, no standard way to validate incoming objects.

A bit of googling quickly led me to the draft JSON Schema specification.

This draft specification helped me a lot in my project, as it let me write my schemas in a mature, ready-to-use syntax … instead of inventing one of my own. However, I quickly realised that there was no Java implementation of this specification. Even the very good Jackson JSON Processor project, which I had chosen for processing the JSON streams, had “only” implemented the generation of a JSON Schema starting from a JSON Object.

This is why I decided to write my own implementation and share it with anyone who might be interested in it. Moreover, I decided to write it at home, in my free time, so that I can distribute it under an open license … but because of this, I cannot guarantee that development will go very fast ;-)

The JSON Schema specification

Now, let’s go back to the specification itself. It has several parts, but the first one I am going to work on is the “core” specification, which corresponds to section 5 of the text. The first thing to say is that JSON Schema is to JSON what XSD is to XML:

  • JSON Schema is self-descriptive: you can write a JSON Schema describing the syntax of JSON Schema itself.
  • JSON Schemas are written in JSON.

The author of the specification and owner of the corresponding Google Group (Kris Zyp) has written an implementation of the “core” specification in JavaScript: it is a very good starting point for anyone wanting to implement the specification or simply test it. I have written a simple HTML page for testing a JSON instance against a JSON Schema using this JavaScript implementation. I used it myself to write this post and to start the Java implementation of the specification.

In the following paragraphs, I briefly present the core JSON Schema language through some examples. My objective is to show that it is quite easy to write and expressive enough for most cases.

The basic types

The core specification defines the following 8 types: Object, Array, String, Number, Integer, Boolean, Null and Any. Two of them are container types (Object and Array), five are atomic types, and one (Any) is very convenient :-) Its existence is also closely tied to the dynamic nature of the JavaScript language. By the way, the home of JSON, and a good source of resources and information on it, is http://www.json.org/.

To specify the “type” of a JSON value as described by the specification, you write a JSON object with a property named “type” containing one of the following 8 strings: object, array, string, number, integer, boolean, null or any.
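To make the 8 type names concrete, here is a minimal Java sketch of how a validator might check the “type” property. It assumes, for illustration only, that a parsed JSON value is represented by plain Java objects (Map for an object, List for an array, String, Number, Boolean, or null); this is not Jackson's API, the class name TypeCheck is hypothetical, and only the single-string form of “type” described above is handled.

```java
import java.math.BigInteger;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of the core "type" check. JSON values are assumed to be
 * plain Java objects: Map, List, String, Number, Boolean or null.
 */
final class TypeCheck {

    /** Returns true if value is an instance of the JSON Schema type name. */
    static boolean matches(String type, Object value) {
        switch (type) {
            case "object":  return value instanceof Map;
            case "array":   return value instanceof List;
            case "string":  return value instanceof String;
            case "number":  return value instanceof Number;
            case "integer": return isInteger(value);
            case "boolean": return value instanceof Boolean;
            case "null":    return value == null;
            case "any":     return true;
            default: throw new IllegalArgumentException("Unknown type: " + type);
        }
    }

    // Integral Java types always qualify; other numbers qualify when they have
    // no fractional part (as in JavaScript, where 3.0 and 3 are the same value).
    private static boolean isInteger(Object value) {
        if (value instanceof Integer || value instanceof Long || value instanceof Short
                || value instanceof Byte || value instanceof BigInteger) {
            return true;
        }
        if (value instanceof Number) {
            double d = ((Number) value).doubleValue();
            return !Double.isInfinite(d) && d == Math.rint(d);
        }
        return false;
    }
}
```

With this sketch, `TypeCheck.matches("integer", 4.5)` is false while `TypeCheck.matches("number", 4.5)` is true, mirroring the distinction between the two numeric types.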

In this post, I use 2 examples, one for each container type, which also let me cover many of the simple types.

Specifying an object

First the schema …
{
  "description" : "Example Address JSON Schema",
  "type" : "object",
  "properties" : {
    "address" : {
      "title": "Street name and number",
      "type" : "string"
    },
    "city" : {
      "title" : "City name",
      "type" : "string"
    },
    "postalCode" : {
      "title" : "Zip Code: 2 letters dash five digits",
      "type" : "string",
      "pattern" : "^[A-Z]{2}-[0-9]{5}$"
    },
    "region" : {
      "title" : "Optional Region name",
      "type" : "string",
      "optional" : true
    },
    "country" : {
      "title" : "Country name",
      "type" : "string"
    }
  },
  "additionalProperties" : false
}
… and some explanations

As you can see, specifying an “object” in JSON Schema is as simple as having the following 2 properties in a JSON object:

  • a “type” property with a string value set to “object”
  • a “properties” property containing an object whose properties are named after the properties of the described object, each containing a JSON Schema that describes that property.

You may have noticed the “additionalProperties” property, which is set to false. This property specifies whether properties not listed in “properties” are allowed. If they are allowed, “additionalProperties” contains a JSON Schema that they must satisfy; if they are not, it is set to false. In our Address JSON Schema, we do not allow any properties other than those described in the schema.
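A minimal sketch of the “additionalProperties” : false rule follows. It assumes JSON objects and schemas are represented as Java Maps; the class and method names are hypothetical. When “additionalProperties” is a schema rather than false, the extra properties would have to be validated against that schema, which this sketch does not do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the "additionalProperties": false check. */
final class AdditionalPropertiesCheck {

    /** Names of instance properties that are not declared in the schema's "properties". */
    static List<String> undeclaredProperties(Map<String, Object> schema, Map<String, Object> instance) {
        Map<?, ?> declared = Map.of();
        Object props = schema.get("properties");
        if (props instanceof Map) {
            declared = (Map<?, ?>) props;
        }
        List<String> extra = new ArrayList<>();
        for (String name : instance.keySet()) {
            if (!declared.containsKey(name)) {
                extra.add(name);
            }
        }
        return extra;
    }

    /** False only when "additionalProperties" is false and undeclared properties are present. */
    static boolean additionalPropertiesOk(Map<String, Object> schema, Map<String, Object> instance) {
        if (Boolean.FALSE.equals(schema.get("additionalProperties"))) {
            return undeclaredProperties(schema, instance).isEmpty();
        }
        // A schema value would require validating each extra property (not shown).
        return true;
    }
}
```

With the Address schema, an instance carrying an extra “planet” property would be rejected, while one using only the declared names passes this check.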

Note that the “title” and “description” properties are for general use (and optional): they are used to document the JSON Schema.

If we dive a little deeper into the example schema, we can see that each property of the Address schema (address, city, postalCode, region, country) is in turn a JSON Schema, with a “type” property taking one of the 8 types allowed by the core specification, plus some other properties that define the property in greater detail.

Some of the properties you can find in our Address schema are:

  • the “optional” property, which specifies whether a property may be omitted while the object is still valid. In our example, an Address object is valid even if it does not contain information on the region.
  • the “pattern” property, which sets a regular expression that a string value must match. In our example, a valid postal code is composed of 2 capital letters followed by a dash and 5 digits.
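The two rules above could be checked along the lines of the following hypothetical sketch, assuming JSON objects and schemas are represented as Java Maps. A declared property is treated as required unless its schema says “optional” : true, and “pattern” is applied with java.util.regex using find(), since a pattern is not implicitly anchored (which is why the example uses ^ and $). The specification's patterns are JavaScript-style regular expressions, so a few constructs may behave differently in Java.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical sketch of the "optional" and "pattern" checks. */
final class OptionalAndPatternCheck {

    /** Returns error messages for missing required properties and pattern mismatches. */
    static List<String> check(Map<String, Object> schema, Map<String, Object> instance) {
        List<String> errors = new ArrayList<>();
        Object props = schema.get("properties");
        if (!(props instanceof Map)) {
            return errors;
        }
        for (Map.Entry<?, ?> entry : ((Map<?, ?>) props).entrySet()) {
            if (!(entry.getValue() instanceof Map)) {
                continue; // not a property schema; ignored in this sketch
            }
            String name = (String) entry.getKey();
            Map<?, ?> propSchema = (Map<?, ?>) entry.getValue();
            // A property is required unless its schema says "optional": true.
            if (!instance.containsKey(name)) {
                if (!Boolean.TRUE.equals(propSchema.get("optional"))) {
                    errors.add(name + ": required property is missing");
                }
                continue;
            }
            Object value = instance.get(name);
            Object pattern = propSchema.get("pattern");
            if (pattern instanceof String && value instanceof String
                    && !Pattern.compile((String) pattern).matcher((String) value).find()) {
                errors.add(name + ": does not match " + pattern);
            }
        }
        return errors;
    }
}
```

For instance, “GR-15125” passes the postal code pattern, while “GR-1512” (only 4 digits) is reported as a mismatch.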
Second the instance …

Here is a JSON Address instance complying with this JSON Schema:

{
  "address" : "Μέγαλου Σπηλαίου 4",
  "city" : "Athens",
  "postalCode" : "GR-15125",
  "country" : "Greece"
}

Specifying an array

First the schema …
{
  "description" : "Example Contact Information Array JSON Schema",
  "type" : "array",
  "items" : {
    "title" : "A Contact Information object",
    "type" : "object",
    "properties" : {
      "name" : {
        "type" : "string",
        "enum" : ["home", "work", "other"]
      },
      "phone" : {
        "type" : "string",
        "optional" : true,
        "format" : "phone"
      },
      "mobile" : {
        "type" : "string",
        "optional" : true,
        "format" : "phone"
      },
      "email" : {
        "type" : "string",
        "optional" : true,
        "format" : "email"
      }
    }
  },
  "minItems" : 1,
  "maxItems" : 5
}
… and some explanations

As you can see, specifying an “array” in JSON Schema is as simple as having the following 2 properties in a JSON object:

  • a “type” property with a string value set to “array”
  • an “items” property containing a JSON Schema used to validate each element of the array. Note that the “items” property may also contain an array of JSON Schemas in order to validate each element of the array against a different schema: this is called tuple validation.

You may have noticed that the array schema also contains some properties specific to arrays. They belong to the array schema itself, not to the “items” schema, because they constrain the array rather than its elements:

  • a “minItems” property specifying the minimum number of elements the array must contain to be valid. In our example, a Contact Information array must contain at least 1 contact object.
  • a “maxItems” property specifying the maximum number of elements the array may contain to be valid. In our example, a Contact Information array can contain up to 5 contact objects.
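As a rough illustration of these array keywords, the following hypothetical sketch resolves which schema applies to a given element (a single “items” schema for every element, or one schema per position for tuple validation) and checks the array length against “minItems” and “maxItems”. JSON arrays are assumed to be Java Lists and schemas Java Maps; ArrayChecks is an invented name, and elements beyond the end of a tuple are simply left unconstrained here.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the "items", "minItems" and "maxItems" keywords. */
final class ArrayChecks {

    /** Schema that applies to element index, or null if "items" does not constrain it. */
    static Object schemaForItem(Map<String, Object> arraySchema, int index) {
        Object items = arraySchema.get("items");
        if (items instanceof List) {
            // Tuple validation: the schema at the same position applies.
            List<?> tuple = (List<?>) items;
            // Elements past the end of the tuple are left unconstrained in this sketch.
            return index < tuple.size() ? tuple.get(index) : null;
        }
        // A single schema (or null when "items" is absent) applies to every element.
        return items;
    }

    /** Checks the array length against "minItems" and "maxItems", when present. */
    static boolean sizeOk(Map<String, Object> arraySchema, List<?> array) {
        Object min = arraySchema.get("minItems");
        Object max = arraySchema.get("maxItems");
        if (min instanceof Number && array.size() < ((Number) min).intValue()) {
            return false;
        }
        return !(max instanceof Number && array.size() > ((Number) max).intValue());
    }
}
```

A full validator would then validate each element against the schema returned by schemaForItem, in addition to the size check.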

Our Contact Information Array Schema is an array of objects. Each object is a Contact Information Object composed of 4 properties: “name”, “phone”, “mobile” and “email”. A few things are worth noting in the JSON Schemas defining these properties:

  • the “enum” property specifies a closed list of allowed values for a property. In our example, a Contact Information Object can be a “home”, “work” or “other” type of contact.
  • the “format” property specifies the expected format of a value, chosen from a predefined (and extensible) set of formats. In our example, the “phone” and “mobile” properties have a “phone” format, whereas the “email” property has an “email” format. Note that the specification states that implementations are not required to support all the formats it lists.
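A possible sketch of the “enum” and “format” checks is shown below, again assuming schemas are Java Maps and arrays are Java Lists. The e-mail regular expression is a deliberately loose assumption made for illustration, not a definition taken from the specification, and, as the specification allows, formats this sketch does not know (such as “phone”) are accepted without checking. EnumAndFormatCheck is a hypothetical name.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical sketch of the "enum" and "format" checks. */
final class EnumAndFormatCheck {

    // Loose e-mail shape: something@something.something, without spaces.
    // An illustrative assumption, not the specification's definition.
    private static final Pattern EMAIL = Pattern.compile("^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$");

    /** True if "enum" is absent or the value equals one of the listed values. */
    static boolean enumOk(Map<String, Object> schema, Object value) {
        Object allowed = schema.get("enum");
        return !(allowed instanceof List) || ((List<?>) allowed).contains(value);
    }

    /** True if the value satisfies a known format; unknown formats are not checked. */
    static boolean formatOk(Map<String, Object> schema, Object value) {
        Object format = schema.get("format");
        if ("email".equals(format) && value instanceof String) {
            return EMAIL.matcher((String) value).matches();
        }
        // Other formats, such as "phone", are accepted as-is in this sketch.
        return true;
    }
}
```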
Second the instance …
[
  { "name" : "home", "phone" : "+302109349764", "email": "nico@vahlas.eu" },
  { "name" : "work", "phone" : "+302108029409", "email": "nvah@instore.gr" }
]

Implementing the JSON Schema specification in Java

As I mentioned in the introduction of this post, I have decided to write a Java-based implementation of the specification for validating JSON strings against JSON Schemas. When I started, my initial intention was to “simply” port the JavaScript implementation of the “core” specification … but when I dived into the code, I changed my mind and decided to write something more “Java-like”, if I may say so.
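To give an idea of what a “Java-like” structure could look like, here is one possible design, sketched under assumptions: each schema keyword is handled by its own validator object, and keywords with no registered validator (such as “title” or “description”, which carry no validation rules) fall back to a no-op. This is a hypothetical illustration, not the code of the project described in this post; KeywordValidator and KeywordSchema are invented names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** One validator per schema keyword (hypothetical design). */
interface KeywordValidator {
    /** Returns error messages for the instance; an empty list means valid. */
    List<String> validate(Object keywordValue, Object instance);
}

/** Dispatches each keyword of a schema to its registered validator. */
final class KeywordSchema {
    // Used for keywords with no validation semantics or no implementation yet.
    private static final KeywordValidator NO_OP = (keywordValue, instance) -> List.of();

    private final Map<String, KeywordValidator> registry = new HashMap<>();

    void register(String keyword, KeywordValidator validator) {
        registry.put(keyword, validator);
    }

    /** Runs the validator for every keyword present in the schema and collects all errors. */
    List<String> validate(Map<String, Object> schema, Object instance) {
        List<String> errors = new ArrayList<>();
        for (Map.Entry<String, Object> entry : schema.entrySet()) {
            KeywordValidator validator = registry.getOrDefault(entry.getKey(), NO_OP);
            errors.addAll(validator.validate(entry.getValue(), instance));
        }
        return errors;
    }
}
```

A registry like this lets keywords be added one at a time, which suits an implementation written incrementally in spare time.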

Infrastructure

Roughly, the infrastructure I use for this little project is the following:

  • Jackson as the JSON processor library: I used it recently for the Europass project and liked it; it is the most complete Java-based JSON processor I have found so far
  • Git as the version control system: I love the idea of distributed version control and had wanted to try Git for a long time.
  • Gitorious to host the project (JSON Schema Validation in Java): the public code repository, the wiki and the project page.
  • Maven and Eclipse (with the EGit plugin) as the development tools and infrastructure

Status

Roughly, the status / progress of my work at the time of this writing is the following:

  • I have started the implementation but I am not far enough to make it publicly available
  • I have started a discussion on the JSON Schema Google Group and have had feedback from the lead developer of the Jackson project: there seems to be a lot of interest in it
  • I have created a project and a repository on Gitorious
  • I am writing this post :-)

It’s a lot of work to do all this setup … I had not realised that it would take me so much time. I hope it will be worth it. Voilà!

  1. pmik
    May 1, 2010 at 12:08 | #1

    I am impressed although I did not understand half of it!
    But what do I know… I am an Oracle guy!!!

  2. May 3, 2010 at 09:40 | #2

    @pmik
    Thanks for the comment! My explanations must be very poor if you don’t understand half of what I write … anyway, I’ll keep trying.

    @all
    I will publish the code early this week on gitorious: I am almost done with most of the functionality I wanted to implement.

  3. Andres
    May 11, 2010 at 18:30 | #3

    Hi Nicolas,

    It’s great you are doing this. I work on a project that really needs this functionality. Are you planning to integrate your code with Jackson? Please let me know if you need some help.

    Cheers.

    • May 12, 2010 at 08:58 | #4

      Hi Andres,

      thanks for your feedback.

      I am planning to release a first snapshot this weekend: I am a bit behind my initial schedule … I don’t have a lot of free time. The implementation will be based on Jackson but not really integrated into it: I didn’t want to “lose” too much time diving into the Jackson source code at this stage. However, any help will be welcome once I release the code :)

      I will post on my blog as soon as I publish the code on gitorious.

  4. May 17, 2010 at 23:19 | #7

    I fixed the link leading to my “simple HTML” test page … someone should have told me that it was pointing to the wrong google document :(

    Anyway, it’s fixed now if anybody wants to play with it.

  5. Andres
    June 2, 2010 at 01:50 | #8

    I’ve checked out the code. It’s really neat, congratulations. I’ve tested it as well and I’m getting good results. I think I’m going to write some code in order to ignore “description” and I’ll try to create a validator for “optional”. I’ll let you know when I’m done.

    Thank you very much for your code, it’s really good stuff.

    Andrés.

    • June 2, 2010 at 08:43 | #9

      Cool :-)

      However, the “optional” property is already supported by the PropertiesValidator … you should not have to do something more, except if it is buggy of course ;)

      I am not sure I understand what you mean by “ignore” the “description” property. Can you elaborate?

      Nico

  6. Andres
    June 2, 2010 at 16:09 | #10

    For example in this schema definition: http://json-schema.org/card

    Definitions like:
    "description": "A representation of a person, company, organization, or place"
    and
    "description": "Formatted Name"

    will generate:
    java.lang.ClassNotFoundException: eu.vahlas.json.schema.impl.validators.DescriptionValidator

    and definitions like:
    "optional": true

    will produce:
    java.lang.ClassNotFoundException: eu.vahlas.json.schema.impl.validators.OptionalValidator

    I’m not testing it with that schema exactly, but with a similar one. Anyway I’ll recheck the compliance of my schema with the json schema standard.

    Thanks again!
    Andrés.

    • June 3, 2010 at 00:34 | #11

      I have checked these things:

      1) The exceptions you see are only logged for information: I have changed their log level to WARN and added a NoOpValidator that is used by default when the implementation class is not found.

      2) Concerning the “title” and “description” properties, nothing is expected to be done by the validator, these elements are only there for documentation purposes. So it is only natural to have a NoOpValidator at this level.

      3) Concerning the “optional” property, there was a little bug when it was set to false: I fixed it.

      4) Concerning the “card” schema, I have written a test class … now everything works fine :) I have even included an experimental implementation for references ($ref) that exist in the card schema on “adr” and “geo” properties.

      I have pushed all this to gitorious so that you can check it out :)

      By the way, if anyone has examples of JSON Schemas and instances available: I would be very happy to use them to test my code.

  7. Andres
    June 7, 2010 at 22:15 | #12

    Nico,

    Thank you very much for making these improvements, I’ve pulled (is this the word you use when using GIT?) the code and I’m going to start testing it. If you need some help please let me know.

    Thanks again,
    Andrés.

  8. Al
    August 3, 2010 at 18:45 | #13

    Hi,
    I’m trying to validate a simple schema using your code and I get the following error. Should I just ignore this error?
    ERROR eu.vahlas.json.schema.impl.JacksonSchema – Could not load validator id
    java.lang.ClassNotFoundException: eu.vahlas.json.schema.impl.validators.IdValidator

    I couldn’t find IdValidator class anywhere in your package library.

    Thanks
    Al

    • August 23, 2010 at 13:51 | #14

      Hi,

      sorry for the delayed response, I was on holiday :)

      If you still have the problem, could you please post your JSON Schema so that I can have a look at it … it seems that you have an unexpected “id” property in your schema.

      However, the validator should not throw an error in the case of an “unknown” property in the schema. Do you use the “latest” version from the git repository ?

      Hope this helps!

  9. October 27, 2010 at 11:31 | #15

    hi, I’d very much like to see a Europass CV in JSON format. any chance the web services are up and running yet ? or maybe a javascript or java class that does a good job of translating the xml into JSON ? thank you very much Luca

  10. October 27, 2010 at 12:42 | #16

    hi, any chance of that JSON webservice is up and running ? i’m trying to convert europass xml into json. many thanks Luca

    • November 1, 2010 at 21:52 | #17

      hi, some work is being done on the Europass project to provide JSON based web services. I also know that a Europass JSON Schema has been implemented. Unfortunately I am not in charge of this project anymore as I have left the company responsible for its implementation and I don’t know when they plan to release all this work.

      I will contact them and try to get some information as soon as possible. I’ll post it as soon as I get it.

      You can also try to contact them via the “Your Opinion” page of the Europass Web Site. They are very interested in collaborations and will certainly answer you, particularly if you tell them that you have been in contact with me ;).

      May I ask in what context you use Europass XML ?
      Nico

  11. Suresh
    February 21, 2011 at 14:42 | #18

    hi,
    Here is my requirement,
    1) Validate my client JSON request at the server side.
    2) Convert the JSON request to a Java object.

    Which validation approach is better in terms of performance:
    1) define a JSON Schema and validate the JSON request against it, or
    2) first convert the JSON request to a Java object, then validate it with a validate method in the Java class?

    Will approach 2 have better performance? Has anyone encountered this requirement?
