Thursday, May 3, 2007

Parsing JSON

Here is the JSON parser I referred to in the previous post. It's not heavily documented because it's pretty close to a 1:1 translation of the specification.

There are some rough edges in this parser. The test suite includes the number 23456789012E666, which is out of range of IEEE doubles, and is read in as Infinity. While this value can be read in as something meaningful, it cannot be emitted, since there is no provision in JSON to express values like Infinity, -Infinity or NaN. The pretty-printer does not re-encode strings containing Unicode characters or escaped characters into a JSON-readable format. Finally, malformed inputs cause exceptions (error).

module JSON where

import Data.Char
import Data.Map hiding (map)
import Text.ParserCombinators.Parsec hiding (token)

--------------------------------------------------------------------------

data JsonValue = JsonString String
| JsonNumber Double
| JsonObject (Map String JsonValue)
| JsonArray [JsonValue]
| JsonTrue
| JsonFalse
| JsonNull
deriving (Show, Eq)

--------------------------------------------------------------------------
-- Convenient parse combinators

token :: Parser a -> Parser a
token p = do r <- p
spaces
return r

comma :: Parser Char
comma = token (char ',')

--------------------------------------------------------------------------

parseJSON :: String -> JsonValue
parseJSON str = case (parse jsonFile "" str) of
Left s -> error (show s)
Right v -> v

jsonFile :: Parser JsonValue
jsonFile = do contents <- jsonObject <|> jsonArray
eof
return contents

-- JSON Object
jsonObject :: Parser JsonValue
jsonObject = do pairs <- between open close (sepBy jsonPair comma)
return $ JsonObject $ fromList pairs
where
open = token (char '{')
close = token (char '}')

jsonPair :: Parser (String, JsonValue)
jsonPair = do key <- token(jsonString)
token (char ':')
value <- token(jsonValue)
return (toString key, value)
where
toString (JsonString s) = s
toString _ = ""

-- JSON Array
jsonArray :: Parser JsonValue
jsonArray = do values <- between open close (sepBy (token jsonValue) comma)
return $ JsonArray values
where
open = token (char '[')
close = token (char ']')


-- Any JSON Value
jsonValue :: Parser JsonValue
jsonValue = do spaces
obj <- token(jsonString
<|> jsonNumber
<|> jsonObject
<|> jsonArray
<|> jsonTrue
<|> jsonFalse
<|> jsonNull
)
return obj

-- JSON String
jsonString :: Parser JsonValue
jsonString = do s <- between (char '"' ) (char '"' ) (many jsonChar)
return (JsonString s)

isValidJsonChar ch = (isAscii ch) && (isPrint ch) && (ch /= '\\') && (ch /= '"')

hexToInt s = foldl (\i j -> (16 * i) + j) 0 (map digitToInt s)

jsonChar = satisfy isValidJsonChar
<|> do char '\\' -- escaping backslash
char '\\' -- escaped character
<|> char '"'
<|> char '/'
<|> (char 'b' >> return '\b')
<|> (char 'f' >> return '\f')
<|> (char 'n' >> return '\n')
<|> (char 'r' >> return '\r')
<|> (char 't' >> return '\t')
<|> do char 'u'
hex <- count 4 (satisfy isHexDigit)
return $ chr (hexToInt hex)

-- JSON Number
jsonNumber :: Parser JsonValue
jsonNumber = do i <- int
frac <- option "" frac
e <- option "" expo
return $ JsonNumber (read (i ++ frac ++ e))

int :: Parser String
int = do sign <- option "" (string "-")
value <- (string "0" <|> many1 digit)
return (sign ++ value)

frac :: Parser String
frac = do char '.'
digits <- many1 digit
return ( '.':digits)

expo :: Parser String
expo = do e <- oneOf "eE"
p <- option '+' (oneOf "+-")
n <- many1 digit
return (e : p : n)


-- JSON Constants
jsonTrue = token (string "true") >> return JsonTrue
jsonFalse = token (string "false") >> return JsonFalse
jsonNull = token (string "null") >> return JsonNull

--------------------------------------------------------------------------
-- A JSON Pretty Printer
--------------------------------------------------------------------------
pprint v = toString "" v

toString indent (JsonString s) = show s
toString indent (JsonNumber d) = show d
toString indent (JsonObject o) =
if (o == empty)
then "{}"
else "{\n" ++ showObjs (indent ++ " ") (toList o) ++ "\n" ++ indent ++ "}"
toString indent (JsonArray []) = "[]"
toString indent (JsonArray a) = "[\n" ++ showArray (indent ++ " ") a ++ "\n" ++ indent ++ "]"
toString indent (JsonTrue) = "true"
toString indent (JsonFalse) = "false"
toString indent (JsonNull) = "null"

showKeyValue i k v = i ++ show k ++ ": " ++ toString i v

showObjs i [] = ""
showObjs i [(k ,v)] = showKeyValue i k v
showObjs i ((k, v):t) = showKeyValue i k v ++ ",\n" ++ showObjs i t

showArray i [] = ""
showArray i [a] = i ++ toString i a
showArray i (h:t) = i ++ toString i h ++ ",\n" ++ showArray i t

--------------------------------------------------------------------------

2 comments:

Stéphane Bortzmeyer said...

It does not seem to parse JSON, as defined in RFC 4627. The JSON standard says that:

A string is a sequence of zero or more Unicode characters [UNICODE].

and:

Any character may be escaped.

Do note the "may". I believe (and this is something I discussed with
D. Crockford) that raw UTF-8 is legal in a JSON file, for instance:

{
"Boisson": "Café",
...

and it seems your parser cannot parse it (the jsonChar parser only
accepts ASCII).

(Yes, I know that parsing UTF-8 in Haskell is not as simple as it should be.)

Dimitre Novatchev said...

Maybe it would be interesting for you and your readers to look at the JSON parser I wrote in XSLT 2.0 using the FXSL Library. Now it is possible to transform JSON directly in XSLT.

The functions implementing this are:

f:json-document() and
f:json-file-document()

I would appreciate any comments/shared experience on using these two functions.

Dimitre Novatchev