• balsoft@lemmy.ml
    link
    fedilink
    arrow-up
    101
    ·
    4 days ago

    You gotta admit though, Haskell is crazy good for parsing and marshaling data

    • marcos@lemmy.world
      link
      fedilink
      arrow-up
      42
      ·
      4 days ago

      Yes. I’m divided into “hum… 100 lines is larger than I expected” and “what did he mean ‘from scratch’? did he write the parser combinators? if so, 100 lines is crazy small!”

      But I’m settling in believing 80 of those lines are verbose type declarations.

      • balsoft@lemmy.ml
        link
        fedilink
        arrow-up
        16
        ·
        edit-2
        2 days ago

        I decided to write it myself for fun. I decided that “From Scratch” means:

        • No parser libraries (parsec/happy/etc)
        • No using read from Prelude
        • No hacky meta-parsing

        Here is what I came up with (using my favourite parsing method: parser combinators):

        import Control.Monad ((>=>), replicateM)
        import Control.Applicative (Alternative (..), asum, optional)
        import Data.Maybe (fromMaybe)
        import Data.Functor (($>))
        import Data.List (singleton)
        import Data.Map (Map, fromList)
        import Data.Bifunctor (first, second)
        import Data.Char (toLower, chr)
        
        newtype Parser i o = Parser { parse :: i -> Maybe (i, o) } deriving (Functor)
        
        instance Applicative (Parser i) where
          pure a = Parser $ \i -> Just (i, a)
          a <*> b = Parser $ parse a >=> \(i, f) -> second f <$> parse b i
        instance Alternative (Parser i) where
          empty = Parser $ const Nothing
          a <|> b = Parser $ \i -> parse a i <|> parse b i
        instance Monad (Parser i) where
          a >>= f = Parser $ parse a >=> \(i, b) -> parse (f b) i
        instance Semigroup o => Semigroup (Parser i o) where
          a <> b = (<>) <$> a <*> b
        instance Monoid o => Monoid (Parser i o) where
          mempty = pure mempty
        
        type SParser = Parser String
        
        charIf :: (a -> Bool) -> Parser [a] a
        charIf cond = Parser $ \i -> case i of
          (x:xs) | cond x -> Just (xs, x)
          _ -> Nothing
        
        char :: Eq a => a -> Parser [a] a
        char c = charIf (== c)
        
        one :: Parser i a -> Parser i [a]
        one = fmap singleton
        
        str :: Eq a => [a] -> Parser [a] [a]
        str = mapM char
        
        sepBy :: Parser i a -> Parser i b -> Parser i [a]
        sepBy a b = (one a <> many (b *> a)) <|> mempty
        
        data Decimal = Decimal { mantissa :: Integer, exponent :: Int } deriving Show
        
        data JSON = Object (Map String JSON) | Array [JSON] | Bool Bool | Number Decimal | String String | Null deriving Show
        
        whitespace :: SParser String
        whitespace = many $ asum $ map char [' ', '\t', '\r', '\n']
        
        digit :: Int -> SParser Int
        digit base = asum $ take base [asum [char c, char (toLower c)] $> n | (c, n) <- zip (['0'..'9'] <> ['A'..'Z']) [0..]]
        
        collectDigits :: Int -> [Int] -> Integer
        collectDigits base = foldl (\acc x -> acc * fromIntegral base + fromIntegral x) 0
        
        unsignedInteger :: SParser Integer
        unsignedInteger = collectDigits 10 <$> some (digit 10)
        
        integer :: SParser Integer
        integer = asum [char '-' $> (-1), char '+' $> 1, str "" $> 1] >>= \sign -> (sign *) <$> unsignedInteger
        
        -- This is the ceil of the log10 and also very inefficient
        log10 :: Integer -> Int
        log10 n
          | n < 1 = 0
          | otherwise = 1 + log10 (n `div` 10)
        
        jsonNumber :: SParser Decimal
        jsonNumber = do
          whole <- integer
          fraction <- fromMaybe 0 <$> optional (str "." *> unsignedInteger)
          e <- fromIntegral . fromMaybe 0 <$> optional ((str "E" <|> str "e") *> integer)
          pure $ Decimal (whole * 10^log10 fraction + signum whole * fraction) (e - log10 fraction)
        
        escapeChar :: SParser Char
        escapeChar = char '\\'
          *> asum [
            str "'" $> '\'',
            str "\"" $> '"',
            str "\\" $> '\\',
            str "n" $> '\n',
            str "r" $> '\r',
            str "t" $> '\t',
            str "b" $> '\b',
            str "f" $> '\f',
            str "u" *> (chr . fromIntegral . collectDigits 16 <$> replicateM 4 (digit 16))
          ]
        
        jsonString :: SParser String
        jsonString =
          char '"'
          *> many (asum [charIf (\c -> c /= '"' && c /= '\\'), escapeChar])
          <* char '"'
        
        jsonObjectPair :: SParser (String, JSON)
        jsonObjectPair = (,) <$> (whitespace *> jsonString <* whitespace <* char ':') <*> json
        
        json :: SParser JSON
        json =
          whitespace *>
            asum [
              Object <$> fromList <$> (char '{' *> jsonObjectPair `sepBy` char ',' <* char '}'),
              Array <$> (char '[' *> json `sepBy` char ',' <* char ']'),
              Bool <$> asum [str "true" $> True, str "false" $> False],
              Number <$> jsonNumber,
              String <$> jsonString,
              Null <$ str "null"
            ]
            <* whitespace
        
        main :: IO ()
        main = interact $ show . parse json
        
        

        This parses numbers as my own weird Decimal type, in order to preserve all information (converting to Double is lossy). I didn’t bother implementing any methods on the Decimal, because there are other libraries that do that and we’re just writing a parser.

        It’s also slow as hell but hey, that’s naive implementations for you!

        It ended up being 113 lines. I think I could reduce it a bit more if I was willing to sacrifice readability and/or just inline things instead of implementing stdlib typeclasses.

      • balsoft@lemmy.ml
        link
        fedilink
        arrow-up
        22
        ·
        4 days ago

        You could probably write a very basic parser combinator library, enough to parse JSON, in 100 lines of Haskell

      • join@lemmy.ml
        link
        fedilink
        arrow-up
        9
        ·
        4 days ago

        With recursive list comprehensions you can cram quite some complexity into one line of code.

      • expr@programming.dev
        link
        fedilink
        arrow-up
        3
        ·
        3 days ago

        Just looking at the image, yeah he’s a little parser combinator library entirely from scratch.

        Not sure what you mean by verbose type declarations. It looks to be 2 type declarations in a few lines of code (a newtype for the parser and a sum type to represent the different types of JSON values). It’s really not much at all.

        • jenesaisquoi@feddit.org
          link
          fedilink
          English
          arrow-up
          4
          ·
          3 days ago

          I will concede that implementing the first version in Haskell would be better.

          Mostly so that we can then fulfil the meme of reimplementing it Rust!

      • balsoft@lemmy.ml
        link
        fedilink
        arrow-up
        2
        ·
        3 days ago

        Personally I’m more partial to nom. Serde is quite verbose and complex for a parser.