
Pan* - or the universal converter pattern


  • Background
  • Practical motivation
    • When I was writing a lot of Haskell I regularly found myself wanting to generate Elm, PureScript, or TypeScript types from my Haskell types.
    • Two things could cause this:
      • I had a Haskell server with a client in another language
      • I had a Haskell reference implementation of an algorithm, toy language, etc. and wanted to start on follow-up implementations in other languages
    • Conceptually this is straightforward. Everyday Haskell types tend to be a subset of what Elm/PureScript/TypeScript support (going the other way is harder since Haskell doesn't have row polymorphism).
    • And there are a lot of Haskell-types-to-X projects, so I could usually find one that worked. But each time I dug through the internet looking for one that was the right fit for my current project, I found myself thinking, "This situation is a lot like Pandoc's for documents. Why not Pantype, Pandoc for types?"
      • Pantype would reduce work for users, since you'd only have to learn one API.
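      • Pantype would reduce work for implementers, since the number of converters to write goes from quadratic to linear: n formats need up to n(n-1) direct converters, but only n readers and n writers around a shared core type.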
        • A second question: why not Pandoc for everything?
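    • A minimal sketch of what a Pantype core might look like, written in TypeScript purely for illustration. Every name here (`TypeRef`, `RecordDecl`, `toTypeScript`, `toElm`, `converterCounts`) is hypothetical and not taken from any existing library, and the IR only models records. The point is that each writer touches only the shared IR, which is where the quadratic-to-linear saving comes from.

```typescript
// Hypothetical Pantype IR: a deliberately small model of record types.
// Anything a source language can express beyond this would be dropped by its reader.
type Prim = "string" | "int" | "bool";

type TypeRef =
  | { kind: "prim"; name: Prim }
  | { kind: "list"; element: TypeRef }
  | { kind: "named"; name: string };

interface RecordDecl {
  name: string;
  fields: { name: string; type: TypeRef }[];
}

// Writer 1: IR -> TypeScript interface.
const tsPrims: Record<Prim, string> = { string: "string", int: "number", bool: "boolean" };

function tsRef(t: TypeRef): string {
  switch (t.kind) {
    case "prim":
      return tsPrims[t.name];
    case "list":
      return `${tsRef(t.element)}[]`;
    case "named":
      return t.name;
  }
}

function toTypeScript(decl: RecordDecl): string {
  const fields = decl.fields.map((f) => `  ${f.name}: ${tsRef(f.type)};`);
  return [`interface ${decl.name} {`, ...fields, "}"].join("\n");
}

// Writer 2: IR -> Elm type alias. A list nested inside a list needs parentheses in Elm.
const elmPrims: Record<Prim, string> = { string: "String", int: "Int", bool: "Bool" };

function elmRef(t: TypeRef, nested = false): string {
  switch (t.kind) {
    case "prim":
      return elmPrims[t.name];
    case "list": {
      const rendered = `List ${elmRef(t.element, true)}`;
      return nested ? `(${rendered})` : rendered;
    }
    case "named":
      return t.name;
  }
}

function toElm(decl: RecordDecl): string {
  if (decl.fields.length === 0) {
    return `type alias ${decl.name} =\n    {}`;
  }
  const fields = decl.fields.map(
    (f, i) => `    ${i === 0 ? "{" : ","} ${f.name} : ${elmRef(f.type)}`
  );
  return [`type alias ${decl.name} =`, ...fields, "    }"].join("\n");
}

// Converters needed for n formats: direct pairs vs. n readers plus n writers.
function converterCounts(n: number): { pairwise: number; viaHub: number } {
  return { pairwise: n * (n - 1), viaHub: 2 * n };
}

// Example IR value, as a Haskell reader might produce for a simple record type.
const user: RecordDecl = {
  name: "User",
  fields: [
    { name: "name", type: { kind: "prim", name: "string" } },
    { name: "age", type: { kind: "prim", name: "int" } },
    { name: "tags", type: { kind: "list", element: { kind: "prim", name: "string" } } },
  ],
};
```

      • Calling `toTypeScript(user)` and `toElm(user)` renders the same declaration twice. Supporting PureScript would mean adding one more writer over `RecordDecl`, not one converter per existing format.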
  • Pandoc details
    • Pandoc has a single core type that everything converts to or from: hackage.haskell.org/package/pandoc-types-1.23.0.1/docs/Text-Pandoc-Definition.html#t:Pandoc
    • The input and output formats supported are listed on the homepage: pandoc.org
      • Most are bidirectional. Some, like CSV, can only be input. Others, like AsciiDoc and PowerPoint, can only be output.
    • On lossiness, quoting the manual: pandoc.org/MANUAL.html
      • Pandoc has a modular design: it consists of a set of readers, which parse text in a given format and produce a native representation of the document (an abstract syntax tree or AST), and a set of writers, which convert this native representation into a target format. Thus, adding an input or output format requires only adding a reader or writer. Users can also run custom pandoc filters to modify the intermediate AST.

        Because pandoc’s intermediate representation of a document is less expressive than many of the formats it converts between, one should not expect perfect conversions between every format and every other. Pandoc attempts to preserve the structural elements of a document, but not formatting details such as margin size. And some document elements, such as complex tables, may not fit into pandoc’s simple document model. While conversions from pandoc’s Markdown to all formats aspire to be perfect, conversions from formats more expressive than pandoc’s Markdown can be expected to be lossy.

    • These seem like patterns that could be useful in other circumstances, not just for documents.
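    • The reader / filter / writer split described in the manual can be sketched outside of documents-as-Pandoc-knows-them. The following is a toy illustration in TypeScript, not Pandoc's actual API: the IR (`Block`, `Doc`), the made-up "rich" input format (`RichBlock`), and all function names are assumptions for the example. It shows where the lossiness comes from: the reader keeps structure but has nowhere to put `marginPx`.

```typescript
// Hypothetical document IR, far smaller than Pandoc's real AST.
type Block =
  | { kind: "heading"; level: number; text: string }
  | { kind: "para"; text: string };

type Doc = Block[];

// A made-up "rich" input format that carries formatting the IR cannot hold.
interface RichBlock {
  style: "title" | "body";
  text: string;
  marginPx: number;
}

// Reader: rich format -> IR. Structure survives; marginPx is dropped (lossy).
function readRich(input: RichBlock[]): Doc {
  return input.map((b): Block =>
    b.style === "title"
      ? { kind: "heading", level: 1, text: b.text }
      : { kind: "para", text: b.text }
  );
}

// Writers: IR -> target format. Neither knows which reader produced the IR.
// (No escaping is done here; a real writer would need it.)
function writeMarkdown(doc: Doc): string {
  return doc
    .map((b) => (b.kind === "heading" ? `${"#".repeat(b.level)} ${b.text}` : b.text))
    .join("\n\n");
}

function writeHtml(doc: Doc): string {
  return doc
    .map((b) =>
      b.kind === "heading" ? `<h${b.level}>${b.text}</h${b.level}>` : `<p>${b.text}</p>`
    )
    .join("\n");
}

// Filter: IR -> IR, run between reading and writing.
function demoteHeadings(doc: Doc): Doc {
  return doc.map((b) => (b.kind === "heading" ? { ...b, level: b.level + 1 } : b));
}

// Pipeline: read, filter, then write to as many targets as there are writers.
const rich: RichBlock[] = [
  { style: "title", text: "Pan*", marginPx: 40 },
  { style: "body", text: "One IR, many formats.", marginPx: 12 },
];
const ir: Doc = demoteHeadings(readRich(rich));
const asMarkdown = writeMarkdown(ir);
const asHtml = writeHtml(ir);
```

      • Nothing in `ir` remembers the margins, so no writer can restore them. Making the reader lossless would mean growing `Block` until it can represent everything every input format can say.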
  • Comparison with other design patterns
    • frame your program as a compiler (upcoming)
      • Similar to Pandoc-for-X, but more constrained. Compilers don't throw away inconvenient expressions when moving from a particular syntax to an IR, or from IR to machine code.
    • language-server-protocol for X
      • Also a successful example of going from having to write a quadratic number of programs to a linear one, but less useful for communicating intent, since it is about protocols, not data.
  • Note on the page name
    • I considered the following distinction:
      • Pan* for converters that can be lossy both when generating IR and when generating output
      • universal converter for any type of multiway converter
    • However, it seems like any universal converter has to allow for lossiness at each step, so I think they're the same.
      • Technically it doesn't have to at the Input -> IR step, though: you'd just end up with a gargantuan IR. So maybe this distinction was good?
  • Other universal converter projects
  • question: is there a better name for this? And are there any other particularly interesting universal converter projects I should know about?
