“Trece años dedicó a esas heterogéneas fatigas, pero la mano de un forastero lo asesinó y su novela era insensata y nadie encontró el laberinto.”
J.L. Borges, El jardín de senderos que se bifurcan.

Groucho

(A marΧup language)


Intro

Groucho is a janky little tool I wrote to generate HTML files from a text containing simplified and non-intrusive markup codes.

It was written to make writing blog posts for my website easier. Its goal is to let me write content along with formatting instructions, with as little friction as possible.

As the posts I'm writing on this website are mainly technical, Groucho is built around three main goals :

Some examples :

*this is some text*   will print in bold : this is some text.
/this is some text/   will print in italics : this is some text.
_this is some text_   will print underlined : this is some text.

[m]x_+ = \frac{{-b+\sqrt{b^2-4ac}}{2a}}[/m]   will print this well-known maths formula :

$x_+ = \frac{-b + \sqrt{b^2 - 4ac}}{2a}$


Basic principles

If not otherwise mentioned, a character in the input will produce the same character on the output. Some special characters and sequences will be interpreted as symbols and insert html entities in the html output. Some others will be interpreted as markups and insert html tags in the output stream. These tags contain class attributes that are used by the browser, along with a css stylesheet, to style their content. A sample stylesheet is available with groucho, but it should be customizable at will (well, with css, you never know).

The symbols, the markups and their interpretations depend on the current mode of the interpreter, which can be one of the following four modes : text, maths, code and HTML.

By default, groucho will operate in text mode. You can switch to one of the other modes by using block tags, like [maths]...[/maths], and all text inserted between these two tags will be interpreted in this mode.
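As an illustration of the mode switching described above, here is a minimal Python sketch that splits an input string into (mode, text) segments. This is not Groucho's actual code : the function name `split_modes` is hypothetical, only the block tags [maths], [code] and [html] are handled, and the inline tags ([m], [c]) and the [code=none] variant are deliberately left out.

```python
import re

# Block tags handled by this sketch (an assumption; inline tags are ignored).
BLOCK_TAG = re.compile(r"\[(maths|code|html)\](.*?)\[/\1\]", re.DOTALL)

def split_modes(source):
    """Return (mode, text) segments; anything outside a block is in 'text' mode."""
    segments = []
    pos = 0
    for match in BLOCK_TAG.finditer(source):
        if match.start() > pos:
            # Text before the block belongs to the default text mode.
            segments.append(("text", source[pos:match.start()]))
        segments.append((match.group(1), match.group(2)))
        pos = match.end()
    if pos < len(source):
        segments.append(("text", source[pos:]))
    return segments
```

Each segment could then be handed to the interpreter for its mode, which keeps the per-mode rules independent of each other.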



HTML mode

All text placed between the tags [html] and [/html] will be interpreted as HTML and will be output as-is.
To produce the basic tags needed for an HTML document, you should insert the following block at the beginning of your input file :

[html]
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<link rel="stylesheet" href="css/styles.css"/>
</head>
<body>
[/html]

This is also where you can specify your CSS stylesheet. To close those tags you should insert the following block at the end of your input file :

[html]
</body>
</html>
[/html]



Markups common to Text and Maths mode

Text is the default mode, which is used when the source content is not contained between a pair of matching block markups.

Maths mode can be used by enclosing text between the markup tags [maths] and [/maths], which will interpret its content as maths, and also place the output between <div class="maths"></div> html tags. Alternatively, if one wishes to insert maths into the same line as normal text, the markup tags [m] and [/m] can be used, which will place the output between <span class="maths"></span> html tags.

Text and maths mode share some of their markups, so we will present them together and then present their differences.

HTML entities

Some characters used by html tags will be replaced by html entities :

Escape sequences

If a character is preceded by a \, it will be interpreted as follows :

In addition to that, there's a list of letters or words that will be interpreted as a special symbol or markup when preceded by \ :




Text mode

Additional markups

In text mode, a section title can be created by enclosing text in a matching pair of = sequences. The section level corresponds to the number of = signs :

===== Example title ===== in the input will produce the html <h5>Example title</h5>, resulting in :

Example title
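The mapping from the number of = signs to the heading level can be sketched in a few lines of Python. This is only an illustration under assumptions, not Groucho's implementation : the function name `heading_to_html` is hypothetical, the two = sequences are required to have the same length, and levels above 6 are rejected because HTML only defines <h1> to <h6>.

```python
import re

# A run of '=', the title, then another run of '='.
HEADING = re.compile(r"^(=+)\s*(.+?)\s*(=+)$")

def heading_to_html(line):
    """Return the <hN> markup for a title line, or None if the line is not a title."""
    m = HEADING.match(line.strip())
    if m is None or len(m.group(1)) != len(m.group(3)):
        return None  # not a matching pair of '=' sequences
    level = len(m.group(1))
    if level > 6:
        return None  # HTML has no heading level beyond h6
    return f"<h{level}>{m.group(2)}</h{level}>"
```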

A separating line can be created by a sequence of five or more hyphens : ----- will output the html tag <hr/> resulting in :


Paragraphs can be created by simply having two line breaks in a row.

The \b, \i and \u markups can alternatively be replaced by (respectively) *, /, and _.
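To make the *, / and _ shorthands concrete, here is a small Python sketch of how such delimiters could be turned into tags. It rests on several assumptions : the output tags <b>, <i> and <u> are placeholders (Groucho's real output uses tags with class attributes), the function name `emphasize` is hypothetical, and nesting is not supported. A single regular expression handles all three markers in one pass, so that the / inside a freshly generated closing tag is never mistaken for an italics marker.

```python
import re

# Placeholder tag names, assumed for illustration only.
TAGS = {"*": "b", "/": "i", "_": "u"}

# A marker, some text containing no marker or line break, then the same marker.
EMPHASIS = re.compile(r"([*/_])([^*/_\n]+)\1")

def emphasize(text):
    """Wrap *bold*, /italic/ and _underlined_ spans in the placeholder tags."""
    def repl(m):
        tag = TAGS[m.group(1)]
        return f"<{tag}>{m.group(2)}</{tag}>"
    return EMPHASIS.sub(repl, text)
```

A naive pattern like this one would also mangle URLs or snake_case identifiers appearing in text mode, so a real interpreter needs some way to escape or protect them.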

The following character sequences can be used to create symbols :

Lists

A number of tabs followed by a hyphen, followed by a space, will be interpreted as a list item. The number of leading tabs determines the list depth when creating nested lists :

- item 1
- item 2
	- sub item 2.1
	- sub item 2.2
- item 3

will produce the following list :
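In HTML terms, the tab depth maps naturally onto nested <ul> elements. The Python sketch below shows one way to compute that nesting. It is an illustration, not Groucho's code : the function name `list_to_html` is hypothetical, the plain <ul>/<li> tags carry no class attributes, lines that are not list items are skipped, and the depth is assumed to increase by at most one tab at a time.

```python
def list_to_html(lines):
    """Convert lines such as '\t- item' into nested <ul>/<li> markup."""
    out = []
    depth = 0  # number of <ul> elements currently open
    for line in lines:
        stripped = line.lstrip("\t")
        if not stripped.startswith("- "):
            continue  # this sketch ignores anything that is not a list item
        level = len(line) - len(stripped) + 1  # leading tabs + 1
        if level > depth:
            # Going deeper: open a new list inside the current item.
            while depth < level:
                out.append("<ul>")
                depth += 1
        else:
            # Same level or shallower: close the previous item first.
            out.append("</li>")
            while depth > level:
                out.append("</ul></li>")
                depth -= 1
        out.append("<li>" + stripped[2:])
    while depth > 0:
        out.append("</li></ul>")
        depth -= 1
    return "".join(out)
```

Keeping each <li> open until the next item arrives is what lets a sub-list sit inside its parent item, which is the nesting HTML expects.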

External content

Links

The markup tags [url=][/url] can be used to create a link. You specify the URL of your link by adding it after the = sign. The text of the link goes between the two markup tags.
For instance, [url=https://www.forkingpaths.garden]Home page[/url] will create a link labelled "Home page" that will land you on the Forking Paths website.


Images

An image can be created with the tag [img=]. As with the URL tag, you specify the URL of your image after the = sign. Hence, the tag [img=https://www.forkingpaths.garden/img/logo.png] will display the Forking Paths logo.
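The link and image tags can both be handled by simple pattern substitution. Here is a hedged Python sketch of the idea : the function name `external_to_html` is hypothetical, and the generated <a> and <img> elements are assumptions about what the output could look like (Groucho's real output may add class attributes). The URL is passed through `html.escape` so that characters such as & or " cannot break the attribute.

```python
import html
import re

URL_TAG = re.compile(r"\[url=([^\]]+)\](.*?)\[/url\]", re.DOTALL)
IMG_TAG = re.compile(r"\[img=([^\]]+)\]")

def external_to_html(text):
    """Replace [url=...]...[/url] and [img=...] tags with <a> and <img> elements."""
    def link(m):
        href = html.escape(m.group(1), quote=True)
        return f'<a href="{href}">{m.group(2)}</a>'

    def image(m):
        src = html.escape(m.group(1), quote=True)
        return f'<img src="{src}"/>'

    # Links first, then images, so an image can be used as link content.
    return IMG_TAG.sub(image, URL_TAG.sub(link, text))
```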



Maths mode

By default, operators and capital letters will be printed in roman with a math-specific font, while lower-case letters will be printed in italics. You can always reverse that by using \r and \i markups.

Default symbol interpretation

In maths mode, the following (non-escaped) sequences are recognized as symbol equivalents :

Exponents and indices

A character preceded by a _ will be printed as an index, as in $u_k$. You can use a pair of curly brackets after the underscore to have a longer index, i.e. this code : C_{(i,j)} will produce this result : $C_{(i,j)}$.

Similarly, the character ^ is used for exponents, as in $e^{j(2\pi f t + \varphi)}$.
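The two forms (a single character, or a group in curly brackets) can be captured by one pattern. The Python sketch below is an illustration under assumptions : the function name `scripts_to_html` is hypothetical, <sub> and <sup> are assumed as output tags, and nested curly brackets inside an index or exponent are not handled.

```python
import re

# '_' or '^' followed by either a {group} or a single non-space character.
SCRIPT = re.compile(r"([_^])(?:\{([^{}]*)\}|(\S))")

def scripts_to_html(expr):
    """Turn _x / _{...} into <sub> and ^x / ^{...} into <sup>."""
    def repl(m):
        tag = "sub" if m.group(1) == "_" else "sup"
        body = m.group(2) if m.group(2) is not None else m.group(3)
        return f"<{tag}>{body}</{tag}>"
    return SCRIPT.sub(repl, expr)
```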

Vectors

\vec : the character following this escape markup will be rendered with a vector arrow above it, as in $\vec{v}$.

Square roots

\sqrt : a single character, or a sequence of characters contained in curly brackets, following this escape markup, will be printed as the arguments of a square root.

$\sqrt{a \times (1 - n)^2}$

More precisely, this markup outputs the following HTML :

<span class="sqrt-symbol">&radic;</span> <span class="sqrt-arg">&thinsp;YourExpressionHere;&nbsp;</span>

The square root character is rendered as an HTML entity and the trailing top line of the square root is simulated by the top border of the sqrt-arg span. Because HTML/CSS layout is such a disaster, it is extremely likely that the square root and the top bar won't join and you will have to adjust the stylesheet to make it happen. There's not much we can do about it, until MathML is largely supported, except resorting to TeX generated images and the like...

Fraction

You can use the markup \frac{{numerator}{denominator}} to generate a fraction. For instance,

y = \frac{{x^2+1}{2}}

will generate the following equation :

$y = \frac{x^2 + 1}{2}$

As stated above, depending on your font, you might want to adjust the text-alignment property of the fraction class to align the fraction line with the equal sign.
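Because the numerator and the denominator may themselves contain curly brackets (for instance a \sqrt{...} argument), splitting a \frac markup requires matching brackets rather than searching for the first closing one. Here is a minimal Python sketch of that idea. The helper names `read_group` and `parse_frac` are hypothetical and this is not Groucho's parser, only an illustration of the bracket counting.

```python
def read_group(text, start):
    """Return (content, end) for the balanced {...} group opening at text[start]."""
    if start >= len(text) or text[start] != "{":
        raise ValueError("expected '{'")
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                # Content excludes the outer brackets; end is just past '}'.
                return text[start + 1:i], i + 1
    raise ValueError("unbalanced curly brackets")

def parse_frac(markup):
    """Split a '\\frac{{num}{den}}' markup into (num, den)."""
    prefix = "\\frac"
    if not markup.startswith(prefix):
        raise ValueError("not a \\frac markup")
    outer, _ = read_group(markup, len(prefix))
    numerator, end = read_group(outer, 0)
    denominator, _ = read_group(outer, end)
    return numerator, denominator
```

With the example above, `parse_frac(r"\frac{{x^2+1}{2}}")` yields the pair `("x^2+1", "2")`, and each part can then be interpreted recursively in maths mode.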

Sums and Products

\nsum and its product counterpart can be used to create indexed sums or products. You need to specify the maximum index and the minimum index enclosed in curly brackets, like this :

x = \nsum{{p}{k=1}}u_k

which produces this result :

$x = \sum_{k=1}^{p} u_k$


Code mode

All text placed between the tags [code] and [/code] will be interpreted as C code, syntax-highlighted and placed between the html tags <div class="code"></div>. Alternatively, if one wishes to insert code into the same line as normal text, the markup tags [c] and [/c] can be used, which will place the output between the html tags <span class="code"></span>. Inline code won't be syntax-highlighted, though.

You can specify that you don't want the syntax highlighter to be run on a code block by using the opening tag [code=none] instead of [code].

The CSS styling of code section uses white-space: pre; to preserve indentation and line breaks. It also uses a monospace font by default. The syntax highlighter performs a simplified syntactical analysis of the code and encloses elements in <span></span> tags, with one of the following classes :

The syntax highlighter may fail in some cases, either because the syntax is in fact incorrect, because the code uses some unrecognized dialect, because the parser stumbles upon a macro, or simply because it lacks context. As a matter of fact, since C is not completely context-free, and to avoid complicating the parser for such a simple task, the parser sometimes falls back on some (hopefully) reasonable assumption about what's going on. If it was a wrong guess, it will try to recover and continue to output non-highlighted code until it can resynchronize, often after the next semicolon or at the beginning of the next compound statement.

Here is an example of a code snippet :

for(int i=0;i<count;i++)
{
	// Get all my items by name in the hash table and print their value
	int hash = HashFunction(names[i]);
	my_struct* a = GetItem(hash);
	printf("item %s = %i %s\n", names[i], a->value, a->unit);
}


Planned features

This is a list of features that might be added along the way, as I need them and/or find time to implement them :