{"id":14,"date":"2008-08-28T21:33:19","date_gmt":"2008-08-28T19:33:19","guid":{"rendered":"http:\/\/floris.briolas.nl\/floris\/?p=14"},"modified":"2009-10-14T20:06:42","modified_gmt":"2009-10-14T19:06:42","slug":"antlr-ast-and-rewriting-rules","status":"publish","type":"post","link":"https:\/\/floris.briolas.nl\/floris\/2008\/08\/antlr-ast-and-rewriting-rules\/","title":{"rendered":"Antlr, AST and rewriting rules."},"content":{"rendered":"<p>Onward with my little project, with writing a SVG renderer. It&#8217;s intended as a study to get to know ANTLR and SVG.<\/p>\n<p>The good folks of<a href=\"http:\/\/www.w3.org\/TR\/SVG11\/paths.html\"> W3.org<\/a> are so kind to publish a <a href=\"http:\/\/en.wikipedia.org\/wiki\/Backus%E2%80%93Naur_Form\">BNF<\/a> that can be rewritten to an AntLr Grammar.<\/p>\n<p>The goal of this post is to interpret:<\/p>\n<blockquote><p>M115 285 C115 400 285 400 285 285 C400 285 400 115 285 115 C285 0 115 0 115 115 C0 115 0 285 115 285 z<\/p><\/blockquote>\n<p>Into a drawing of some sort.<\/p>\n<p>After reading the BNF it&#8217;s clear that the letters stand for commands, and are usually followed by one or more coordinates. EG. &#8216;M&#8217; is the move to command, and &#8216;C&#8217; is a curve command, and so on.<\/p>\n<p>The Grammar for SVG will be posted soon in another post. This post is about rewriting.<\/p>\n<div id=\"attachment_16\" style=\"width: 160px\" class=\"wp-caption alignleft\"><a href=\"http:\/\/floris.briolas.nl\/floris\/wp-content\/uploads\/2008\/08\/ast-tree-beforerewrite.jpg\"><img aria-describedby=\"caption-attachment-16\" decoding=\"async\" loading=\"lazy\" class=\"size-thumbnail wp-image-16\" title=\"AST tree SVG Path\" src=\"http:\/\/floris.briolas.nl\/floris\/wp-content\/uploads\/2008\/08\/ast-tree-beforerewrite-150x88.jpg\" alt=\"AST tree SVG Path\" width=\"150\" height=\"88\" \/><\/a><p id=\"caption-attachment-16\" class=\"wp-caption-text\">AST tree SVG Path<\/p><\/div>\n<p>When using the grammar, that I need to post, the AST output looks like this. My first thought was &#8220;so I&#8217;ve done all this an this is what I get!?&#8221; <a href=\"http:\/\/www.amazon.com\/gp\/offer-listing\/0978739256?ie=UTF8&amp;tag=acodernamedfl-20&amp;linkCode=am2&amp;camp=1789&amp;creative=9325&amp;creativeASIN=0978739256\">The Definitive ANTLR Reference: Building Domain-Specific Languages (Pragmatic Programmers)<\/a><img decoding=\"async\" loading=\"lazy\" style=\"border:none !important; margin:0px !important;\" src=\"http:\/\/www.assoc-amazon.com\/e\/ir?t=acodernamedfl-20&amp;l=am2&amp;o=1&amp;a=0978739256\" border=\"0\" alt=\"\" width=\"1\" height=\"1\" \/> showed me I can rewrite and restructure the tree.<\/p>\n<p><strong>Learn by a Simpler Example:<\/strong><\/p>\n<p>Let say we want to parse a C# function header in an interface, which kinda looks like:<\/p>\n<blockquote><p>public void foo(string prm1, out int val1);<\/p><\/blockquote>\n<p>the ANTLR grammar would look something like:<\/p>\n<pre name=\"code\" class=\"antlr\">\r\n\r\ngrammar functionheader; \r\n\r\nfunction_header\r\n\t: access_modifier? return_type function_name LPAREN argument_list? RPAREN SEMICOLON;\r\n\r\n argument_list\r\n\t:\targument (COMMA  argument_list)?\r\n\t;  \r\n\r\nargument : argument_accessor? type argument_name\r\n\t; \r\n\r\nargument_accessor\r\n\t:\t'ref'|'out';\r\n\r\naccess_modifier\r\n\t:\t 'public'|'private'|'protected'|'internal'; \r\n\r\nreturn_type\r\n\t:\ttype|'void';\r\n\r\ntype\t:\t'string'|'int'|'float'|'double'|'long'; \r\n\r\nargument_name\r\n\t:\tNAME; \r\n\r\nfunction_name\r\n\t:\tNAME;\r\n\r\n\/\/PARSER\r\n\r\nCOMMA\t:\t',';\r\n\r\nSEMICOLON\r\n\t:\t';'; \r\n\r\nNAME\t:\tCHAR(DIGIT|CHAR)+; \r\n\r\nCHAR \t:\t('a'..'z'|'A'..'Z'|'_');\r\n\r\nDIGIT\r\n\t:\t('0'..'9'); \r\n\r\nLPAREN\t:\t'(';\r\n\r\nRPAREN\t:\t')';\r\n\r\nWHITESPACE\r\n\t: ( '\\t' | ' ' | '\\r' | '\\n'| '\\u000C') \t{ $channel = HIDDEN; } ;\r\n<\/pre>\n<p><strong>Rewrite Rules<\/strong>, <span style=\"text-decoration: underline;\">Omitting items<\/span><\/p>\n<pre name=\"code\" class=\"antlr\">\r\nfunction_header : access_modifier? return_type function_name LPAREN argument_list? RPAREN SEMICOLON;\r\n<\/pre>\n<p>the parser rule above contains some parts that are not really interesting at all. The LPAREN RPAREN and SEMICOLON are not really interesting at all. They are just there to help separate the different aspects from the function header. So when ANTLR has done it&#8217;s job there is no need for them any more and we can leave them out with rewriting rules. Thus the rewrite statement looks like:<\/p>\n<pre name=\"code\" class=\"antlr\">\r\nfunction_header :\r\naccess_modifier? return_type function_name LPAREN argument_list? RPAREN SEMICOLON\r\n-&gt;access_modifier? return_type function_name argument_list?;\r\n<\/pre>\n<p>Resulting in:<\/p>\n<div id=\"attachment_26\" style=\"width: 160px\" class=\"wp-caption alignnone\"><a href=\"http:\/\/floris.briolas.nl\/floris\/wp-content\/uploads\/2008\/08\/ast-tree-function-beforerewrite.jpg\"><img aria-describedby=\"caption-attachment-26\" decoding=\"async\" loading=\"lazy\" class=\"size-thumbnail wp-image-26\" title=\"ast-tree-function-beforerewrite\" src=\"http:\/\/floris.briolas.nl\/floris\/wp-content\/uploads\/2008\/08\/ast-tree-function-beforerewrite-150x94.jpg\" alt=\"ast tree function before rewrite\" width=\"150\" height=\"94\" \/><\/a><p id=\"caption-attachment-26\" class=\"wp-caption-text\">ast tree function before rewrite<\/p><\/div>\n<div id=\"attachment_27\" style=\"width: 160px\" class=\"wp-caption alignnone\"><a href=\"http:\/\/floris.briolas.nl\/floris\/wp-content\/uploads\/2008\/08\/ast-tree-function-afterrewrite.jpg\"><img aria-describedby=\"caption-attachment-27\" decoding=\"async\" loading=\"lazy\" class=\"size-thumbnail wp-image-27\" title=\"ast-tree-function-afterrewrite\" src=\"http:\/\/floris.briolas.nl\/floris\/wp-content\/uploads\/2008\/08\/ast-tree-function-afterrewrite-150x94.jpg\" alt=\"ast tree function after rewrite\" width=\"150\" height=\"94\" \/><\/a><p id=\"caption-attachment-27\" class=\"wp-caption-text\">ast tree function after rewrite<\/p><\/div>\n<p>You can clearly see the symbols being omitted.<\/p>\n<p><strong>Rewrite Rules<\/strong>, <span style=\"text-decoration: underline;\">Creating children<\/span><\/p>\n<p>Next big rewrite option is to structure the tree in a tree you are comfortable with to walk\/interpret in your own code. The symbol that is used is the &#8216;^&#8217; sign. Also used in to describe the POW function.<\/p>\n<p>There are two way&#8217;s of doing it. This first one is to put it in your parser rule like:<\/p>\n<pre name=\"code\" class=\"antlr\">\r\nfunction_header\r\n: access_modifier? return_type function_name^ LPAREN argument_list? RPAREN SEMICOLON;\r\n<\/pre>\n<div id=\"attachment_40\" style=\"width: 310px\" class=\"wp-caption alignnone\"><a href=\"http:\/\/floris.briolas.nl\/floris\/wp-content\/uploads\/2008\/08\/ast-tree-function-root1.jpg\"><img aria-describedby=\"caption-attachment-40\" decoding=\"async\" loading=\"lazy\" class=\"size-medium wp-image-40\" title=\"ast-tree-function-root1\" src=\"http:\/\/floris.briolas.nl\/floris\/wp-content\/uploads\/2008\/08\/ast-tree-function-root1-300x41.jpg\" alt=\"Result of rewrite rule\" width=\"300\" height=\"41\" srcset=\"https:\/\/floris.briolas.nl\/floris\/wp-content\/uploads\/2008\/08\/ast-tree-function-root1-300x41.jpg 300w, https:\/\/floris.briolas.nl\/floris\/wp-content\/uploads\/2008\/08\/ast-tree-function-root1.jpg 683w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a><p id=\"caption-attachment-40\" class=\"wp-caption-text\">Result of rewrite rule<\/p><\/div>\n<p>making the function_name the root^\u00a0 (Do not use rewrites ANTLR will throw errors in your face).<\/p>\n<p>The second method is to use it in your rewrite rule (my personal favorite) :<\/p>\n<pre name=\"code\" class=\"antlr\">\r\nfunction_header\r\n: access_modifier? return_type function_name LPAREN argument_list? RPAREN SEMICOLON\r\n -&gt; ^(function_name access_modifier? return_type argument_list? )\r\n<\/pre>\n<div id=\"attachment_41\" style=\"width: 310px\" class=\"wp-caption alignnone\"><a href=\"http:\/\/floris.briolas.nl\/floris\/wp-content\/uploads\/2008\/08\/ast-tree-function-root2.jpg\"><img aria-describedby=\"caption-attachment-41\" decoding=\"async\" loading=\"lazy\" class=\"size-medium wp-image-41\" title=\"ast-tree-function-root2\" src=\"http:\/\/floris.briolas.nl\/floris\/wp-content\/uploads\/2008\/08\/ast-tree-function-root2-300x77.jpg\" alt=\"rewrite rule\" width=\"300\" height=\"77\" srcset=\"https:\/\/floris.briolas.nl\/floris\/wp-content\/uploads\/2008\/08\/ast-tree-function-root2-300x77.jpg 300w, https:\/\/floris.briolas.nl\/floris\/wp-content\/uploads\/2008\/08\/ast-tree-function-root2.jpg 544w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a><p id=\"caption-attachment-41\" class=\"wp-caption-text\">rewrite rule<\/p><\/div>\n<p>so that is &#8220;^(root child child &#8230; child)&#8221;, you may also write &#8220;^(root child..child ^(subroot child..Child) child..child)&#8221;. etc..<\/p>\n<p><strong>Rewrite Rules<\/strong>, <span style=\"text-decoration: underline;\">Adding some extra nodes<\/span><\/p>\n<p>To make the AST tree just right, so your code can handle your AST much better you may need to insert some tokens. You have seen some rewriting rules, with those you can insert some &#8216;dummy&#8217; nodes. Just tell ANTLR you need some extra tokens like this:<\/p>\n<pre name=\"code\" class=\"antlr\">\r\ntokens\r\n{\r\nFUNCTION;\r\nNAME;\r\nMODIFIER;\r\nRETURNTYPE;\r\nPARAMS;\r\nTYPE;\r\nACCESSER;\r\n}\r\nfunction_header\r\n: access_modifier? return_type function_name LPAREN argument_list? RPAREN SEMICOLUMN\r\n-&gt; ^(FUNCTION ^(NAME function_name) ^(MODIFIER access_modifier)? ^(RETURNTYPE return_type) ^(PARAMS argument_list)? )\r\n;<\/pre>\n<p>note there is a difference in opt1=&#8221;^(PARAMS argument_list)?&#8221; and opt2=&#8221;^(PARAMS argument_list?)&#8221;<\/p>\n<p>opt1 will ommit the PARAMS token from the tree when arument_list is empty. While opt2 will show a PARAMS token, but it won&#8217;t have children when the agument_list is empty.<\/p>\n<p>here is a final listing and a sample output of the AST:<\/p>\n<div id=\"attachment_39\" style=\"width: 310px\" class=\"wp-caption alignnone\"><a href=\"http:\/\/floris.briolas.nl\/floris\/wp-content\/uploads\/2008\/08\/ast-tree-function-root3.jpg\"><img aria-describedby=\"caption-attachment-39\" decoding=\"async\" loading=\"lazy\" class=\"size-medium wp-image-39\" title=\"ast-tree-function-root3\" src=\"http:\/\/floris.briolas.nl\/floris\/wp-content\/uploads\/2008\/08\/ast-tree-function-root3-300x97.jpg\" alt=\"The AST tree fully manipulated.\" width=\"300\" height=\"97\" srcset=\"https:\/\/floris.briolas.nl\/floris\/wp-content\/uploads\/2008\/08\/ast-tree-function-root3-300x97.jpg 300w, https:\/\/floris.briolas.nl\/floris\/wp-content\/uploads\/2008\/08\/ast-tree-function-root3.jpg 727w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a><p id=\"caption-attachment-39\" class=\"wp-caption-text\">The AST tree fully manipulated.<\/p><\/div>\n<pre name=\"code\" class=\"antlr\">\r\ngrammar functionheader;\r\noptions\r\n{\r\n    output=AST;\r\n}\r\ntokens\r\n{\r\n\tFUNCTION;\r\n\tNAME;\r\n\tMODIFIER;\r\n\tRETURNTYPE;\r\n\tPARAMS;\r\n\tTYPE;\r\n\tACCESSER;\r\n}\r\nfunction_header\r\n\t: access_modifier? return_type function_name LPAREN argument_list? RPAREN SEMICOLUMN\r\n\t-&gt; ^(FUNCTION ^(NAME function_name) ^(MODIFIER access_modifier)? ^(RETURNTYPE return_type) ^(PARAMS argument_list)? )\r\n\t;\r\nargument_list\r\n\t:\targument (COMMA  argument_list)?\r\n\t-&gt;\targument argument_list?\r\n\t; \r\nargument : argument_accesser? type argument_name\r\n\t-&gt; ^(argument_name (ACCESSER argument_accesser)? TYPE type )\r\n\t;\r\nargument_accesser\r\n\t:\t'ref'|'out';\r\naccess_modifier\r\n\t:\t 'public'|'private'|'protected'|'internal';\r\nreturn_type\r\n\t:\ttype|'void';\r\ntype\t:\t'string'|'int'|'float'|'double'|'long';\r\nargument_name\r\n\t:\tNAME;\r\nfunction_name\r\n\t:\tNAME;\r\n\/\/PARSER\r\nCOMMA\t:\t',';\r\nSEMICOLUMN\r\n\t:\t';';\r\nNAME\t:\tCHAR(DIGIT|CHAR)+;\r\nCHAR \t:\t('a'..'z'|'A'..'Z'|'_');\r\nDIGIT\r\n\t:\t('0'..'9');\r\nLPAREN\t:\t'(';\r\nRPAREN\t:\t')';\r\nWHITESPACE\r\n\t: ( '\\t' | ' ' | '\\r' | '\\n'| '\\u000C') \t{ $channel = HIDDEN; } ;\r\n<\/pre>\n<p><a href=\"downloads\/functionheader.g\">this file<\/a> contains the grammar.<\/p>\n<p><strong>Please, if you have questions or comments, let us know.<\/strong><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Onward with my little project, with writing a SVG renderer. It&#8217;s intended as a study to get to know ANTLR and SVG. The good folks of W3.org are so kind to publish a BNF that can be rewritten to an AntLr Grammar. The goal of this post is to interpret: M115 285 C115 400 285 [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_newsletter_tier_id":0,"jetpack_publicize_message":"","jetpack_is_tweetstorm":false,"jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","enabled":false}}},"categories":[3],"tags":[6,9],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p61yPs-e","_links":{"self":[{"href":"https:\/\/floris.briolas.nl\/floris\/wp-json\/wp\/v2\/posts\/14"}],"collection":[{"href":"https:\/\/floris.briolas.nl\/floris\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/floris.briolas.nl\/floris\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/floris.briolas.nl\/floris\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/floris.briolas.nl\/floris\/wp-json\/wp\/v2\/comments?post=14"}],"version-history":[{"count":35,"href":"https:\/\/floris.briolas.nl\/floris\/wp-json\/wp\/v2\/posts\/14\/revisions"}],"predecessor-version":[{"id":49,"href":"https:\/\/floris.briolas.nl\/floris\/wp-json\/wp\/v2\/posts\/14\/revisions\/49"}],"wp:attachment":[{"href":"https:\/\/floris.briolas.nl\/floris\/wp-json\/wp\/v2\/media?parent=14"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/floris.briolas.nl\/floris\/wp-json\/wp\/v2\/categories?post=14"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/floris.briolas.nl\/floris\/wp-json\/wp\/v2\/tags?post=14"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}