Lexer At work, a tricky sample

I was asked to help out with a grammar. It’s part of the “Package Manager Specification” see gentoo’s spec.

The spec allow to add dependencies, the syntax for this is : [operator][category]/[package name]-[version]

operator operator’s valid values are : ‘<‘ ‘<=’ ‘=’ ‘~’ ‘>=’ ‘>’
category 2.1.1 Category Names
A category name may contain any of the characters [A-Za-z0-9+_.-]. It must not begin with a hyphen or a
dot.
Note: A hyphen is not required because of the virtual category. Usually, however, category names will
contain a hyphen.
package name 2.1.2 Package Names
A package name may contain any of the characters [A-Za-z0-9+_-]. It must not begin with a hyphen, and
must not end in a hyphen followed by one or more digits.
version 2.2 Version Specifications
The package manager must not impose fixed limits upon the number of version components. Package managers
should indicate or reject any version that is invalid according to these rules.
A version starts with the number part, which is in the form [0-9]+(n.[0-9]+)* (a positive integer, followed
by zero or more dot-prefixed positive integers).
This may optionally be followed by one of [a-z] (a lowercase letter).

In the spec there are more requirements, but it will make this example so much more harder to understand.

now the trick is that package name (or pn for short), can not end in a hyphen followed by one or more digits’. Thus valids could be:

foo
foo-bla
foo1-bla-

but I’m not sure about the following.

foo-12a-12, if one follows the spec, it ends with -12.. so one could cut that of but you will still be left with -12a ,” a hyphen followed by one or more digits“. I would like to know it that is valid or not. Please tell me! I’m not sure.

I would like to know if this is valid: foo-123-foo? (does not end in.. I know)
I would like to know if this is valid:foo-12-12a

since category and pn and version are much alike and share much the same tokens it seemed logical to create parser rules instead of lexer rules.

dep    :    OPERATOR catagory SLASH pn HYPHEN version;

where OPERATOR is defined as:

OPERATOR:    ‘<‘|'<=’|’=’|’~’|’>=’|’>’;

and SLASH and HYPHEN :

HYPHEN    :    ‘-‘;
SLASH    :    ‘/’;

now here comes the difficult stuff:

Both Category and Pn share a lot of the same tokens. Category may contain a ‘.’  and pn may not. So if we would lex :
‘foo-bar’
would that result in  a CATEGORY token or a PN token? The answer is that Antlr would not stand by this and tell you that “the following token definitions can never be matched because prior tokens match the same input”. If there is no dot in the characters being lexed it could well have been a pn. In short both rules are quite ambigous, but it get’s wores. How do we tell the lexer the differences between a version and the pn if only characters are used that are valid for both rules. The answer is we need to do this ourselfs. The lexer will happely continue eating valid chars and thus eating characters in the version portion. It would have been easier if the final hypen would have been a character not in the range of valid chars of eiter rule.

On the other hand CATEGORY always comes after OPERATOR and PN always after SLASH. We can uses this to our advantage by using a construct named ‘semantic predicates’. We are going to use this to influence the lexer.

When the lexer is done wit operator it will set a flag telling it to match the tokens as with the category rules, and after the slash the flag will be set again to match tokes that apply to the package name rules. Resulting in “OPERATOR:    (‘<‘|'<=’|’=’|’~’|’>=’|’>’ ){category=true;};” and “SLASH    :    ‘/’  {category=false;};”.

a boolean will be set to true or to false. Which has still to be defined at the lexer by setting (basic programming rule:”declare what you want to use”) :

@lexer::members
{
boolean category=false;
}

Now Category and pn share common tokens, these are defined as follows:
fragment GENERIC    :    (UPPER|LOWER|UNDERSCORE|DIGIT);

(i use fragment so GENERIC won’t be seen as a individual token, it can be used over and over again.)

Next we are putting both

STR:
GENERIC
(
{category}?=>(GENERIC|PLUS|DOT|HYPHEN)*
|
{!category}?=>(GENERIC|PLUS|HYPHEN)*
)

So if we now set dep to : “dep    :    OPERATOR category SLASH pn; // HYPHEN VERSION;” (skipping the version) and test it:

>=cat.a.gory/packag-eName

Input was “>=cat.a.gory/packag-eName” Input was fully taken correctly.

>=cat.a.gory/packag.eName

Input was “>=cat.a.gory/packag.eName” The lexer stopped at “packag” since ‘.’ was not a valid token for package name.

Now we can add version to it, thus changing dep to : “dep    :    OPERATOR category SLASH pn HYPHEN VERSION;” ,  and try “>=cat.a.gory/packageName-12a” for inpu, telling it’s version 12.

Version not matched

Whoa what just happend?!! package name included the version!

The cause to this is because the “-12a” part can be consumed by the package name rule. And the “can not end with hyphen followed by digit” rule was not implemented.

How to implement can not end? Well currently Antlr does not support such a construct as in may not end in (not to be confused by the ‘~’ operator, meaning match anything until …). I’m going to use the input.mark, input.rewind and input release commands.  When mark is called, you get an id. With this id you can rewind back to this mark. Any tokens matched in between will become consumable once again, it’s like reversing time. Input.Release(id) does not rewind, but cleans up the administration.

The goal is to match “packageName-12a”, with a marker @ the HYPEN. Then we check if the rule “can not end in hypen.. digit” is true or false. If true, we will simply rewind and the that’s that.  So the matched “packageName-12a” will end up in “packageName” (with “-12a” rewinded, left for other rules).

Every time a hyphen comes by i’ll mark it, if we already had a mark, release it.

This is the final result:

Comments welcome!

//options
//{
//language = CSharp;
//}

//http://www.gentoo.org/proj/en/qa/pms.xml

/*
2.1.1 Category Names
A category name may contain any of the characters [A-Za-z0-9+_.-]. It must not begin with a hyphen or a
dot.
Note: A hyphen is not required because of the virtual category. Usually, however, category names will
contain a hyphen.
*/

/*
2.1.2 Package Names
A package name may contain any of the characters [A-Za-z0-9+_-]. It must not begin with a hyphen, and
must not end in a hyphen followed by one or more digits.
*/

/*
2.2 Version Specifications
The package manager must not impose fixed limits upon the number of version components. Package managers
should indicate or reject any version that is invalid according to these rules.
A version starts with the number part, which is in the form [0-9]+(n.[0-9]+)* (a positive integer, followed
by zero or more dot-prefixed positive integers).
This may optionally be followed by one of [a-z] (a lowercase letter).
*/

@lexer::members
{
boolean category=false;

//c#
//bool category=false;

}

dep    :    OPERATOR category SLASH pn HYPHEN VERSION;

HYPHEN    :    ‘-‘;
SLASH    :    ‘/’  {category=false;};
VERSION    :    DIGIT+ (DOT DIGIT+)* LOWER?;
OPERATOR:    (‘<‘|'<=’|’=’|’~’|’>=’|’>’) {category=true;};

category:    STR;
pn    :    STR;

STR
@init
{
int _mrk=-1;
//    int start = CharIndex; //c#
int start = getCharIndex(); //JAVA
}
:
//c#    {int _mrk=-1;int start = getCharIndex;}

GENERIC
(
{category}?=>(GENERIC|PLUS|DOT|HYPHEN)*
|
{!category}?=>
(

(GENERIC|PLUS|( {if (_mrk!=-1){input.release(_mrk);} {_mrk = input.mark();} }  HYPHEN))*
)
)
/*
Post Check: If this is a Package name and it ends in eg. -12 just take that off.
*/
{
if (!category)
{

String s = input.substring(start, input.getCharPositionInLine()-1);
String[] s2 = s.split(“-“);
if ( Character.isDigit(s2[s2.length-1].charAt(0)) && s2.length>1)
{
input.rewind(_mrk); _mrk = -1; //rewind (altering this tokens, leave the rest to be matched.)
}
else
{
if (_mrk != -1) { input.release(_mrk); } //release, do not alter token.
}

}
}//end of post check.
;

fragment UNDERSCORE    :    ‘_’;
fragment UPPER    :    ‘A’..’Z’;
fragment DIGIT    :    ‘0’..’9′;
fragment LOWER     :    ‘a’..’z’;
fragment DOT    :    ‘.’;
fragment PLUS    :    ‘+’;
fragment GENERIC    :    (UPPER|LOWER|UNDERSCORE|DIGIT);

WS     : (‘ ‘|’t’ | ‘r’ | ‘n’)+ {$channel=HIDDEN;};

LICENSE: All source code in this post comes with ABSOLUTELY NO WARRANTY, This is free software, and you are welcome to redistribute it under certain conditions, see GPL 2.

DrawCurve / AddCurve for WPF / Cardinal Spline

For my project I need to draw Cardinal splines. Preferable the same way I was used to with the System.Drawing.Graphics or System.Drawing.Drawing2D.GraphicsPath. Not knowing much about splines, I noticed the number of PathPoints on a graphics path. The curve is drawn with beziers, so it’s some sort of string of beziers that make up this cardinal spline.

This project results in the same curves as you are used to with the graphics object (but this works with doubles not with floats). Tension and open and closed params are supported!

Anyway long story short, this project calculates control points for beziers. The curve will be draw with beziers, thus the outcome is exactly the same as with the Graphics object. I’ve seen other examples drawing the curve by them self calculating 100’s of points on the bezier it self .

Yes the graphics object does the same thing see msdn (quote:”Remarks The user must keep the original points if they are needed. The original points are converted to cubic Bézier control points internally, therefore there is no mechanism for returning the original points.”) .

If you draw a curve with 3 points, 7 points are needed to draw the curve, thus a performance gain(? not measured!) without losing the nice smoothness of the curve.

here is a screen shot:

cardinalsplineclosedexample

and here is the code

comments are welcome, Can u use this, or are you missing something?

ArcTo / ArcSegment like GraphicsPath.AddArc

This article gives a simpler interface (facade) to use the ArcTo Method… Providing a GraphicsPath AddArc like method.

public static GeometryDrawing GetArc(Rect rect, Angle start, Angle sweep){}

Today I implemented an adapter for GDI+ and WPF to draw similar objects. Simple ons like line, rectangle, ellipse,.. but I kinda struggled with arc??!!

public abstract void ArcTo(
	Point point,
	Size size,
	double rotationAngle,
	bool isLargeArc,
	SweepDirection sweepDirection,
	bool isStroked,
	bool isSmoothJoin
)

This is an Api that has way to much parameters if you ask me. All I need (and want) was a api like the AddArc on the GDI+ GraphicsPath which looks like :

public void AddArc(
	RectangleF rect,
	float startAngle,
	float sweepAngle
)

Which is all I need anyway.

In My Code I’m using the ArcTo method on a StreamGeometryContext instance (ArcSegment has a similair api so I’mmentioning it). First you have to move to the first point (which is done by calling BeginFigure) then you have to add a second point. Next argument Size has an interesting description on the MSDN documentation:

The width and height of an oval whose perimeter is used to draw the angle. If the oval is very rounded in all directions, the arc will be rounded, if it is nearly flat, so will the arc. For example, a very large width and height would represent a very large oval, which would give a slight curvature for the angle.

Ehh.. right.. how about just saying radius?

The next argument, rotationangle, can be done with a Matrix transform (rotate),.. so why it it there?

well the others make sense…

Well here is my facade to the ArcTo:

public static GeometryDrawing GetArc(Rect rect, Angle start, Angle sweep)

the way i like it,.. nice and simple.

public static GeometryDrawing GetArc(Rect rect, Angle start, Angle sweep)
        {
            GeometryDrawing ret = new GeometryDrawing();
            StreamGeometry geo = new StreamGeometry();

            //Set correct parameters
            SweepDirection sweepDir = sweep.Degrees < 0 ? SweepDirection.Counterclockwise : SweepDirection.Clockwise;
            bool isLargeArc = Math.Abs(sweep.Degrees) > 180;

            double cx = rect.Width / 2;
            double cy = rect.Height / 2;
            //Calculate start point
            double x1 = rect.X + cx + (Math.Cos(start.Radians) * cx);
            double y1 = rect.Y + cy + (Math.Sin(start.Radians) * cy);
            //Calculate end point
            double x2 = rect.X + cx + (Math.Cos(start.Radians + sweep.Radians) * cx);
            double y2 = rect.Y + cy + (Math.Sin(start.Radians + sweep.Radians) * cy);

            using (StreamGeometryContext ctx = geo.Open())
            {
                ctx.BeginFigure(new Point(x1, y1), false, false);
                ctx.ArcTo(new Point(x2, y2), new Size(cx,cy), 0, isLargeArc, sweepDir, true, false);
            }

            ret.Geometry = geo;
            return ret;
        }

!NOTE! The Angle class is just a simple converter form degrees to radians Nothing to worry about….

Here you can download a sample solution to get the demo working.

Or use this Extention Method in a static class..

public static void AddArc(this StreamGeometryContext ctx, Rect rect, double startRad, double sweepRad)
{
SweepDirection sweepDir = sweepRad < 0 ? SweepDirection.Counterclockwise : SweepDirection.Clockwise;
bool isLargeArc = Math.Abs(sweepRad) > Math.PI;
double cx = rect.Width / 2;
double cy = rect.Height / 2;
double x1 = rect.X + cx + (Math.Cos(startRad) * cx);
double y1 = rect.Y + cy + (Math.Sin(startRad) * cy);
double x2 = rect.X + cx + (Math.Cos(startRad + sweepRad) * cx);
double y2 = rect.Y + cy + (Math.Sin(startRad + sweepRad) * cy);
ctx.BeginFigure(new Point(x1, y1), false, false);
ctx.ArcTo(new Point(x2, y2), new Size(cx, cy), 0, isLargeArc, sweepDir, true, false);
}

your app might be bleeding inside with exceptions

Hi,..

Have you ever heard of “if it ain’t broke it’s bleeding inside” ?

This might be the case for your program, many of us use the try catch statement for catching exceptions. But you might be amazed by the shear number of exceptions that occur during runtime. We all know an exception is an exceptionally strong message that you need to respond to and not just smudge it away.

private void logMessage(string msg)
{
  try
  {
    File.AppendAllText(logfile, msg);
  }
  catch{} //ignore
}

There may be enough constructions like this throughout your code. When you throw an exception like that you need to think about this twice. At the very least tell other team members in comment why you did this.

So what happens when the drive is full? Well the entry is not written, but how bad is this?
What happens when the logfile value is configured (by app.config) with a non existing path? Is this a problem that needs to be handled?
What happens if you have insufficient rights to a certain location?

Think about it..

Let’s assume the”logMessage” function is not critical and we basicly do not care if does anything or not. You still have a performance issue. If the drive is full, why try to write again. Just set some flag and do not call File.AppendAllText… again.

Detecting the blood:

These are all just simple examples, endulge me and turn on the “Break when an exceptions is THROWN” Debug -> Exceptions, ath the “common Language Runtime Exceptions” check the Thrown and OK away. Now run your application.

If hardly nothing happens,.. good job!
If you feel like you are beeing shot at, time rethink some lines of code.

Synergy

If you are a geek like me the odds are good that you own more than just one PC. When operating on all of them a clutter of mice and keyboards all over the desk might be a reality for you. A college advised me Synergy, it shares one mouse and keyboard over multiple computers. Just watch the video to be impressed (not my setup!).

Synergy is an open source application.Click here to get it. Configuration is not hard, but’s not super duper mac user friendly.