Introduce a here document in addition to the @" syntax
belbardonisakel
#1 Posted : Monday, December 21, 2015 11:36:09 PM(UTC)
Rank: Newbie

Groups: Registered
Joined: 6/28/2015(UTC)
Posts: 9
Location: Germany

Thanks: 0 times
Was thanked: 0 time(s) in 0 post(s)
Hello Squirrel developers,

I would like to add a real "here document" to the Squirrel syntax.
I have some problems with the existing @"verb" syntax: it gets quite complicated and unreadable if you have to escape the " character itself.

Here is a code snippet:

local here=@<END_OF_HERE
This text is the content
of the here document
containing some strange characters
"§$%&&"""@@\\\"
END_OF_HERE;
print(here);

It will produce the output:

This text is the content
of the here document
containing some strange characters
"§$%&&"""@@\\\"
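
As a rough illustration of the semantics described above (not the patch itself), the following standalone C++ sketch extracts the body of a here document from a string. The function names are hypothetical. It assumes the terminator must stand at the start of a line and that the final line break belongs to the body; both are design choices the proposal may decide differently.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper for illustration only; it is not part of the patch
// or of the Squirrel sources.
static bool is_ident_char(char c)
{
    return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
}

// Expects `src` to start with "@<IDENT". On success, stores the text between
// the introducing line and the terminator line in `body` and returns true.
bool read_here_document(const std::string& src, std::string& body)
{
    if (src.compare(0, 2, "@<") != 0) return false;

    // Collect the delimiter identifier that follows "@<".
    std::size_t i = 2;
    while (i < src.size() && is_ident_char(src[i])) ++i;
    const std::string delim = src.substr(2, i - 2);
    if (delim.empty()) return false;

    // The body begins on the line after the introducer.
    std::size_t line = src.find('\n', i);
    if (line == std::string::npos) return false;
    const std::size_t body_begin = ++line;

    // Walk line by line; stop at the first line that starts with the
    // delimiter as a whole word.
    while (line < src.size()) {
        const std::size_t after = line + delim.size();
        if (src.compare(line, delim.size(), delim) == 0 &&
            (after >= src.size() || !is_ident_char(src[after]))) {
            body = src.substr(body_begin, line - body_begin);
            return true;
        }
        const std::size_t eol = src.find('\n', line);
        if (eol == std::string::npos) break;
        line = eol + 1;
    }
    return false;  // terminator never found
}
```

Called on the snippet above (starting at the `@<`), this would return the four content lines including the trailing line break, and it would report failure if END_OF_HERE never appears at the start of a line.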


The patch (based on version 3_0_7_stable) is:

diff -ur squirrel_3_0_7_stable/SQUIRREL3/squirrel/sqlexer.cpp patch/SQUIRREL3/squirrel/sqlexer.cpp
--- squirrel_3_0_7_stable/SQUIRREL3/squirrel/sqlexer.cpp 2015-12-22 00:26:49.542123009 +0100
+++ patch/SQUIRREL3/squirrel/sqlexer.cpp 2015-12-22 00:26:30.838030261 +0100
@@ -195,14 +195,26 @@
case _SC('@'): {
SQInteger stype;
NEXT();
- if(CUR_CHAR != _SC('"')) {
- RETURN_TOKEN('@');
- }
- if((stype=ReadString('"',true))!=-1) {
- RETURN_TOKEN(stype);
+ switch(CUR_CHAR){
+ case _SC('<'):
+ NEXT();
+ if (ReadID()!=TK_IDENTIFIER){
+ Error(_SC("expected a here document identifier"));
+ }
+ if ((stype=ReadHereDocument())!=-1){
+ RETURN_TOKEN(stype);
+ }
+ Error(_SC("error parsing the here document"));
+ break;
+ case _SC('"'):
+ if((stype=ReadString('"',true))!=-1) {
+ RETURN_TOKEN(stype);
+ }
+ Error(_SC("error parsing the string"));
+ default:
+ RETURN_TOKEN('@');
+ }
}
- Error(_SC("error parsing the string"));
- }
case _SC('"'):
case _SC('\''): {
SQInteger stype;
@@ -285,6 +297,44 @@
return TK_IDENTIFIER;
}

+SQInteger SQLexer::ReadHereDocument()
+{
+ sqvector<SQChar> here_delimiter;
+ here_delimiter.resize( 0 );
+ here_delimiter.copy( _longstr );
+ INIT_TEMP_STRING();
+ do{
+ NEXT();
+ }while( strchr( _SC("\t\r; "), CUR_CHAR ) != NULL);
+ if( CUR_CHAR == _SC('\n') )
+ NEXT();
+ if( IS_EOB() ) return -1;
+ SQInteger iCutIndex = -1;
+ for(;;) {
+ const SQChar* ptIter = &here_delimiter[0];
+ if ( CUR_CHAR == SQUIRREL_EOB){
+ Error(_SC("unfinished here document"));
+ return -1;
+ }
+ if ( CUR_CHAR == _SC('\n') )
+ iCutIndex = _longstr.size();
+ do{
+ if ( *ptIter == _SC('\0') ){
+ APPEND_CHAR( CUR_CHAR );
+ TERMINATE_BUFFER();
+ if (iCutIndex>=0)
+ _longstr[ iCutIndex + 1 ] = _SC('\0');
+ else
+ _longstr[ _longstr.size() - here_delimiter.size() + 1 ] = _SC('\0');
+ _svalue = &_longstr[0];
+ return TK_STRING_LITERAL;
+ }
+ APPEND_CHAR( CUR_CHAR );
+ NEXT();
+ ptIter++;
+ } while (( CUR_CHAR == *ptIter )||( *ptIter == _SC('\0') ));
+ }
+}

SQInteger SQLexer::ReadString(SQInteger ndelim,bool verbatim)
{
diff -ur squirrel_3_0_7_stable/SQUIRREL3/squirrel/sqlexer.h patch/SQUIRREL3/squirrel/sqlexer.h
--- squirrel_3_0_7_stable/SQUIRREL3/squirrel/sqlexer.h 2015-12-22 00:26:49.542123009 +0100
+++ patch/SQUIRREL3/squirrel/sqlexer.h 2015-12-22 00:26:30.842030280 +0100
@@ -19,6 +19,7 @@
private:
SQInteger GetIDType(SQChar *s);
SQInteger ReadString(SQInteger ndelim,bool verbatim);
+ SQInteger ReadHereDocument();
SQInteger ReadNumber();
void LexBlockComment();
void LexLineComment();

If you want me to make some changes or to comment the code, please send me a reply.


Regards Heiko
absence
#2 Posted : Wednesday, December 23, 2015 1:30:14 PM(UTC)
Rank: Advanced Member

Groups: Registered
Joined: 8/23/2014(UTC)
Posts: 107
Man
Location: Northern Germany & Lincolnshire, U.K.

Thanks: 1 times
Was thanked: 10 time(s) in 10 post(s)
local a=@<DELIMITER;
somestuffDELIMITER ;

reads weird, but would be valid.
Also, with your implementation odd line end encodings could possibly enter the delimiter string and break things.
I see the use case, but my personal opinion is that this is an unnecessary expansion of the (already somewhat unnecessary) verbatim string syntax simply adding too much footprint for its effect.
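
To make the "reads weird" case concrete, here is a small C++ sketch contrasting an unanchored terminator search with one that only accepts the terminator at the start of a line. Both function names are hypothetical and neither is taken from the patch; the sketch only illustrates the two possible rules.

```cpp
#include <cassert>
#include <string>

// Hypothetical helpers for illustration; neither exists in Squirrel.

// Unanchored search: the terminator may appear anywhere in the text.
std::size_t find_terminator_anywhere(const std::string& text, const std::string& delim)
{
    return text.find(delim);
}

// Anchored search: the terminator only counts at the start of a line.
std::size_t find_terminator_at_line_start(const std::string& text, const std::string& delim)
{
    assert(!delim.empty());
    std::size_t line = 0;
    while (line < text.size()) {
        if (text.compare(line, delim.size(), delim) == 0) return line;
        const std::size_t eol = text.find('\n', line);
        if (eol == std::string::npos) break;
        line = eol + 1;
    }
    return std::string::npos;
}
```

With the unanchored rule, "somestuffDELIMITER ;" ends the here document in the middle of the line; with the anchored rule it does not, and the lexer would keep reading.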
belbardonisakel
#3 Posted : Wednesday, December 23, 2015 3:42:41 PM(UTC)
Hello absence,

my suggested "here document" syntax is common in bash (http://www.tldp.org/LDP/abs/html/here-docs.html), Perl (http://perlmaven.com/here-documents)
and PHP (http://alvinalexander.com/blog/post/php/php-here-document-heredoc-syntax-examples).

There is something similar in other languages such as Python (http://lofic.github.io/tips/python-heredoc.html), but I am not tied to this particular syntax.

>>Also, with your implementation odd line end encodings could possibly enter the delimiter string and break things.
I do not see any side effect: the current Squirrel implementation does not allow the suggested "@< END_OF_TOKEN" syntax.
I can't think of any example code that would break things. Could you give me an example so that I can understand your issue?
If you're not satisfied with my implementation, I will change it according to your requirements.

>>simply adding too much footprint for its effect.
My change only touches the lexer (sqlexer.cpp), so if you use precompiled Squirrel code you can run the interpreter without the lexer.
In that case I see no larger footprint; the size is exactly the same.
You only get a larger footprint if the lexer is included.

The reason for my change is that I use Squirrel in a small embedded environment to call some external tools via bash, and I have to escape things, which leads to unreadable code.
I am not happy with the existing verbatim mode, where you still have to escape characters. In my opinion, if a language supports verbatim strings, it should be a real verbatim mode, not an 80% one.
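
To show what the escaping in the existing verbatim mode amounts to, here is a small C++ sketch that turns raw text into a Squirrel @"..." literal. Inside a verbatim literal a double quote has to be written as two double quotes. The function name is hypothetical; it is only meant to illustrate the rule.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: wraps raw text in a Squirrel verbatim literal (@"...").
// Inside a verbatim literal a double quote is written as two double quotes;
// all other characters, including backslashes and newlines, stay as they are.
std::string to_verbatim_literal(const std::string& raw)
{
    std::string out = "@\"";
    for (char c : raw) {
        if (c == '"') out += '"';  // double the quote
        out += c;
    }
    out += '"';
    return out;
}
```

A shell command such as echo "hi" therefore becomes @"echo ""hi""", which is the kind of clutter the here document is meant to avoid.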

Regards Heiko
absence
#4 Posted : Thursday, December 24, 2015 10:37:25 PM(UTC)
As I said, it's my opinion. Embedding different languages in one file is something I'd avoid in general. If I were you, I'd put those shell script stuff into separate files and then let squirrel read and process them as data. But that's your choice of course. (Maybe you wouldn't be tempted if you had a proper syntax highlighting, which a "here document" will confuse as f*ck...?)

However, your code seems to fail to handle different line ending encodings.
A Linux file with:
local a=@<DELIMITER
blahblah
DELIMITER;

won't work, the delimiter definition will end on the ; in the last line and read "DELIMITER\nblahblah\nDELIMITER" - and then will fail when it reads a verbatim string up to the end of file
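
Independent of whether the patch is actually affected, one way to make terminator detection insensitive to line endings is to test whole lines and drop a trailing '\r' first. The C++ sketch below is a hypothetical helper, not code from the patch. It assumes that only blanks, tabs and ';' may follow the delimiter on its line, which is a stricter rule than the proposal states.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: decides whether one line (without its '\n') is the
// terminator line of a here document. A trailing '\r' from a Windows line
// ending is ignored. Assumption: only blanks, tabs and ';' may follow the
// delimiter on that line.
bool is_terminator_line(std::string line, const std::string& delim)
{
    assert(!delim.empty());
    if (!line.empty() && line.back() == '\r') line.pop_back();
    if (line.compare(0, delim.size(), delim) != 0) return false;
    for (std::size_t i = delim.size(); i < line.size(); ++i) {
        const char c = line[i];
        if (c != ' ' && c != '\t' && c != ';') return false;
    }
    return true;
}
```

With this rule, "DELIMITER;" is recognised with both Linux and Windows line endings, while a content line such as "blahblah" or a longer identifier such as "DELIMITERS" is not.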
belbardonisakel
#5 Posted : Friday, December 25, 2015 7:43:38 PM(UTC)
Hello absence,

thank you very much for your code review :-)

>If I were you, I'd put those shell script stuff into separate files and then let squirrel read and process them as data. But that's your choice of course.
You're right, that would be an option, but I like to ship only a single file, and of course I like the Squirrel language, so I prefer to code in a single language only ;-)

>... won't work, the delimiter definition will end on the ; in the last line and read "DELIMITER\nblahblah\nDELIMITER" - and then will fail when it reads a verbatim string up to the end of file

You are right: the last ';' (if it is there at all; it is optional) is not handled by my code. It is handled by the Squirrel interpreter instead, because a single ';' is a valid program, so the standard lexer will skip it.

I tried to reproduce the problem you described, but I was not able to do so :-(
Maybe I misunderstood you. I tried the following test vectors:

Test 1.) Windows line ending - the file ends with a Windows newline:
chef@chef-A3F:~/diff/patch/SQUIRREL3/bin$ xxd test.nut
0000000: 6c6f 6361 6c20 6865 7265 3d40 3c44 454c local here=@<DEL
0000010: 494d 4954 4552 0d0a 626c 6168 626c 6168 IMITER..blahblah
0000020: 0d0a 4445 4c49 4d49 5445 523b 0d0a ..DELIMITER;..
chef@chef-A3F:~/diff/patch/SQUIRREL3/bin$ ./sq test.nut
chef@chef-A3F:~/diff/patch/SQUIRREL3/bin$

Test 2.) Linux line ending - the file ends with a Linux newline:
chef@chef-A3F:~/diff/patch/SQUIRREL3/bin$ xxd test.nut
0000000: 6c6f 6361 6c20 6865 7265 3d40 3c44 454c local here=@<DEL
0000010: 494d 4954 4552 0a62 6c61 6862 6c61 680a IMITER.blahblah.
0000020: 4445 4c49 4d49 5445 523b 0a DELIMITER;.
chef@chef-A3F:~/diff/patch/SQUIRREL3/bin$ ./sq test.nut
chef@chef-A3F:~/diff/patch/SQUIRREL3/bin$

Test 3.) Windows line ending - file ends with ';'
chef@chef-A3F:~/diff/patch/SQUIRREL3/bin$ xxd testcut.nut
0000000: 6c6f 6361 6c20 6865 7265 3d40 3c44 454c local here=@<DEL
0000010: 494d 4954 4552 0d0a 626c 6168 626c 6168 IMITER..blahblah
0000020: 0d0a 4445 4c49 4d49 5445 523b ..DELIMITER;
chef@chef-A3F:~/diff/patch/SQUIRREL3/bin$ ./sq testcut.nut
chef@chef-A3F:~/diff/patch/SQUIRREL3/bin$

Test 4.) Linux line ending - file ends with ';':
chef@chef-A3F:~/diff/patch/SQUIRREL3/bin$ xxd testcut.nut
0000000: 6c6f 6361 6c20 6865 7265 3d40 3c44 454c local here=@<DEL
0000010: 494d 4954 4552 0a62 6c61 6862 6c61 680a IMITER.blahblah.
0000020: 4445 4c49 4d49 5445 523b DELIMITER;
chef@chef-A3F:~/diff/patch/SQUIRREL3/bin$ ./sq testcut.nut
chef@chef-A3F:~/diff/patch/SQUIRREL3/bin$

Test 5.) Windows line ending - file ends with DELIMITER:
chef@chef-A3F:~/diff/patch/SQUIRREL3/bin$ xxd testcut.nut
0000000: 6c6f 6361 6c20 6865 7265 3d40 3c44 454c local here=@<DEL
0000010: 494d 4954 4552 0d0a 626c 6168 626c 6168 IMITER..blahblah
0000020: 0d0a 4445 4c49 4d49 5445 52 ..DELIMITER
chef@chef-A3F:~/diff/patch/SQUIRREL3/bin$ ./sq testcut.nut
chef@chef-A3F:~/diff/patch/SQUIRREL3/bin$

Test 6.) Linux line ending - file ends with DELIMITER:
chef@chef-A3F:~/diff/patch/SQUIRREL3/bin$ xxd testcut.nut
0000000: 6c6f 6361 6c20 6865 7265 3d40 3c44 454c local here=@<DEL
0000010: 494d 4954 4552 0a62 6c61 6862 6c61 680a IMITER.blahblah.
0000020: 4445 4c49 4d49 5445 52 DELIMITER
chef@chef-A3F:~/diff/patch/SQUIRREL3/bin$ ./sq testcut.nut
chef@chef-A3F:~/diff/patch/SQUIRREL3/bin$

Test 7.) Malformed "here document" - Linux line ending:
chef@chef-A3F:~/diff/patch/SQUIRREL3/bin$ xxd testcut.nut
0000000: 6c6f 6361 6c20 6865 7265 3d40 3c44 454c local here=@<DEL
0000010: 494d 4954 4552 0a62 6c61 6862 6c61 680a IMITER.blahblah.
0000020: 4445 4c49 4d49 5445 DELIMITE
chef@chef-A3F:~/diff/patch/SQUIRREL3/bin$ ./sq testcut.nut
testcut.nut line = (1) column = (40) : error unfinished here document
Error [unfinished here document]

Test 8.) Malformed "here document" - Windows line ending:
chef@chef-A3F:~/diff/patch/SQUIRREL3/bin$ xxd testcut.nut
0000000: 6c6f 6361 6c20 6865 7265 3d40 3c44 454c local here=@<DEL
0000010: 494d 4954 4552 0d0a 626c 6168 626c 6168 IMITER..blahblah
0000020: 0d0a 4445 4c49 4d49 5445 ..DELIMITE
chef@chef-A3F:~/diff/patch/SQUIRREL3/bin$ ./sq testcut.nut
testcut.nut line = (1) column = (42) : error unfinished here document
Error [unfinished here document]
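
For anyone who wants to reproduce these test vectors byte for byte, the following C++ sketch builds them from a line ending and a tail. The helper names and the file handling are assumptions made for illustration; only the byte content is taken from the xxd dumps above.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical helper: builds the bytes of one test script.
// `eol` is "\n" or "\r\n"; `tail` is whatever follows the last line break,
// for example "DELIMITER;\r\n", "DELIMITER;", "DELIMITER" or "DELIMITE".
std::string make_test_vector(const std::string& eol, const std::string& tail)
{
    return "local here=@<DELIMITER" + eol + "blahblah" + eol + tail;
}

// Hypothetical helper: writes the bytes unchanged. Binary mode keeps the
// runtime from translating line endings on platforms that would do so.
bool write_test_vector(const std::string& path, const std::string& bytes)
{
    std::ofstream out(path.c_str(), std::ios::binary);
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    return out.good();
}
```

For example, make_test_vector("\r\n", "DELIMITER;\r\n") yields the 46 bytes of Test 1 and make_test_vector("\n", "DELIMITE") yields the malformed input of Test 7.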

Thanks for your support
and merry Christmas