Discussion:
from elsewhere, an assembler
cr88192
2007-04-05 22:45:58 UTC
well, recently I had been posting some on c.l.a.x, and I have come here. as
I have recently heard, the group is human moderated (vs. machine moderated),
ok, good comments can be made for the moderator for actually going quickly
(seen many moderated groups where one may wait a week or more, and then
forget they ever posted until something shows up, or more often, it never
does...).


so, what I will mention is this:
for my own projects I have written an assembler (mostly since january).

now, what is different from most other assemblers?
this one primarily targets in-memory compilation.
at present, I am not aware of other particularly similar projects.

if anyone feels like commenting, that would be cool.


General

basically, the task is taking output from various JIT compilers, also part
of my projects (my first JIT compiler was for my script lang, a newer one is
for a more recent C-compiler backend).

I use a (mostly) nasm-like syntax, albeit without any expression handling or
macro facilities at present, if ever (not needed for JIT, which is the
primary purpose).


otherwise, at present it can open the exe, and rip out the symbol table to
allow the host app to be more useful to dynamically assembled code
(windows-specific, though on linux this could probably be handled via libdl,
not yet tested on linux). I am considering possibly adding modular loading
of COFF files as well.

primarily it supports x86-32, but should in theory be able to handle 16 and
64 bits as well.

16-bit is likely to have problems, ie, I am unsure of the exact relation
between ModR/M format, cpu mode, and address overrides, additionally 'far'
forms of instructions are generally not implemented.

should support at least a good portion of the instruction set (including MMX
and SSE1/2), albeit I may have missed some and there are likely errors.

I have also added some features to help with PIC code (on x86, as x86-64 has
rip-relative mode, which is better...).


Syntax

as noted, the syntax is mostly like nasm, but some things are different.

sections are implemented by simply giving the name, such as '.text' or
'.data'.

multiple instructions may be grouped per line via ';', as in:
mov eax, [ebp+8]; push ebp
this is partly because often ';' looks nicer than '\n' in strings.

in some cases, I have renamed some forms of some instructions (originally, I
had renamed some forms of inc, and separated out the 8-bit jmp/jcc forms,
but now this is handled automatically).


special features are different, for example,
getbase <reg>
setbase <symbol>
and $<name> are used for pic offsets (adjusted by the base and set to use
relative relocation).

as in:
getbase esi ;calc base and load into esi
mov eax, [esi+$str]

by default, the base is the start of the current assembly/module.
not that this is probably all that useful at present.
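
The arithmetic behind getbase and $<name> can be illustrated with a small model. This is only a sketch of the idea as described above, assuming that $name resolves to the symbol's address minus the base's address; the addresses, the `module_start` symbol and the helper name `pic_offset` are made up for illustration and are not part of the assembler.

```python
def pic_offset(symbols, name, base="module_start"):
    """Displacement stored for $name: distance of the symbol from the base."""
    return symbols[name] - symbols[base]


# Layout at assembly time (hypothetical addresses within one module).
layout = {"module_start": 0x1000, "str": 0x1040}
disp = pic_offset(layout, "str")

# At run time the module may sit somewhere else entirely.
load_address = 0x7F0000
runtime_base = load_address        # the value getbase would leave in esi
effective = runtime_base + disp    # the address that [esi+$str] refers to
```

Because only the distance from the base is stored, the same bytes keep working wherever the module ends up in memory.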


API (vague overview):

as noted, it assembles in-place. at present, it just assembles at the end of
some existing buffers, but may be later made to implement a kind of heap (so
that it is possible to allocate and free functions/modules, rather than just
gradually filling up the buffer).

things are grouped into a kind of conceptual unit (which I am calling an
assembly). one uses a call to begin an assembly, and then generates any of
the assembler code via some number of printf like calls, and then ends the
assembly (at this point, the contents are assembled and a pointer to the
start of the assembly is returned).
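
The begin / print / end flow described above could be modelled roughly as follows. This is a toy sketch in Python, not the actual library: the names `ToyAssembler`, `begin`, `emit` and `end` are hypothetical, the returned "pointer" is just an offset into a bytearray, and only four fixed x86-32 encodings are known.

```python
class ToyAssembler:
    """Toy model of the begin/print/end flow; all names are hypothetical."""

    # A few fixed x86-32 encodings, enough for a prologue and epilogue.
    ENCODINGS = {
        "push ebp": b"\x55",
        "mov ebp, esp": b"\x89\xe5",
        "pop ebp": b"\x5d",
        "ret": b"\xc3",
    }

    def __init__(self):
        self.buffer = bytearray()   # shared code buffer, only appended to
        self.pending = None         # source text of the assembly in progress

    def begin(self):
        self.pending = []

    def emit(self, fmt, *args):
        # printf-like: format the text now, assemble it later in end()
        self.pending.append(fmt % args)

    def end(self):
        start = len(self.buffer)    # stands in for the returned pointer
        for line in "".join(self.pending).splitlines():
            text = line.strip()
            if text:
                self.buffer += self.ENCODINGS[text]
        self.pending = None
        return start


asm = ToyAssembler()
asm.begin()
asm.emit("push %s\nmov %s, esp\n", "ebp", "ebp")
asm.emit("pop %s\nret\n", "ebp")
entry = asm.end()
```

Each begin/end pair appends to the end of the same buffer, which mirrors the "gradually filling up the buffer" behaviour mentioned above.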

JIT compilers typically do one function per assembly, but the asm loader
does a whole file per assembly (this will be the fundamental unit of
allocation/freeing, if/when implemented). another (but more complicated)
possibility would include symbolic garbage collection (likely via
ref-counting).



it is not online anywhere as of yet, but if anyone wants to look at the
source, I can probably email it to them (just ask via email or such, I can
also answer questions).

as noted, this is not intended as an end-user app/tool/lib, but if I am
lucky, maybe a potentially useful library (or as a starting point for other
interesting projects/libraries).

unlike nasm, IMO, the listing format is much cleaner (and is notably closer
to the form found in the intel/amd docs). if anything, maybe I have done
something by typing all this crap out (of course, nasm has some nifty pieces
of info I currently lack, like supported archs, ...).


a trivial example (calls printf, uses relative addressing):
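.text

basm_main:
    push ebp
    mov ebp, esp

    getbase ecx ;ecx = base address of this module

    lea eax, [ecx+$tststr] ;eax = run-time address of the string
    push eax

    call printf
    pop ecx ;discard the argument (caller cleans up)

    pop ebp
    ret

.data
tststr db "asm test string\n", 0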

and a listing fragment (basic syntax):
add
04,ib       al,i8
X80/0,ib    rm8,i8
WX83/0,ib   rm16,i8
TX83/0,ib   rm32,i8
X83/0,ib    rm64,i8
X02/r       r8,rm8

where W/T/X/... tell where prefixes go (Word, DWord, REX).
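
One way such listing lines might be read mechanically is sketched below. This is a guess at the format based only on the fragment above, assuming that the leading W/T/X letters are prefix markers, that `/0` and `/r` follow the Intel manual convention for the ModR/M byte, and that `ib` names a byte immediate. The function `parse_entry` and the field names are hypothetical.

```python
import re

PREFIX_FLAGS = {"W": "word", "T": "dword", "X": "rex"}

ENTRY = re.compile(
    r"^(?P<flags>[WTX]*)"             # where prefixes may be placed
    r"(?P<opcode>(?:[0-9A-F]{2})+)"   # one or more opcode bytes in hex
    r"(?:/(?P<ext>[0-7r]))?"          # /digit or /r for the ModR/M byte
    r"(?:,(?P<imm>\w+))?$"            # trailing immediate, e.g. ib
)


def parse_entry(line):
    """Split one listing line into its encoding pattern and operand forms."""
    pattern, operands = line.split()
    m = ENTRY.match(pattern)
    if m is None:
        raise ValueError("unrecognised pattern: " + pattern)
    return {
        "prefixes": [PREFIX_FLAGS[c] for c in m.group("flags")],
        "opcode": bytes.fromhex(m.group("opcode")),
        "modrm": m.group("ext"),
        "imm": m.group("imm"),
        "operands": operands.split(","),
    }


entry = parse_entry("WX83/0,ib rm16,i8")
```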



any comments?...


will keep sig, this time, in case anyone cares:
--
BGB, 23 M S GU
cr88192 at hotmail dot com
SpooK
2007-04-05 23:15:23 UTC
Post by cr88192
any comments?...
Comments??? Evidently not, if the semi-colon is being used in an HLL-
like manner :P
cr88192
2007-04-05 23:34:19 UTC
Post by SpooK
Post by cr88192
any comments?...
Comments??? Evidently not, if the semi-colon is being used in an HLL-
like manner :P
the semicolon is space-sensitive so:
push ebp; mov ebp, esp

is parsed as 2 instructions (the semicolon is regarded as a separator here).

however:
push ebp ;mov ebp, esp

is parsed as a single instruction followed by a comment.

basically, if any whitespace comes before the semicolon, it is treated as a
comment (whitespace afterwards makes no difference).
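
The rule can be written down as a short sketch. This is not the assembler's own parser, just a Python model of the behaviour described above; the function name `split_line` is made up, a ';' at the very start of a line is assumed to begin a comment, and quoted strings containing ';' are not handled.

```python
def split_line(line):
    """Return (instructions, comment) for one source line.

    A ';' preceded by whitespace (or at the start of the line) begins a
    comment; any other ';' separates instructions.
    """
    instructions, comment = [], None
    start = 0
    end = len(line)
    for i, ch in enumerate(line):
        if ch != ";":
            continue
        if i == 0 or line[i - 1].isspace():
            comment = line[i + 1:].strip()   # rest of the line is a comment
            end = i
            break
        text = line[start:i].strip()         # ';' glued to the instruction
        if text:
            instructions.append(text)
        start = i + 1
    tail = line[start:end].strip()
    if tail:
        instructions.append(tail)
    return instructions, comment


two = split_line("push ebp; mov ebp, esp")
one = split_line("push ebp ;mov ebp, esp")
```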



potentially this is confusing, but personally I don't think it is a major
issue.

at first it was unsettling as well, but at the time I opted for C-style
comments /* ... */ and //...

after I thought up the whitespace trick, I can do both line globbing and
more traditional comments.
Betov
2007-04-06 07:06:33 UTC
Post by cr88192
Post by SpooK
Post by cr88192
any comments?...
Comments??? Evidently not, if the semi-colon is being used in an HLL-
like manner :P
push ebp; mov ebp, esp
is parsed as 2 instructions (the semicolon is regarded as a separator here).
push ebp ;mov ebp, esp
is parsed as a single instruction followed by a comment.
basically, if any whitespace comes before the semicolon, it is treated
as a comment (whitespace afterwards makes no difference).
potentially this is confusing, but personally I don't think it is a
major issue.
at first it was unsettling as well, but at the time I opted with
C-style comments /* ... */ and //...
after I thought up the whitespace trick, I can do both line globbing
and more traditional comments.
I have to second Spook, here. ';' has been the traditional char
for comments, and there is no reason for modifying this. Also,
in order to resist to the natural babelism tendencies, it would
be a good thing to do the same choices, the other Author have
done, when this is without any importance. The only Assembler
i know about, which enables with multi-Instructions-lines is
RosAsm, and its syntax is:

mov eax ebx | mov edx 0 | div ecx

Unless it would be a problem for your parser to make a difference
with an expression, why not doing the same choice?


Betov.

< http://rosasm.org >
Betov
2007-04-06 07:45:58 UTC
Post by cr88192
push ebp; mov ebp, esp
is parsed as 2 instructions (the semicolon is regarded as a separator here).
push ebp ;mov ebp, esp
is parsed as a single instruction followed by a comment.
Oh! I skipped over this. And what of:

push ebp ; mov ebp, esp
push ebp;mov ebp, esp


? A "semi-comment" and an "half-comment" ?

:))

This is... terrible, in my opinion, and particularly absurd.
A kind of "Anti-Flexibility" performance.


Betov.

< http://rosasm.org >
cr88192
2007-04-06 08:46:21 UTC
Post by Betov
Post by cr88192
push ebp; mov ebp, esp
is parsed as 2 instructions (the semicolon is regarded as a separator here).
push ebp ;mov ebp, esp
is parsed as a single instruction followed by a comment.
push ebp ; mov ebp, esp
push ebp;mov ebp, esp
? A "semi-comment" and an "half-comment" ?
the former will be parsed as a comment (only prefix space is considered in
this case).

the latter will be parsed as globbed instructions.
Post by Betov
:))
This is... terrible, in my opinion, and particularly absurd.
A kind of "Anti-Flexibility" performance.
I don't see what the big deal is really.


in many of my languages, I have made use of similar rules (where and how
whitespace is present, ...) wrt, for example, expression parsing,
distinguishing between function calls and definitions, ...


{foo: 1 2 -1 x*y}

fib(x)
if(x>2)fib(x-1)+fib(x-2) else 1;

foo(x)
bar(y)
vs:
foo(x)
bar(x)

...


intuitively, these things may seem rather distasteful, but IME they tend to
work out fairly well in practice...
Post by Betov
Betov.
< http://rosasm.org >
Betov
2007-04-06 08:55:35 UTC
Post by cr88192
Post by Betov
This is... terrible, in my opinion, and particularly absurd.
A kind of "Anti-Flexibility" performance.
I don't see what the big deal is really.
Well, do it the way you like, but Asmers used to normal syntaxes
will encounter serious problems with a choice that will switch as
easily as this, from comment to instruction, and reverse. In my
opinion, the very first point for a good syntax is with its level
of readability, and if " ;" is completely different from "; ", i
do not think that we could talk of readability ease.


Betov.

< http://rosasm.org >
cr88192
2007-04-06 09:38:02 UTC
Post by Betov
Post by cr88192
Post by Betov
This is... terrible, in my opinion, and particularly absurd.
A kind of "Anti-Flexibility" performance.
I don't see what the big deal is really.
Well, do it the way you like, but Asmers used to normal syntaxes
will encounter serious problems with a choice that will switch as
easily as this, from comment to instruction, and reverse. In my
opinion, the very first point for a good syntax is with its level
of readability, and if " ;" is completely different from "; ", i
do not think that we could talk of readability ease.
I think this is partly a carry over from my experiences with many HLLs,
where typically there is a lot of magic that has to be squeezed out of the
fairly small character set that is the keyboard (and ideally without the
addition of more or longer tokens).


maybe since assembler has a much lower syntactic pressure, people are much
less used to subtle differences of this sort, I don't know.

x+y&3+z
x+**y++
x/*y
i<<16+*s++
T<x,y>i
i=3,4;
if(x=3)x--;
...
Post by Betov
Betov.
< http://rosasm.org >
cr88192
2007-04-06 08:03:58 UTC
Post by Betov
Post by cr88192
Post by SpooK
Post by cr88192
any comments?...
Comments??? Evidently not, if the semi-colon is being used in an HLL-
like manner :P
push ebp; mov ebp, esp
is parsed as 2 instructions (the semicolon is regarded as a separator here).
push ebp ;mov ebp, esp
is parsed as a single instruction followed by a comment.
basically, if any whitespace comes before the semicolon, it is treated
as a comment (whitespace afterwards makes no difference).
potentially this is confusing, but personally I don't think it is a
major issue.
at first it was unsettling as well, but at the time I opted with
C-style comments /* ... */ and //...
after I thought up the whitespace trick, I can do both line globbing
and more traditional comments.
I have to second Spook, here. ';' has been the traditional char
for comments, and there is no reason for modifying this. Also,
in order to resist to the natural babelism tendencies, it would
be a good thing to do the same choices, the other Author have
done, when this is without any importance. The only Assembler
i know about, which enables with multi-Instructions-lines is
mov eax ebx | mov edx 0 | div ecx
Unless it would be a problem for your parser to make a difference
with an expression, why not doing the same choice?
well, I had not heard of rosasm at all until today...


so, when I was adding this feature to begin with, for all I knew no other
assemblers did this (well, apart from gas, which in relevant cases also uses
';').

ok, of note, otherwise I am not that much of a fan of gas syntax...


also, in my case, I have a whole lot of code (several JIT compilers), that
generate masses of code using ';' this way (along with a few forms from
earlier versions of the syntax), so changing it would be too much of a
hassle IMO (albeit if needed I could add '|' for this as well).


otherwise, most normal assembler code will probably not be using this
feature (mostly it is intended for JIT convenience).

I had been working on the assembler some, making it accept a lot more of
nasm's syntax (point being that I could make it possible to write assembly
files that work in both tools, ie, as a kind of least-common-denominator).

also fixed up a few issues in the process (like adding address size override
support, making the disassembler able to disassemble 16 bit code and
overridden code, ...).


I also like ';' personally (since this is what C, and many other langs,
use). of course, it is also a problem that an unaware assembler will simply
miss a lot of the opcodes (as opposed to complaining).

initially I had considered a few other options, such as ':', '_', and '\\',
but didn't really like these ('\\' would be annoying to type endlessly and
in strings looks just about as bad as '\n'...).


I will just idly assume that a coder can figure out the difference between:

push edx; push eax

and:
mov ecx, [foo] ;yo...



oh yes, my server seems to be working, at least from here...

to restate:
http://cr88192.dyndns.org:8080/

(had to make use of socks 5 proxying to be able to test the thing though, as
my router makes a big load of trouble for all this...).
Post by Betov
Betov.
< http://rosasm.org >
Betov
2007-04-06 08:47:00 UTC
Post by cr88192
http://cr88192.dyndns.org:8080/
A quick look at the Doc shows a couple of issues:

* No introduction Doc. Imagine a beginner reading this. His
questions are 1) "What is this", 2) "What use is it for?",
3) "How to run it for a try?" 4) "Where is an example of
usage?"... and so on... none of these...

* You seem to "rename" for conveniences (_your_ conveniences),
and, in my opinion, this is a bad thing. Assembly is Assembly.
Fortunately, we have an almost complete and fixed list of
mnemonics introduced by the industry, and modifying anything
with this, can have no effect but to degrade the situation.
We have here (ALA), a competent (but "original") Programmer
(Herbert), who developed his home made Syntax, and mind you,
as opposed to you (you are wrong with the "renamings"), he is
_right_ on all points (even though i dislike his Syntax). Now,
despite the fact that he is right on his arguments, he is
also completely wrong on the usage:

Way better is a more or less valid syntax, than as many home
made syntaxes as Assembler's authors.

* Assembly is the first and only Language for computers. Historically,
there was absolutely no reason but market's ones, for creating
any HLL. So this is not to the Asm syntax to make any step in the
direction of definitive right-wing absurdities like C (I am talking
about your comments notations). A nice way for multi-lines Comments
that all Asmers seem to appreciate is:

;;
This is
a Multi-Lines
comment
;;

Where the delimiters are "LF;;CR", which are quite easy and
fast to parse, at the same time as lines-comments:

";" found out ---> Is "LF;;CR" at [esi-2]
- No ---> Line
- Yes ---> down to next "LF;;CR" or EOF.
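
The scheme outlined above can be sketched as a small comment stripper. This is a Python illustration of the description, not RosAsm's actual parser: the function name `strip_comments` is made up, a delimiter is taken to be a line consisting of exactly ";;", and ';' inside quoted strings is not handled.

```python
def strip_comments(source):
    """Drop ';' line comments and ';;' ... ';;' multi-line comments."""
    kept = []
    in_block = False
    for line in source.splitlines():
        if line == ";;":            # delimiter: ';;' alone on its line
            in_block = not in_block
            continue
        if in_block:
            continue                # inside a multi-line comment
        code = line.split(";", 1)[0].rstrip()
        if code:
            kept.append(code)
    return kept


sample = "mov eax 1 ; set\n;;\nThis is\na Multi-Lines\ncomment\n;;\nret\n"
result = strip_comments(sample)
```

An unterminated block simply runs to the end of the input, matching the "or EOF" case.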


Betov.

< http://rosasm.org >
cr88192
2007-04-06 09:53:06 UTC
Post by Betov
Post by cr88192
http://cr88192.dyndns.org:8080/
* No introduction Doc. Imagine a beginner reading this. His
questions are 1) "What is this", 2) "What use is it for?",
3) "How to run it for a try?" 4) "Where is an example of
usage?"... and so on... none of these...
this is hardly an end-user lib, but still better docs may make sense.
some of the docs I do have started out partly as idea-specs, so there are
many places where features are either implemented differently, or not at
all.
Post by Betov
* You seem to "rename" for conveniences (_your_ conveniences),
and, in my opinion, this is a bad thing. Assembly is Assembly.
Fortunately, we have an almost complete and fixed list of
mnemonics introduced by the industry, and modifying anything
with this, can have no effect but to degrade the situation.
We have here (ALA), a competent (but "original") Programmer
(Herbert), who developed his home made Syntax, and mind you,
as opposed to you (you are wrong with the "renamings"), he is
_right_ on all points (even though i dislike his Syntax). Now,
despite the fact that he is right on his arguments, he is
well, some of the renamings have been dropped or fixed up.

ie, the inc_r/dec_r, ja_b, ... issues and similar have since been largely
cleared up (originally, they existed to work around limitations in the
implementation).

the register renaming is more a "use at your own risk" feature...

note that in all this, the point was more that this would be working with
machine-generated, rather than human-generated assembly.

only later on did I really start considering using it for hand-written
assembly (which is when I started clearing up the syntax).

actually, the earliest versions didn't even have a parser, the idea was that
everything would be done through function calls (the parser was added since
this was just too damn tedious, and it was nicer to just 'print' opcodes).
Post by Betov
Way better is a more or less valid syntax, than as many home
made syntaxes as Assembler's authors.
* Assembly is the first and only Language for computers. Historically,
there was absolutely no reason but market's ones, for creating
any HLL. So this is not to the Asm syntax to make any step in the
direction of definitive right-wing absurdities like C (I am talking
about your comments notations). A nice way for multi-lines Comments
;;
This is
a Multi-Lines
comment
;;
Where the delimiters are "LF;;CR", which are quite easy and
";" found out ---> Is "LF;;CR" at [esi-2]
- No ---> Line
- Yes ---> down to next "LF;;CR" or EOF.
maybe.

in part, I was sticking to:
1, some things GAS does;
2, what was convenient (I started out by just reusing a HLL tokenizer, which
had been previously using C style comments).

as for the ;; multi-line syntax, this is the first time I have seen it.
Post by Betov
Betov.
< http://rosasm.org >
Betov
2007-04-06 10:39:38 UTC
Post by cr88192
1, some things GAS does;
Yes, i guess this, but GAS has never been an Assembler used for
really programming in Assembler. Personally, i do not even have it
on my Disks (and i have a lot of Assemblers... I do not even know
if it exists for Win32... i think yes... but i wouldn't even make
the effort of downloading for taking a look).
Post by cr88192
2, what was convenient (I started out by just reusing a HLL tokenizer,
which had been previously using C style comments).
as for the ;; multi-line syntax, this is the first time I have seen it.
Fact is that i was the one introducing this, but i have read,
for example, posts of FASM users, proposing this as wish for
their preferred Assembler. There was, also, recently, a guy posting
about a Sources' Documentation Tool, working with this simple
organization. So, i don't seem to be the only one considering
this solution a neat and easy one.


Betov.

< http://rosasm.org >
cr88192
2007-04-06 21:30:22 UTC
Post by Betov
Post by cr88192
1, some things GAS does;
Yes, i guess this, but GAS has never been an Assembler used for
really programming in Assembler. Personally, i do not even have it
on my Disks (and i have a lot of Assemblers... I do not even know
if it exists for Win32... i think yes... but i wouldn't even make
the effort of downloading for taking a look).
well, in my case it is forced on me, because I am actually primarily a C
coder, and I use gcc for my windows development (usually mingw, but for some
small things I use cygwin).
Post by Betov
Post by cr88192
2, what was convenient (I started out by just reusing a HLL tokenizer,
which had been previously using C style comments).
as for the ;; multi-line syntax, this is the first time I have seen it.
Fact is that i was the one introducing this, but i have read,
for example, posts of FASM users, proposing this as wish for
their preferred Assembler. There was, also, recently, a guy posting
about a Sources' Documentation Tool, working with this simple
organization. So, i don't seem to be the only one considering
this solution a neat and easy one.
ok.


misc:
after messing some with apache, I can see some use for #! support in a tool,
but probably won't add this, as raw asm is probably not a great choice for
cgi. was messing around trying to get sh to work, but with no real success.

this would likely require a little wrapper, ie: a spawn app for sh not
depending on bat files, or being spawned from a particular directory, or
similar.

not worth the bother right now.

great advantages of personal server:
no need to upload crap (and wait for the files to transfer with terrible
latencies and slow speeds);
can do whatever I want;
...

then again, can't handle as much traffic, theoretically, but I don't expect
much anyways...
Post by Betov
Betov.
< http://rosasm.org >
Jim Carlock
2007-04-07 00:49:53 UTC
"cr88192" wrote...
: misc:
: after messing some with apache, I can see some use for #! support
: in a tool,

Heh? What's that #!? Perl's sh-bang? If you're familiar with C,
you may find PHP a blessing.

: but probably won't add this, as raw asm is probably not a great
: choice for cgi. Was messing around trying to get sh to work, but
: with no real success.

What is sh? Shell? Bourne shell, Korn shell, C shell?
--
Jim Carlock
Post replies to the group.
cr88192
2007-04-07 03:06:39 UTC
Post by Jim Carlock
"cr88192" wrote...
: after messing some with apache, I can see some use for #! support
: in a tool,
Heh? What's that #!? Perl's sh-bang? If you're familiar with C,
you may find PHP a blessing.
'#!' is not just used in perl, but nearly any other lang that can be spawned
via this convention (many other lang implementations, including various
scheme implementations, python, ... end up supporting this).

on windows, I would rather have it use file extension or similar (and start
the app from its home, or some other controllable directory), but oh well...


I may look into this (haven't looked into PHP, just I know I am not much of
a fan of perl...).

I had just figured, since I had a server, may as well mess with CGI.
sadly, none of my languages is likely well suited to CGI (having typically
neither worthwhile string nor file/filesystem support).

had seemed better likely to just try to get sh/bash to work, but I don't
know...
Post by Jim Carlock
: but probably won't add this, as raw asm is probably not a great
: choice for cgi. Was messing around trying to get sh to work, but
: with no real success.
What is sh? Shell? Bourne shell, Korn shell, C shell?
informally, I was referring to 'whatever is bound to /bin/sh', which is
problematic on windows. usually, sh either is, or is an alias for, bash
(this is at least what is used on mingw and cygwin, albeit mingw uses bash
and just calls it sh).

I tried to use batch files this way, but cmd.exe complains about the #!
syntax.


oh well, an annoyance I guess for using apache on windows...

apache on linux would be better, but then I would have to run my test
computer as a server (and run it all the time). I opted with just using my
main computer, on windows, since it is on most of the time anyways...
Post by Jim Carlock
--
Jim Carlock
Post replies to the group.
Herbert Kleebauer
2007-04-06 08:34:42 UTC
Post by Betov
I have to second Spook, here. ';' has been the traditional char
for comments, and there is no reason for modifying this. Also,
in order to resist to the natural babelism tendencies, it would
be a good thing to do the same choices, the other Author have
done, when this is without any importance.
The only way to get an improvement is to not make the same
choices as the other have done.
Post by Betov
The only Assembler
i know about, which enables with multi-Instructions-lines is
mov eax ebx | mov edx 0 | div ecx
Now, that surely is an example how a deviation from the
commonly accepted choices makes things worse: to separate
the parameters by a "," makes the instruction much better
readable. And the "|" as a separator for instructions is
much too dominant. Compare the two lines and you will
see what's the better choice.

mov eax ebx | mov edx eax | mov ecx edx | div ecx

mov eax,ebx; mov edx,eax; mov ecx,edx; div ecx;

I don't know how problematic it is, to use the ; as an
instruction separator and as a start of a comment depending
on a preceding space, but I must say, I most probably
never used a ";" as a start of comment without at least
one space before it.

But much superior to both is the traditional way to only
write one instruction per line and use the src,dest
order:

move.l r3,r0
move.l r0,r1
move.l r1,r2
divu.l r2,r1|r0

On a brief glance you see what happens:

r3 -> r0 -> r1 -> r2

But with your format, you have to look even more than twice
to see what happens:

mov eax ebx | mov edx eax | mov ecx edx

eax <- ebx
+--------------------+
edx <- eax-+
+--------------------+
ecx <- edx-+
Post by Betov
Unless it would be a problem for your parser to make a difference
with an expression, why not doing the same choice?
No, if the old choices are good, keep them. If the old choices are
bad, change them. But you did exactly the opposite: change
the good and keep the bad. But it really doesn't matter as
long as your product is only for your private use (like an
assembler for writing applications).
Betov
2007-04-06 09:19:15 UTC
Herbert Kleebauer <***@unibwm.de> wrote in news:46160622.95EA8A07@unibwm.de:

Oh! I was talking about the wolf...

:))
Post by Herbert Kleebauer
Post by Betov
I have to second Spook, here. ';' has been the traditional char
for comments, and there is no reason for modifying this. Also,
in order to resist to the natural babelism tendencies, it would
be a good thing to do the same choices, the other Author have
done, when this is without any importance.
The only way to get an improvement is to not make the same
choices as the other have done.
Generally speaking? This is sometimes true and sometimes
false, Herbert. There are even cases, like yours about the
Asm syntax, where one can be both fully right and fully
wrong.
Post by Herbert Kleebauer
Post by Betov
The only Assembler
i know about, which enables with multi-Instructions-lines is
mov eax ebx | mov edx 0 | div ecx
Now, that surely is an example how a deviation from the
commonly accepted choices makes things worse: to separate
the parameters by a "," makes the instruction much better
readable.
My own choices, in the whole syntax, has been that, on one hand,
_each_ char _must_ have a signification, and that, on the other
hand, the syntax must be as flexible as possible. On this very
point, i have chosen to make ',' the Line-Continuation Char,
what seems to me to have been a good choice. It makes it legal
to write, as well:

mov eax ebx | mov edx 0 | div ecx

mov,
eax,
ebx | mov edx,,,,,0
div,
,,, ecx

This is Flexibility, Herbert: Yes, flexibility includes utterly
unreadable horrors, if the user likes this. Flexibility is choice
and decisions of the user, instead of choice and decisions of the
Tool's author.
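
How the two layouts above can come out as the same three instructions is sketched below. This is a Python reading of the rules as stated in the post, not RosAsm's actual parser: it assumes a run of commas swallows the whitespace after it (newlines included), that '|' and an unescaped newline both end an instruction, and the function name `split_statements` is made up.

```python
import re


def split_statements(source):
    """Split source text into instructions under the rules assumed above."""
    # A run of commas, plus any whitespace after each, becomes one space,
    # so a comma both continues a line and acts as a plain separator.
    joined = re.sub(r"(?:,\s*)+", " ", source)
    statements = []
    for chunk in re.split(r"[|\n]", joined):
        text = " ".join(chunk.split())      # normalise inner whitespace
        if text:
            statements.append(text)
    return statements


compact = "mov eax ebx | mov edx 0 | div ecx"
spread = "mov,\n    eax,\n        ebx | mov edx,,,,,0\ndiv,\n,,, ecx"
same = split_statements(compact) == split_statements(spread)
```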
Post by Herbert Kleebauer
And the "|" as a separator for instructions is
much too dominant.
?
Post by Herbert Kleebauer
Compare the two lines and you will
see what's the better choice.
mov eax ebx | mov edx eax | mov ecx edx | div ecx
mov eax,ebx; mov edx,eax; mov ecx,edx; div ecx;
I don't know how problematic it is, to use the ; as an
instruction separator and as a start of a comment depending
on a preceding space, but I must say, I most probably
never used a ";" as a start of comment without at least
one space before it.
The problem is, to me, that i do not know of any x86 Assembler
which would encode anything but the first Instruction.
Post by Herbert Kleebauer
But much superior to both is the traditional way to only
write one instruction per line and use the src,dest
move.l r3,r0
move.l r0,r1
move.l r1,r2
divu.l r2,r1|r0
r3 -> r0 -> r1 -> r2
Same answer: I don't know of any x86 Assembler (the ones used
by the Asmers), which would have a syntax like this one, and
i am unable to imagine any reason why we should modify this.
Post by Herbert Kleebauer
But with your format, you have to look even more than twice
mov eax ebx | mov edx eax | mov ecx edx
eax <- ebx
+--------------------+
edx <- eax-+
+--------------------+
ecx <- edx-+
What are you talking about? Rejecting the Multi-Instructions
lines concept? If yes... well... write it the way you'd like...
Post by Herbert Kleebauer
Post by Betov
Unless it would be a problem for your parser to make a difference
with an expression, why not doing the same choice?
No, if the old choices are good, keep them. If the old choices are
bad, change them. But you did exactly the opposite: change
the good and keep the bad.
Heuuu!...

1) I "changed" very few things in the traditional syntax.

2) RosAsm (if this is what you are talking about) accepts
a wide range of traditional forms.
Post by Herbert Kleebauer
But it really doesn't matter as
long as your product is only for your private use (like an
assembler for writing applications).
:))

Right.

:))

Betov.

< http://rosasm.org >
Herbert Kleebauer
2007-04-06 12:59:00 UTC
Post by Betov
My own choices, in the whole syntax, has been that, on one hand,
_each_ char _must_ have a signification, and that, on the other
hand, the syntax must be as flexible as possible. On this very
point, i have chosen to make ',' the Line-Continuation Char,
what seems to me to have been a good choice. It makes it legal
mov eax ebx | mov edx 0 | div ecx
mov,
eax,
ebx | mov edx,,,,,0
div,
,,, ecx
Funny, you allow multi statements on a line and because then
the line can become too long, you need a line continuation
character to allow a line break within an instruction and
not only at instruction borders.
Post by Betov
Post by Herbert Kleebauer
mov eax ebx | mov edx eax | mov ecx edx
eax <- ebx
+--------------------+
edx <- eax-+
+--------------------+
ecx <- edx-+
What are you talking about? Rejecting the Multi-Instructions
lines concept? If yes... well... write it the way you'd like...
m'I gniklat tuoba gnitirw secnetnes morf tfel ot thgir tub sdrow
morf thgir ot tfel.
Jim Carlock
2007-04-06 14:31:22 UTC
"Herbert Kleebauer" wrote...
: m'I gniklat tuoba gnitirw secnetnes morf tfel ot thgir tub sdrow
: morf thgir ot tfel.

That's not right...

mov eax, ebx

while...

xbe, xae vom

setauqe ot owt (ro erom) tnereffid ,stpecnoc .yleritne

What a mess, heh?
--
Jim Carlock
Post replies to the group.
Betov
2007-04-06 14:41:47 UTC
Post by Herbert Kleebauer
Post by Betov
mov eax ebx | mov edx 0 | div ecx
mov,
eax,
ebx | mov edx,,,,,0
div,
,,, ecx
Funny, you allow multi statements on a line and because then
the line can become too long, you need a line continuation
character to allow a line break within an instruction and
not only at instruction borders.
I appreciate your humour, but some could read your sentences
without being aware that you are against the usage of Assembly
for writing Application, which is the very true and only purpose
of the RosAsm environment, and where there are many reasons for
multi-Instructions lines, for self-evident means, like readability,
coherency of the sources organizations,... and also for Multi-
Lines Statements, as fully evident with statements like:

call 'USER32.CreateWindowExA' &WS_EX_CLIENTEDGE,
WindowClassName,
WindowCaption,
&WS_OVERLAPPEDWINDOW__&WS_VISIBLE,
D$WindowX, D$WindowY,
D$WindowW, D$WindowH,
0,
D$MenuHandle,
D$hInstance,
0

Now, if you prefer moving your eyes up and down only, instead
of up, down, left and right, this is not a problem for RosAsm,
with which you can write it the way you like, in the form you
prefer. Here also, this is what the word "flexibility" means,
and the fact that you are refusing, on one hand, to consider
Assembly for developments, and on the other hand refuse to
consider the required facilities for making this language a
valid alternative to the HLLs, pushes yourself in a position
where "this" explains "that". I already told you: Your position
can be compared to that of a schizophrenic: Self coherent.


Betov.

< http://rosasm.org >
Herbert Kleebauer
2007-04-06 16:51:45 UTC
Permalink
Post by Betov
I appreciate your humour, but some could read your sentences
without being aware that you are against the usage of Assembly
for writing Application, which is the very true and only purpose
of the RosAsm environment, and where there are many reasons for
multi-Instructions lines, for self-evident means, like readability,
coherency of the sources organizations,... and also for Multi-
call 'USER32.CreateWindowExA' &WS_EX_CLIENTEDGE,
WindowClassName,
WindowCaption,
&WS_OVERLAPPEDWINDOW__&WS_VISIBLE,
D$WindowX, D$WindowY,
D$WindowW, D$WindowH,
0,
D$MenuHandle,
D$hInstance,
0
And why should this be better readable than using real
assembly code?

moveq.l #0, -(sp) ; lpParam
move.l wc_hInstance, -(sp) ; hInstance
moveq.l #0, -(sp) ; hMenu
moveq.l #0, -(sp) ; hWndParent
move.l VWIN, -(sp) ; nHeight
move.l HWIN, -(sp) ; nWidth
moveq.l #0, -(sp) ; y
moveq.l #0, -(sp) ; x
move.l #$90000000, -(sp) ; dwStyle =WS_POPUP|WS_VISIBLE
move.l #text_null, -(sp) ; lpWindowName
move.l #text_erde, -(sp) ; lpClassName
moveq.l #0, -(sp) ; dwExStyle =default style
jsr.l (CreateWindowExA)
Betov
2007-04-06 17:11:58 UTC
Permalink
Post by Herbert Kleebauer
Post by Betov
call 'USER32.CreateWindowExA' &WS_EX_CLIENTEDGE,
WindowClassName,
WindowCaption,
&WS_OVERLAPPEDWINDOW__&WS_VISIBLE,
D$WindowX, D$WindowY,
D$WindowW, D$WindowH,
0,
D$MenuHandle,
D$hInstance,
0
And why should this be better readable than using real
assembly code?
moveq.l #0, -(sp) ; lpParam
move.l wc_hInstance, -(sp) ; hInstance
moveq.l #0, -(sp) ; hMenu
moveq.l #0, -(sp) ; hWndParent
move.l VWIN, -(sp) ; nHeight
move.l HWIN, -(sp) ; nWidth
moveq.l #0, -(sp) ; y
moveq.l #0, -(sp) ; x
move.l #$90000000, -(sp) ; dwStyle =WS_POPUP|WS_VISIBLE
move.l #text_null, -(sp) ; lpWindowName
move.l #text_erde, -(sp) ; lpClassName
moveq.l #0, -(sp) ; dwExStyle =default style
jsr.l (CreateWindowExA)
If you fail to see what is evident, there is no fun at explaining.
You know, this is like jokes: If you need to explain a joke, the
comic effect is gone.

:)

Seriously: And if all HLLs have chosen a form, similar to high level
Assembly, where is the miracle? Why did not the HLLs choose forms like
your preferred one, if yours is more readable?


Betov.

< http://rosasm.org >
Herbert Kleebauer
2007-04-06 19:38:13 UTC
Permalink
Post by Betov
Post by Herbert Kleebauer
Post by Betov
call 'USER32.CreateWindowExA' &WS_EX_CLIENTEDGE,
WindowClassName,
WindowCaption,
&WS_OVERLAPPEDWINDOW__&WS_VISIBLE,
D$WindowX, D$WindowY,
D$WindowW, D$WindowH,
0,
D$MenuHandle,
D$hInstance,
0
And why should this be better readable than using real
assembly code?
moveq.l #0, -(sp) ; lpParam
move.l wc_hInstance, -(sp) ; hInstance
moveq.l #0, -(sp) ; hMenu
moveq.l #0, -(sp) ; hWndParent
move.l VWIN, -(sp) ; nHeight
move.l HWIN, -(sp) ; nWidth
moveq.l #0, -(sp) ; y
moveq.l #0, -(sp) ; x
move.l #$90000000, -(sp) ; dwStyle =WS_POPUP|WS_VISIBLE
move.l #text_null, -(sp) ; lpWindowName
move.l #text_erde, -(sp) ; lpClassName
moveq.l #0, -(sp) ; dwExStyle =default style
jsr.l (CreateWindowExA)
If you fail to see what is evident, there is no fun at explaining.
You know, this is like jokes: If you need to explain a joke, the
comic effect is gone.
But maybe the author of the joke recognizes that the joke isn't
a joke at all. The only difference between your and my version is,
that in my code every line is prefixed with a "move.l" and ends
with a "-(sp) ; some comment". The column with the parameters
is nearly identical with two exception: you use the reverse
order (why? it only makes it harder to debug) and you put the
x/y values on one line. But the big advantage of my code is, you
exactly see what happens. And this is the only purpose of writing
assembly programs. If you don't care about the instruction level
why not use a HLL instead of an assembler?
Post by Betov
Seriously: And if all HLLs have chosen a form, similar to high level
Assembly, where is the miracle? Why did not the HLLs choose forms like
your preferred one, if yours is more readable?
Why do you use "add eax,ebx" and not eax=eax+ebx; or eax+=ebx; ?
It is perverse to make assembly code look like HLL code. If you
want code which looks like HLL code, use HLL code and you will
get many more advantages than only more readable source code.
Sometimes I get the feeling you and Randy are the same persons
posting with different email addresses.
Betov
2007-04-06 21:06:02 UTC
Permalink
Post by Herbert Kleebauer
The only difference between your and my version is,
that in my code every line is prefixed with a "move.l" and ends
with a "-(sp) ; some comment".
Yes. The so absurd and un-viable traditional way.
Post by Herbert Kleebauer
The column with the parameters
is nearly identical with two exception: you use the reverse
order (why?
Because it is the way, given by any documentation, and that
we can't program without documentation.
Post by Herbert Kleebauer
it only makes it harder to debug) and you put the
x/y values on one line. But the big advantage of my code is, you
exactly see what happens.
"See exactly what happens" with an API call?! Great.

:))
Post by Herbert Kleebauer
And this is the only purpose of writing
assembly programs.
The only purpose of writing program is exactly the same,
whatever the language.
Post by Herbert Kleebauer
If you don't care about the instruction level
why not use a HLL instead of an assembler?
If Assembly can do similar things as the HLLs do, for what
reasons were the HLLs invented, other than for market means?
Post by Herbert Kleebauer
Post by Betov
Seriously: And if all HLLs have chosen a form, similar to high level
Assembly, where is the miracle? Why did not the HLLs choose forms like
your preferred one, if yours is more readable?
Why do you use "add eax,ebx" and not eax=eax+ebx; or eax+=ebx; ?
Of course: Why not? By the way, there is a Pre-Parser, included
in RosAsm, that enables who wants to use it, to write:

eax = (eax+ebx)*5

Which is, very exactly, what i call "an HLL Pre-Parser". This
is to say, something that no longer belongs to Assembly, from
a theoretical point of view, but which is, also, the definitive
demonstration that, if the HLLs authors had had any small bit
of intellectual honesty, they would never have written any HLL.
Post by Herbert Kleebauer
It is perverse to make assembly code look like HLL code. If you
want code which looks like HLL code, use HLL code and you will
get many more advantages than only more readable source code.
Sometimes I get the feeling you and Randy are the same persons
posting with different email addresses.
Even though this swindler stole several ideas of mine, there can
be absolutely no relationship in between an attempt of natural
evolution of Assembly, and an attempt of destruction of Assembly.

Fact is that, as all of the older traditionalists Asmers have faced,
writing a Program with such an absurd writing style as the one you
would like to restrict Asm to, is not viable above some level of
quantity, and of complexity. The HLLs never were any answer to this
purely technical problem.

Mind you
If your
HLL was, as
Stupid
As This
It would, also
Be utterly
Un-viable


Betov.

< http://rosasm.org >
Wolfgang Kern
2007-04-07 12:58:07 UTC
Permalink
Betov in discussion with Herbert:

[about readability]
Post by Betov
Seriously: And if all HLLs have chosen a form, similar to high level
Assembly, where is the miracle? Why did not the HLLs choose forms like
your preferred one, if yours is more readable?
As my background is mainly hardware related, I also prefer
to see all instructions in the order they will be executed.
______
label:
push ...
push ...
call API
______

So I may save on many lines as I can insert conditional
push options if required.

The C-style may be an advantage for programmers who are more
familiar with HLLism than with the CPU-instructions.

Why?
there are more books and schools for HLL than for ASM.

Thanks Rene for having both options in RosAsm.
btw: is there a way to reenable 'db hex ..' compilation ? :)

__
wolfgang
Betov
2007-04-07 15:24:01 UTC
Permalink
Post by Wolfgang Kern
is there a way to reenable 'db hex ..' compilation ? :)
Yes, i will do this for the next release. For now, you have to
declare a label, even if dummy, in front of a "DB" Declaration
(use ',' for multi-lines...). Sorry for the inconvenience, but
this may take "some" time: I am currently stuck with a review
of the organization of the Macros-Engine, which is in a sad
state, actually, because it still hosts stupidities of mine,
that i implemented in the earlier SpAsm days.


Betov.

< http://rosasm.org >
Wolfgang Kern
2007-04-08 10:59:59 UTC
Permalink
Post by Betov
Post by Wolfgang Kern
is there a way to reenable 'db hex ..' compilation ? :)
Yes, i will do this for the next release. For now, you have to
declare a label, even if dummy, in front of a "DB" Declaration
(use ',' for multi-lines...). Sorry for the inconvenience, but
this may take "some" time: I am currently stuck with a review
of the organization of the Macros-Engine, which is in a sad
state, actually, because it still hosts stupidities of mine,
that i implemented in the earlier SpAsm days.
Ok, not really a problem here as I kept many previous RosAsm's.

You see me too working over my old code parts to better fit with
my new designed structs and the overall size/speed performance.

I can't remember the date, but I created my KEYBD-handler when
the first keyboards with a num-pad became available ('some' time ago).
__
wolfgang
/\\\\o//\\annabee
2007-04-08 17:32:26 UTC
Permalink
On Sat, 07 Apr 2007 14:58:07 +0200, Wolfgang Kern wrote
Post by Wolfgang Kern
[about readability]
Post by Betov
Seriously: And if all HLLs have chosen a form, similar to high level
Assembly, where is the miracle? Why did not the HLLs choose forms like
your preferred one, if yours is more readable?
As my background is mainly hardware related, I also prefer
to see all instructions in the order they will be executed.
______
push ...
push ...
call API
______
So I may save on many lines as I can insert conditional
push options if required.
The C-style may be an advantage for programmers who are more
familiar with HLLism than with the CPU-instructions.
Why?
there are more books and schools for HLL than for ASM.
Thanks Rene for having both options in RosAsm.
btw: is there a way to reenable 'db hex ..' compilation ? :)
Agree. But I would rather have it enabled in a way that avoids the DB.

routine:
057 0f5 0C3

db is a reserved word. when rosasm encounters a number, in code path,
could it not just parse it as hex directly?
Post by Wolfgang Kern
__
wolfgang
Wolfgang Kern
2007-04-09 12:33:28 UTC
Permalink
Wannabee wrote:
[]
Post by /\\\\o//\\annabee
Post by Wolfgang Kern
As my background is mainly hardware related, I also prefer
to see all instructions in the order they will be executed.
______
push ...
push ...
call API
______
So I may save on many lines as I can insert conditional
push options if required.
The C-style may be an advantage for programmers who are more
familiar with HLLism than with the CPU-instructions.
Why?
there are more books and schools for HLL than for ASM.
Thanks Rene for having both options in RosAsm.
btw: is there a way to reenable 'db hex ..' compilation ? :)
Agree. But I like better it enabled in a way to avoid the DB.
057 0f5 0C3
db is a reserved word. when rosasm encounters a number, in code path,
could it not just parse it as hex directly?
This might come in conflict with the other structures ?
It would need a leading label, so my (standard) preferred

cmp al 0a |db 72 02 |add al 7 |add al 30

wouldn't work then and would need a rewrite of a lot of source code.

__
wolfgang
/\\\\o//\\annabee
2007-04-09 14:00:15 UTC
Permalink
On Mon, 09 Apr 2007 14:33:28 +0200, Wolfgang Kern wrote
Post by Wolfgang Kern
Post by /\\\\o//\\annabee
Agree. But I like better it enabled in a way to avoid the DB.
057 0f5 0C3
db is a reserved word. when rosasm encounters a number, in code path,
could it not just parse it as hex directly?
This might come in conflict with the other structures ?
It would need a leading label, so my (standard) prefered
cmp al 0a |db 72 02 |add al 7 |add al 30
woldn't work then and will need a rewrite of many source code.
Ok.

cmp al b$Label is six bytes. al cmp al b$reg is 2 bytes
and the instructions are separated by | in this case as well.
So in this case it could work?

cmp al D$Label
72 02
04 07
add al 30

at this level you hex coders seem to have more fun.
Post by Wolfgang Kern
__
wolfgang
--
Wolfgang Kern
2007-04-09 15:33:12 UTC
Permalink
Post by /\\\\o//\\annabee
Post by Wolfgang Kern
cmp al 0a |db 72 02 |add al 7 |add al 30
woldn't work then and will need a rewrite of many source code.
Ok.
cmp al b$Label is six bytes. al cmp al b$reg is 2 bytes
and the instructions are seperated by | in this case as well.
So in this case it could work?
cmp al D$Label
72 02
04 07
add al 30
at this level you hex coders seem to have more fun.
Yes, but I'd like to have it as a function grouped line:

L0: 3c 0a 72 02 04 07 04 30 ;bin_low_nib_to_ascii_in_AL

But RosAsm doesn't need this form as I can do it with my HEXEDIT anyway.

translation (not fastest but short):
cmp al,10
jc +2
add al,7
add al,48
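
For anyone who wants to check those four instructions without an assembler, here is a rough Python sketch of the same steps. The function name is made up for illustration and is not from anyone's code in this thread; it assumes AL already holds only the low nibble (0..15).

```python
def low_nibble_to_ascii(al):
    """Mirror cmp al,10 / jc +2 / add al,7 / add al,48 for al in 0..15."""
    if al >= 10:   # jc skips the next add only when al < 10 (carry set)
        al += 7    # bridge the gap between '9' (39h) and 'A' (41h)
    al += 48       # add '0' (30h)
    return chr(al)


print("".join(low_nibble_to_ascii(n) for n in range(16)))
```

Running it over 0..15 should give the sixteen uppercase hex digits in order.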

__
wolfgang
Frank Kotler
2007-04-09 18:29:22 UTC
Permalink
Wolfgang Kern wrote:

...
[convert nibble to hex-ascii]
Post by Wolfgang Kern
cmp al,10
jc +2
add al,7
add al,48
cmp al, 10
sbb al, 69h
das

Shorter, and eliminates the conditional jump... but "das" is so slow
(how slow *is* it?), I don't think it's a "win"...

What would be your idea of a "fast" way to do it?

Best,
Frank
/\\\\o//\\annabee
2007-04-09 19:32:44 UTC
Permalink
On Mon, 09 Apr 2007 20:29:22 +0200, Frank Kotler wrote
Post by Frank Kotler
...
[convert nibble to hex-ascii]
Post by Wolfgang Kern
cmp al,10
jc +2
add al,7
add al,48
cmp al, 10
sbb al, 69h
das
Shorter, and eliminates the conditional jump... but "das" is so slow
(how slow *is* it?), I don't think it's a "win"...
What would be your idea of a "fast" way to do it?
Aren't you confusing two unrelated posts?
I have a fever so I am not sure.
Post by Frank Kotler
Best,
Frank
--
Frank Kotler
2007-04-09 20:42:57 UTC
Permalink
Post by /\\\\o//\\annabee
On Mon, 09 Apr 2007 20:29:22 +0200, Frank Kotler wrote
Post by Frank Kotler
...
[convert nibble to hex-ascii]
Post by Wolfgang Kern
cmp al,10
jc +2
add al,7
add al,48
cmp al, 10
sbb al, 69h
das
Shorter, and eliminates the conditional jump... but "das" is so slow
(how slow *is* it?), I don't think it's a "win"...
What would be your idea of a "fast" way to do it?
Aren't you confusing two unrelated posts?
Posts are *related* around here??? :)

Yeah, it's quite a diversion from cr88192's assembler. Sorry. But we got
into preferred syntax. I guess Wolfgang would like to write it as:

(24 0F implied) 3C 0A 1C 69 2F

Damned if I know how Herbert says "das"! (sounds like German already!) :)


Just an "even shorter" way to do it. Nothing to do with syntax, really...

Best,
Frank
SpooK
2007-04-10 00:54:19 UTC
Permalink
Post by Frank Kotler
Damned if I know how Herbert says "das"! (sounds like German already!) :)
Just an "even shorter" way to do it. Nothing to do with syntax, really...
Well, watch out... them German instructions are depreciated in long
mode :P
Jim Carlock
2007-04-10 01:06:46 UTC
Permalink
On Apr 9, 3:42 pm, Frank Kotler stated...
: Damned if I know how Herbert says "das"! (sounds like German
: already!) :)

"SpooK" wrote...
: Well, watch out... them German instructions are depreciated in long
: mode :P

So if DAS is the opposite of DAA, and Der is the opposite of Die,
what's the opposite of DAS in Deutsch?
--
Jim Carlock
Frank Kotler
2007-04-10 01:16:21 UTC
Permalink
Post by Jim Carlock
On Apr 9, 3:42 pm, Frank Kotler stated...
: Damned if I know how Herbert says "das"! (sounds like German
: already!) :)
"SpooK" wrote...
: Well, watch out... them German instructions are depreciated in long
: mode :P
So if DAS is the opposite of DAA, and Der is the opposite of Die,
what's the opposite of DAS in Deutsch?
All I know is "Das Boot"... something to do with the first sector on the
drive...

Best,
Frank
Evenbit
2007-04-10 01:56:22 UTC
Permalink
Post by Frank Kotler
Post by Jim Carlock
On Apr 9, 3:42 pm, Frank Kotler stated...
: Damned if I know how Herbert says "das"! (sounds like German
: already!) :)
"SpooK" wrote...
: Well, watch out... them German instructions are depreciated in long
: mode :P
So if DAS is the opposite of DAA, and Der is the opposite of Die,
what's the opposite of DAS in Deutsch?
All I know is "Das Boot"... something to do with the first sector on the
drive...
Yes, it reminds one of that movie ("The Boat" in English... subject of
the film was early German submarines), though I think Windows is the
boat and DOS was the jet-ski. ;-)

Nathan.
SpooK
2007-04-10 05:23:10 UTC
Permalink
Post by Frank Kotler
All I know is "Das Boot"... something to do with the first sector on the
drive...
Wasn't that for KaputOS? What is the status of that OS anyhow??? Last
time I saw, they were only in the design stage.
Herbert Kleebauer
2007-04-10 11:44:13 UTC
Permalink
Post by Frank Kotler
Yeah, it's quite a diversion from cr88192's assembler. Sorry. But we got
(24 0F implied) 3C 0A 1C 69 2F
Damned if I know how Herbert says "das"! (sounds like German already!) :)
What would be a good way to spell "das"? We could call it 0x2f, but
if you don't use this instruction on a daily basis, you maybe can't
remember what the instruction 0x2f does if you see it in your source
code. We could use a "full talking name" like

sub_6_from_each_nibble_if>9_or_if_there_was_a_borrow_for_this_nibble AL

but this is a little bit too long and therefore hard to read in an
assembler program. Now what would be a good compromise? "DAS"?
I don't think so, it isn't really any better than "0x2f". Who
can remember what all these instructions (AAA, AAD, AAM, AAS, DAA, DAS)
do? And you also don't know which register is modified.

For me a good compromise is:

adj_asc_add r0
adj_asc_div r0
adj_asc_div #imm8,r0
adj_asc_mul r0
adj_asc_mul #imm8,r0
adj_asc_sub r0
adj_dec_add r0
adj_dec_sub r0

You immediately see which register is modified and if you have once read
the processor manual, it's much easier to remember what this instruction
does compared to a "DAS".

Now explain, which name and syntax you would use if you had
to give this instruction a name. But you have to give a logical
reason why you came to this decision because otherwise it's just a
random selection for which you don't get any points.
Frank Kotler
2007-04-10 19:35:57 UTC
Permalink
Post by Herbert Kleebauer
Post by Frank Kotler
Yeah, it's quite a diversion from cr88192's assembler. Sorry. But we got
(24 0F implied) 3C 0A 1C 69 2F
Damned if I know how Herbert says "das"! (sounds like German already!) :)
What would be a good way to spell "das"? We could call it 0x2f, but
if you don't use this instruction on a daily basis, you maybe can't
remember what the instruction 0x2f does if you see it in your source
code. We could use a "full talking name" like
sub_6_from_each_nibble_if>9_or_if_there_was_a_borrow_for_this_nibble AL
but this is a little bit to long and therefore hard to read in an
assembler program.
Yes, there's a limit to "full talking names".
Post by Herbert Kleebauer
Now what would be a good compromise? "DAS"?
I don't think so, it isn't really any better than "0x2f".
Depends. *Some* people can remember "words" - even "nonsense words" -
easier than numbers...
Post by Herbert Kleebauer
Who
can remember what all this instruction (AAA, AAD, AAM, AAS, DAA, DAS)
do and you also don't know which register is modified.
adj_asc_add r0
adj_asc_div r0
adj_asc_div #imm8,r0
adj_asc_mul r0
adj_asc_mul #imm8,r0
adj_asc_sub r0
adj_dec_add r0
adj_dec_sub r0
You immediately see which register is modified
eax, right? :)

This is good, but might mislead the clueless into thinking they could
use another register. This also applies to your syntax for other
instructions with "implied" registers, "mul", the "string" instructions,
etc. Pretty much gotta RTFM anyway, to see what registers are used/allowed.
Post by Herbert Kleebauer
and if you have once read
the processor manual, it's much easier to remember what this instruction
does compared to a "DAS".
Well, I find it pretty easy to remember that "das" means the same as
"decimal adjust after subtraction". Remembering what it *does*, whether
we call it "das" or "adj_dec_sub", is more of a problem to me. "Adjust",
in itself, is not very enlightening, and "dec" vs "asc" means??? As I
understand these instructions, the "asc" forms work on what I'd call
"unpacked bcd", the "dec" forms on what I'd call "packed bcd".

I suppose if I used this "group" of instructions more frequently, I'd
have 'em sorted out. Since I *very* rarely use 'em, they're in the
"gotta look it up" category.

If you don't use an instruction, you don't have to know what it does.
But if you don't know what it does, you *can't* use it, even where it
would be "appropriate"...
Post by Herbert Kleebauer
Now explain, which name and syntax you would use if you had
to give this instruction a name. But you have to give a logical
reason why you came to this decision because otherwise it's just a
random selection for which you don't get any points.
I'll pass. The parents have already given this baby a name. I don't much
"like" the name, but making up new "more logical" names for things that
already have names is *your* game, not mine! :)

(I make an exception by calling the "direction flag" the "down flag" :)

cmp al,10
jc +2
add al,7
add al,48

We can pretty much see what this does just by looking at it. I despair
of getting such clarity from "das", by whatever name!

Best,
Frank
Betov
2007-04-10 20:36:22 UTC
Permalink
Post by Frank Kotler
Yes, there's a limit to "full talking names".
Indeed: This is "fool talking names".

:)

Betov.

< http://rosasm.org >
Wolfgang Kern
2007-04-10 12:53:24 UTC
Permalink
Post by Frank Kotler
...
[convert nibble to hex-ascii]
Post by Wolfgang Kern
cmp al,10
jc +2
add al,7
add al,48
cmp al, 10
sbb al, 69h
das
Shorter, and eliminates the conditional jump... but "das" is so slow
(how slow *is* it?), I don't think it's a "win"...
DAS latency is reported as 8 cycles in AMD-docs

Intel docs describe what it does:

IF ((AL AND 0FH) > 9) OR (AF = 1)
THEN
AL <- AL - 6;
AH <- AH - 1;
AF <- 1;
CF <- 1;
ELSE
CF <- 0;
AF <- 0;
FI;
AL <- AL AND 0FH;

You see it may alter AH as well, which may spoil the game.

DAS is an invalid instruction in 64-bit mode.
Post by Frank Kotler
What would be your idea of a "fast" way to do it?
IIRC we've seen many variants in the fastest/shortest discussion
some time ago in CLAX.

My 8 byte solution (3.5 cycles) wins in the aspect of using
no other registers nor memory. The cc-branch will produce a
penalty if used in a loop (every 9th iteration IIRC).

The short five byte way takes 10 cycles and uses AH.

Unfortunately CMOV doesn't have an IMM nor any 8-bit form, so

mov edx,3007h
mov ebx,0
cmp al,0ah
cmovnc ebx,edx ;replace jc: take the 7 only when al >= 10
add al,bl
add al,dh

may not suffer from branch-penalties, but you see how awful...
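
A small Python sketch of what the branch-free variant is meant to compute, for checking the arithmetic only. The name and the mask trick are assumptions for illustration: the mask stands in for the conditional move (7 is selected only when the nibble is 10 or more), and 30h is always added, as with the add of dh.

```python
def low_nibble_to_ascii_branchless(al):
    """Select 7 or 0 without an if, then add '0' (30h); al must be 0..15."""
    mask = -(al >= 10) & 0xFF      # 0xFF when al >= 10, else 0x00
    return chr(al + (7 & mask) + 0x30)


print("".join(low_nibble_to_ascii_branchless(n) for n in range(16)))
```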

But single nibble conversion loops will always be slower than
fix-sized 32 or 64 bit solutions like the dw-conversion I use:
______________________________
;eax [bin] to edx:eax [HEX-ascii]:
; it uses only four registers and no memory

xor edx,edx
xor ebx,ebx
; expand nibbles to bytes:
shld edx,eax,4
shl eax,4
shl edx,4
shld edx,eax,4
shl eax,4
shl edx,4
shld edx,eax,4
shl eax,4
shl edx,4
shld edx,eax,4
shl eax,4

shld ebx,eax,4
shl eax,4
shl ebx,4
shld ebx,eax,4
shl eax,4
shl ebx,4
shld ebx,eax,4
shl eax,4
shl ebx,4
shld ebx,eax,4
shl eax,4

;copy:
mov ecx,ebx
mov eax,edx

;the algo:
add eax,06060606h
add ecx,06060606h
and eax,10101010h
and ecx,10101010h
shr eax,4
shr ecx,4
imul eax,07h
imul ecx,07h
lea eax,[eax+edx+30303030h]
lea edx,[ecx+ebx+30303030h]

;done but for you perhaps wrong ordered yet, so I add:
bswap eax
bswap edx
_________;end
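
The same idea can be sketched in Python to see why the constants work. This is only a model of the arithmetic, not a translation of the register juggling: the function name is hypothetical, one 64-bit integer stands in for the two 32-bit halves, and a loop replaces the shld/shl expansion.

```python
def hex8_swar(value):
    """Convert a 32-bit value to 8 uppercase hex characters, all digits at once."""
    value &= 0xFFFFFFFF
    # Expand nibbles to bytes: nibble i ends up alone in byte i.
    spread = 0
    for i in range(8):
        spread |= ((value >> (4 * i)) & 0xF) << (8 * i)
    # Adding 6 sets bit 4 of a byte exactly when its nibble is 10..15.
    is_letter = ((spread + 0x0606060606060606) & 0x1010101010101010) >> 4
    # Each flagged byte gets +7 ('A' - '9' - 1); every byte gets +30h ('0').
    ascii_bytes = spread + is_letter * 7 + 0x3030303030303030
    # Most significant digit first, which is what the BSWAPs arrange in memory.
    return ascii_bytes.to_bytes(8, "big").decode("ascii")


print(hex8_swar(0x1234ABCD))
```

No byte can overflow into its neighbour here, since the largest intermediate value per byte is 15 + 7 + 30h = 46h.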

This needs about 45 cycles (incl BSWAP) on AMD,
but is quite long (128 bytes).

I'm curious how long it takes an Intel for it.

I played around with xmm-code, but I found the overhead with
load/store in memory eats all the advantage with PUNPCKLBW,...,POR.
__
wolfgang
Herbert Kleebauer
2007-04-10 12:20:19 UTC
Permalink
Post by Wolfgang Kern
DAS latency is reported as 8 cycles in AMD-docs
IF ((AL AND 0FH) > 9) OR (AF = 1)
THEN
AL <- AL - 6;
AH <- AH - 1;
AF <- 1;
CF <- 1;
ELSE
CF <- 0;
AF <- 0;
FI;
AL <- AL AND 0FH;
You see it may alter AH as well, which may spoil the game.
That's the description of AAS and not DAS (now you see why
that's bad names for these instructions).
Wolfgang Kern
2007-04-10 14:14:50 UTC
Permalink
Post by Herbert Kleebauer
Post by Wolfgang Kern
DAS latency is reported as 8 cycles in AMD-docs
Forget this yet, it is for AAS:
_________
Post by Herbert Kleebauer
Post by Wolfgang Kern
IF ((AL AND 0FH) > 9) OR (AF = 1)
THEN
AL <- AL - 6;
AH <- AH - 1;
AF <- 1;
CF <- 1;
ELSE
CF <- 0;
AF <- 0;
FI;
AL <- AL AND 0FH;
You see it may alter AH as well, which may spoil the game.
_________
Post by Herbert Kleebauer
That's the description of AAS and not DAS
Oops, Sorry my fault. I copied from the wrong page.
I really rarely (..never) use these AAS..DAS
Post by Herbert Kleebauer
(now you see why that's bad names for these instructions).
Yes.
It's not easy to find short talking names for complex instructions.
This AAA-group seems to be relics from ancient three-letter syntax.

this is what DAS does: *** w/o using AH ***

old_AL <- AL;
old_CF <- CF;
CF <- 0;
IF (((AL AND 0FH) > 9) OR AF = 1)
THEN
AL <- AL - 6;
CF <- old_CF OR (Borrow from AL <- AL - 6);
AF <- 1;
ELSE
AF <- 0;
FI;
IF ((old_AL > 99H) OR (old_CF = 1))
THEN
AL <- AL - 60H;
CF <- 1;
ELSE
CF <- 0;
FI;
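
To see that the cmp / sbb / das trick really lands on the right characters, the pseudo-code above can be modelled in Python. This is a sketch under assumptions: the function names are invented here, flags are plain 0/1 integers, and only CF and AF are tracked because those are the only flags DAS reads.

```python
def das(al, cf, af):
    """DAS following the pseudo-code above; returns (al, cf, af)."""
    old_al, old_cf = al, cf
    cf = 0
    if (al & 0x0F) > 9 or af:
        cf = old_cf | (1 if al < 6 else 0)   # borrow out of AL - 6
        al = (al - 6) & 0xFF
        af = 1
    else:
        af = 0
    if old_al > 0x99 or old_cf:
        al = (al - 0x60) & 0xFF
        cf = 1
    else:
        cf = 0
    return al, cf, af


def nibble_to_ascii_das(n):
    """cmp al,10 / sbb al,69h / das for a nibble 0..15."""
    al = n & 0x0F
    borrow_in = 1 if al < 10 else 0                       # CF after cmp al,10
    af = 1 if (al & 0x0F) < (0x69 & 0x0F) + borrow_in else 0
    cf = 1 if al < 0x69 + borrow_in else 0
    al = (al - 0x69 - borrow_in) & 0xFF                   # sbb al,69h
    al, cf, af = das(al, cf, af)
    return chr(al)


print("".join(nibble_to_ascii_das(n) for n in range(16)))
```

For 0..9 the sbb leaves 96h..9Fh with AF set, so DAS subtracts 66h in total; for 10..15 it leaves A1h..A6h with AF clear, so only 60h comes off.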

Thanks Herbert,

__
wolfgang
/\\\\o//\\annabee
2007-04-10 12:43:40 UTC
Permalink
On Tue, 10 Apr 2007 14:53:24 +0200, Wolfgang Kern wrote
Post by Wolfgang Kern
Post by Frank Kotler
...
[convert nibble to hex-ascii]
Post by Wolfgang Kern
cmp al,10
jc +2
add al,7
add al,48
cmp al, 10
sbb al, 69h
das
Shorter, and eliminates the conditional jump... but "das" is so slow
(how slow *is* it?), I don't think it's a "win"...
DAS latency is reported as 8 cycles in AMD-docs
IF ((AL AND 0FH) > 9) OR (AF = 1)
THEN
AL <- AL - 6;
AH <- AH - 1;
AF <- 1;
CF <- 1;
ELSE
CF <- 0;
AF <- 0;
FI;
AL <- AL AND 0FH;
You see it may alter AH as well, which may spoil the game.
DAS is an invalid instruction in 64-bit mode.
Post by Frank Kotler
What would be your idea of a "fast" way to do it?
IIRC we've seen many variants in the fastest shortes discussion
some time ago in CLAX.
My 8 byte solution (3.5 cycels) wins in the aspect of using
no other registers nor memory. The cc-branch will produce a
penalty if used in a loop (every 9th iteration IIRC).
The short five byte way (10 cycles) and uses AH.
Unfortunately CMOV doesn't have an IMM nor any 8-bit form, so
mov edx,3007h
mov ebx,0
cmp al,0ah
cmovnc ebx,edx ;replace jc: take the 7 only when al >= 10
add al,bl
add al,dh
may not suffer from branch-penalties, but you see how awful...
But single nibble conversion loops will always be slower than
______________________________
; it uses only four registers and no memory
xor edx,edx
xor ebx,ebx
shld edx,eax,4
shl eax,4
shl edx,4
shld edx,eax,4
shl eax,4
shl edx,4
shld edx,eax,4
shl eax,4
shl edx,4
shld edx,eax,4
shl eax,4
shld ebx,eax,4
shl eax,4
shl ebx,4
shld ebx,eax,4
shl eax,4
shl ebx,4
shld ebx,eax,4
shl eax,4
shl ebx,4
shld ebx,eax,4
shl eax,4
mov ecx,ebx
mov eax,edx
add eax,06060606h
add ecx,06060606h
and eax,10101010h
and ecx,10101010h
shr eax,4
shr ecx,4
imul eax,07h
imul ecx,07h
lea eax,[eax+edx+30303030h]
lea edx,[ecx+ebx+30303030h]
bswap eax
bswap edx
_________;end
very funny Wolfgang.
Post by Wolfgang Kern
This needs about 45 cycles (incl BSWAP) on AMD,
but is quite long (128 bytes).
I'm curious how long it takes an Intel for it.
I played around with xmm-code, but I found the overhead with
load/store in memory eats all the advantage with PUNPCKLB,...,POR.
__
wolfgang
--
Frank Kotler
2007-04-10 21:44:37 UTC
Permalink
Post by Wolfgang Kern
Post by Frank Kotler
...
[convert nibble to hex-ascii]
Post by Wolfgang Kern
cmp al,10
jc +2
add al,7
add al,48
cmp al, 10
sbb al, 69h
das
Shorter, and eliminates the conditional jump... but "das" is so slow
(how slow *is* it?), I don't think it's a "win"...
DAS latency is reported as 8 cycles in AMD-docs
Okay. I thought it was even worse than that. Maybe on Intel, it is...
Post by Wolfgang Kern
IF ((AL AND 0FH) > 9) OR (AF = 1)
THEN
AL <- AL - 6;
AH <- AH - 1;
AF <- 1;
CF <- 1;
ELSE
CF <- 0;
AF <- 0;
FI;
AL <- AL AND 0FH;
Apparently this is "aas". Strong evidence that these instructions are
rarely used!
Post by Wolfgang Kern
You see it may alter AH as well, which may spoil the game.
Maybe. Seems likely that ah is "don't care". But that isn't the case
with "das" anyway... I don't think... I really haven't got these
instructions figured out. My understanding is that the "d" forms work on
"pBCD", and the "a" forms on "uBCD" (I think that's how you call them).
I don't know if any of 'em work on actual "ascii characters"... This
might suggest a scheme for "better names", I dunno...
Post by Wolfgang Kern
DAS is an invalid instruction in 64-bit mode.
Perhaps a(nother) reason to avoid it.
Post by Wolfgang Kern
Post by Frank Kotler
What would be your idea of a "fast" way to do it?
IIRC we've seen many variants in the fastest shortes discussion
some time ago in CLAX.
Yeah... Stupid, open-ended question, I guess...
Post by Wolfgang Kern
My 8 byte solution (3.5 cycels) wins in the aspect of using
no other registers nor memory. The cc-branch will produce a
penalty if used in a loop (every 9th iteration IIRC).
9th, eh? Okay...
Post by Wolfgang Kern
The short five byte way (10 cycles) and uses AH.
Or maybe not... But still a mess of cycles.
Post by Wolfgang Kern
Unfortunately CMOV doesn't have an IMM nor any 8-bit form, so
mov edx,3007h
mov ebx,0
cmp al,0ah
cmovnc ebx,edx ;replace jc: take the 7 only when al >= 10
add al,bl
add al,dh
may not suffer from branch-penalties, but you see how awful...
Mmmmm... not *that* awful... Wouldn't have run on my recently-deceased
K6... (but it's "dead" - in a meaningful sense! :)
Post by Wolfgang Kern
But single nibble conversion loops will always be slower than
______________________________
; it uses only four registers and no memory
xor edx,edx
xor ebx,ebx
shld edx,eax,4
shl eax,4
shl edx,4
shld edx,eax,4
shl eax,4
shl edx,4
shld edx,eax,4
shl eax,4
shl edx,4
shld edx,eax,4
shl eax,4
shld ebx,eax,4
shl eax,4
shl ebx,4
shld ebx,eax,4
shl eax,4
shl ebx,4
shld ebx,eax,4
shl eax,4
shl ebx,4
shld ebx,eax,4
shl eax,4
mov ecx,ebx
mov eax,edx
add eax,06060606h
add ecx,06060606h
and eax,10101010h
and ecx,10101010h
shr eax,4
shr ecx,4
imul eax,07h
imul ecx,07h
lea eax,[eax+edx+30303030h]
lea edx,[ecx+ebx+30303030h]
bswap eax
bswap edx
_________;end
This needs about 45 cycles (incl BSWAP) on AMD,
but is quite long (128 bytes).
I'm curious how long it takes an Intel for it.
P4, in particular, has a reputation of being "really bad" on shifts. I
think of myself as an "AMD guy", but I'm running a P4 right now. I
haven't done any "timing" on it - haven't even confirmed the weird
results Herbert reported. I'll try to "get to it" (if the spirit moves
me). I have an idea it won't be good. May need a conditional jump - "if
Intel, call the other function"...
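
[Editor's sketch, not from the thread: the SHLD/SHL chain above spreads the eight nibbles of EAX so that each one sits in its own byte. A different, mask-based way to get the same layout for one 16-bit half is shown below; the function name is hypothetical and this is not Wolfgang's method, only an illustration of the target layout.]

```c
#include <stdint.h>

/* Spread the four nibbles of a 16-bit value into the four bytes of a
   32-bit value: 0xABCD -> 0x0A0B0C0D. */
static uint32_t spread_nibbles(uint16_t half)
{
    uint32_t x = half;
    x = (x | (x << 8)) & 0x00FF00FFu;  /* separate the two bytes   */
    x = (x | (x << 4)) & 0x0F0F0F0Fu;  /* separate the two nibbles */
    return x;
}
```

Calling it once on the high half and once on the low half of a 32-bit value yields the two words the routine keeps in EDX and EBX.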
Post by Wolfgang Kern
I played around with xmm-code, but I found the overhead with
load/store in memory eats all the advantage with PUNPCKLB,...,POR.
Xmm is still on my "learn someday" list. This seems a common story,
though. Apparently, xmm is a (big?) win in certain situations where it's
"appropriate", but if you need to "force" your application into it,
(much?) worse.

Of course, the most likely reason to convert nibbles to hex ascii is
"human convenience", and the human can't read 'em nearly as fast as our
*slowest* method, so... Still, maybe some other process could use the
cycles...

Randy values "clarity". In some cases, I don't think the "clarity" is
worth the "detour". In *this* case, the "clarity" of the "obvious"
method is probably well worth the three bytes! Sorry I even mentioned
"das" (no I'm not - it's fun to discuss this stuff! :)

Best,
Frank
/\\\\o//\\annabee
2007-04-11 00:34:23 UTC
Permalink
On Tue, 10 Apr 2007 23:44:37 +0200, Frank Kotler wrote:
Post by Frank Kotler
Post by Wolfgang Kern
This needs about 45 cycles (incl BSWAP) on AMD,
but is quite long (128 bytes).
I'm curious how long it takes an Intel for it.
P4, in particular, has a reputation of being "really bad" on shifts. I
think of myself as an "AMD guy", but I'm running a P4 right now. I
haven't done any "timing" on it - haven't even confirmed the weird
results Herbert reported. I'll try to "get to it" (if the spirit moves
me). I have an idea it won't be good. May need a conditional jump - "if
Intel, call the other function"...
RosAsm hexprint is 5 times faster than Wolfgang's code :) ?

I clocked Wolfgang's at between 666 and 777 cycles, with variations (earlier
today)

(>800) now.

Hexprint at somewhat above 100 cycles. 145 or thereabouts.

I called hexprint like this:

Betov_Hex:
mov ebx eax
mov ecx 8, edi HexPrintString | add edi 7
std
Do
mov al bl | and al 0F | add al '0'
On al > '9', add al 7
stosb | shr ebx 4
Do_Loop
cld
ret

This one addresses memory, etc.
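
[Editor's sketch, not from the thread: for readers who do not read RosAsm macros, the loop above does roughly the following in C. The function name and the 9-byte output buffer are assumptions for illustration.]

```c
#include <stdint.h>

/* One nibble per iteration, written from the last character backwards,
   like the STD/STOSB loop in Betov_Hex. */
static void hex_print(uint32_t value, char out[9])
{
    for (int i = 7; i >= 0; --i) {
        uint8_t digit = (uint8_t)((value & 0x0F) + '0');
        if (digit > '9')
            digit += 7;            /* skip from ':' up to 'A' */
        out[i] = (char)digit;
        value >>= 4;
    }
    out[8] = '\0';
}
```

The string terminator is an addition for C; the original routine simply fills the eight bytes at HexPrintString.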

Now for the disturbing news (to me at least).
If I put Wolfgang's code in front of my test code, a few bytes ahead, it
clocks 272 cycles, but if I place it in another TITLE, at a much, much
lower address, then it clocks in at 800+ cycles.

If I do the same with Betov_Hex I get 598 cycles if I place it at the very
much lower address, and 145 cycles if immediately ahead in the code.

I guess this is because of cache?

Anyways, the Betov hexprint is :) faster.
And that one I can read and understand and reuse in two seconds,
whereas Wolfgang's I had to step through in the debugger several times,
and I am not sure I get it anyway.

The same thing happens when I place it at lower addresses, just before the
test code (code posted below), but to a lesser degree. I now get 602 cycles
for Wolfgang's code
and 374 for Betov's hexprint.


Below is the complete code used in the timings, except for the GUI code.
This code is run in USER mode realtime priority, and runs as the result of
clicking a menuitem:


First listed are the two routines at lower addresses.
Then the test routine,
then the same two routines at higher addresses.

For the 800+ cycles, remember they use _much_ lower addresses.


Betov_Hex2:
mov ebx eax
mov ecx 8, edi HexPrintString | add edi 7
std
Do
mov al bl | and al 0F | add al '0'
On al > '9', add al 7
stosb | shr ebx 4
Do_Loop
cld
ret

WolfGang_BinToAscci2:
______________________________
;eax [bin] to edx:eax [HEX-ascii]:
; it uses only four registers and no memory
xor edx,edx
xor ebx,ebx
; expand nibbles to bytes:
shld edx,eax,4
shl eax,4
shl edx,4
shld edx,eax,4
shl eax,4
shl edx,4
shld edx,eax,4
shl eax,4
shl edx,4
shld edx,eax,4
shl eax,4
shld ebx,eax,4
shl eax,4
shl ebx,4
shld ebx,eax,4
shl eax,4
shl ebx,4
shld ebx,eax,4
shl eax,4
shl ebx,4
shld ebx,eax,4
shl eax,4
;copy:
mov ecx,ebx
mov eax,edx
;the algo:
add eax,06060606h
add ecx,06060606h
and eax,10101010h
and ecx,10101010h
shr eax,4
shr ecx,4
imul eax,07h
imul ecx,07h
lea eax,D$eax+edx+30303030h
lea edx,D$ecx+ebx+30303030h
;done but for you perhaps wrong ordered yet, so I add:
bswap eax
bswap edx
ret

;;
This is the test/timing code
;;

[TestVariable: ? ? ?]
TestCode:
push edi
CPUID | rdtsc | push eax edx
mov eax 0-1
;call WolfGang_BinToAscci
;call WolfGang_BinToAscci2
call Betov_Hex
;call Betov_Hex2
rdtsc | pop ecx ebx
sub eax ebx
sbb edx ecx
int 3
pop edi
ret

Betov_Hex:
mov ebx eax
mov ecx 8, edi HexPrintString | add edi 7
std
Do
mov al bl | and al 0F | add al '0'
On al > '9', add al 7
stosb | shr ebx 4
Do_Loop
cld
ret

WolfGang_BinToAscci:
______________________________
;eax [bin] to edx:eax [HEX-ascii]:
; it uses only four registers and no memory
xor edx,edx
xor ebx,ebx
; expand nibbles to bytes:
shld edx,eax,4
shl eax,4
shl edx,4
shld edx,eax,4
shl eax,4
shl edx,4
shld edx,eax,4
shl eax,4
shl edx,4
shld edx,eax,4
shl eax,4
shld ebx,eax,4
shl eax,4
shl ebx,4
shld ebx,eax,4
shl eax,4
shl ebx,4
shld ebx,eax,4
shl eax,4
shl ebx,4
shld ebx,eax,4
shl eax,4
;copy:
mov ecx,ebx
mov eax,edx
;the algo:
add eax,06060606h
add ecx,06060606h
and eax,10101010h
and ecx,10101010h
shr eax,4
shr ecx,4
imul eax,07h
imul ecx,07h
lea eax,D$eax+edx+30303030h
lea edx,D$ecx+ebx+30303030h
;done but for you perhaps wrong ordered yet, so I add:
bswap eax
bswap edx
ret
Post by Frank Kotler
Best,
Frank
--
Betov
2007-04-11 06:42:20 UTC
Permalink
Post by /\\\\o//\\annabee
Rosasm hexprint is 5 times faster
I never write anything to be fast at the code level,
and I have absolutely no interest in this.

;)

Betov.

< http://rosasm.org >
Wolfgang Kern
2007-04-11 15:31:07 UTC
Permalink
Post by /\\\\o//\\annabee
On Tue, 10 Apr 2007 23:44:37 +0200, Frank Kotler wrote:
Post by Frank Kotler
Post by Wolfgang Kern
This needs about 45 cycles (incl BSWAP) on AMD,
but is quite long (128 bytes).
I'm curious how long it takes an Intel for it.
P4, in particular, has a reputation of being "really bad" on shifts. I
think of myself as an "AMD guy", but I'm running a P4 right now. I
haven't done any "timing" on it - haven't even confirmed the weird
results Herbert reported. I'll try to "get to it" (if the spirit moves
me). I have an idea it won't be good. May need a conditional jump - "if
Intel, call the other function"...
Rosasm hexprint is 5 times faster then Wolgangs code :) ?
I clocked wolfgang at between 666 and 777 cycles and variations (earlier
today)
(>800) now.
Hexprint at somewhat above 100 cycles. 145 or thereabouts.
And I tried this a few minutes ago:
___________________________________
[STDH: 0]
[Time: 0 0]
[HexPrintString: B$ ' ']

main:
_____
;cli ;wont do any good on NT
CPUID |RDTSC |mov D$time eax |mov D$time+4 edx

____________;TEST-AREA insert your code under test here:
;best avoid calls in here or
Betov_Hex2:
mov eax 012345678
mov ebx eax
mov ecx 8 |mov edi HexPrintString | add edi 7
std
L0: mov al bl | and al 0F | add al 030
cmp al 03a | jc L1> |add al 7
L1: stosb | shr ebx 4
Loop L0<
cld
___________
push edx |push eax
RDTSC |sub eax D$time |sbb edx D$time+4 |mov D$time eax |mov D$time+4 edx
pop eax |pop edx
;sti
___________
int3
push 0 |jmp 'KERNEL32.ExitProcess'
______________________________________

This reproducibly needs 124 cycles here.

Looks like you just measure windoze background noise.
Post by /\\\\o//\\annabee
Now for the disturbing news. (to me at least)
If I put wolfgangs code, in front of my testcode, a few bytes ahead, it
clocks 272 cycles, but if I place it in another TITLE, many many many
bytes lower adress, then it clocks in at 800+ cycles.
First (new) caches and misalignment may spoil the test.

I also tested your way of 'calling' the routines,
and surprise surprise, I also got weird results from 250 to 10000 cycles.
These are typical stack fetch penalties (and/or page-fault recovery).

So I added in front of the first RDTSC:
_________
[STDH: 0]
push 0-11 |call 'KERNEL32.GetStdHandle' |mov D$StdH eax
pushad
popad
_________
just to have some stack already 'as used'

A more reliable comparison is always the direct check of
code parts by reducing windoze noise to a minimum.
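
[Editor's sketch, not from the thread: one common way to keep background noise out of a measurement is to warm the code up once, repeat the timing many times, and keep the minimum. The sketch below uses the C11 timespec_get wall clock as a portable stand-in for RDTSC, so it reports nanoseconds rather than cycles; work_under_test and the other names are hypothetical.]

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the routine being measured. */
static volatile uint32_t sink;
static void work_under_test(void)
{
    uint32_t x = 0xFFFFFFFFu;
    for (int i = 0; i < 8; ++i) {
        sink = x & 0x0F;
        x >>= 4;
    }
}

/* Wall-clock time in nanoseconds (stand-in for RDTSC). */
static uint64_t now_ns(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Run fn once to touch code, stack and caches, then time it `runs`
   times and return the smallest elapsed time seen. */
static uint64_t min_elapsed_ns(void (*fn)(void), int runs)
{
    uint64_t best = UINT64_MAX;
    fn();                                  /* warm-up, not measured */
    for (int i = 0; i < runs; ++i) {
        uint64_t t0 = now_ns();
        fn();
        uint64_t t1 = now_ns();
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}
```

The warm-up call plays the same role as the pushad/popad trick above: the first execution pays for cold caches and untouched stack pages, so it is kept out of the result.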
Post by /\\\\o//\\annabee
If I do the same with Betov_Hex I get 598 cycles if I place it at the
very much lower adress, and 145 cycles if imidiatly ahead in the code.
I guess this is because of cache?
Yes.
Post by /\\\\o//\\annabee
Anyways, the Betov hexprint is :) faster.
No, this STOSB-loop takes 124 cycles (136 with call)
My solution needs 45 cycles (58 with call)
Post by /\\\\o//\\annabee
And that one I can read and understand and reuse in two seconds,
whereas Wolfgangs I had to step in the debugger several times,
and I am not sure I get it anyway.
:)
the algo is easy (done for all 8 bytes):
add 06 ;the upper four bits are clear after the expansion anyway
and 010 ;this bit is set if the nibble is >9
shr 4 ;make this bit to bit0
mul 7 ;now we get either zero or seven
add ;previous saved + "0 or 7" + '30'
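
[Editor's sketch, not from the thread: the five steps above, applied to four expanded nibbles at once, look like this in C. The function name is hypothetical; the input is assumed to hold one value 0..15 in each byte, as produced by the expansion step.]

```c
#include <stdint.h>

/* Convert four nibble-bytes (each 0x00..0x0F) to four hex ASCII bytes. */
static uint32_t nibble_bytes_to_ascii(uint32_t n)
{
    uint32_t t = n + 0x06060606u;  /* add 06: bit 4 gets set when a nibble is > 9 */
    t &= 0x10101010u;              /* and 010: keep only that bit                 */
    t >>= 4;                       /* shr 4: now 0 or 1 in each byte              */
    t *= 7;                        /* mul 7: now 0 or 7 in each byte              */
    return n + t + 0x30303030u;    /* saved value + (0 or 7) + '0'                */
}
```

No byte can carry into its neighbour: the largest intermediate value per byte is 0x0F + 0x06 = 0x15, and the largest result is 0x0F + 7 + 0x30 = 0x46 ('F').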
Post by /\\\\o//\\annabee
the same thing happens when I place it at lower adresses, just before the
testcode (Post code below), but to a lesser degree. I now get 602 cycles
for Wolfgangs code
and 374 for Betovs hexprint.
As above. Avoid noise measurement ;)

__
wolfgang
Wolfgang Kern
2007-04-11 16:25:29 UTC
Permalink
Hi Frank,
Post by Frank Kotler
Post by Wolfgang Kern
Post by Frank Kotler
[convert nibble to hex-ascii]
Post by Wolfgang Kern
cmp al,10
jc +2
add al,7
add al,48
cmp al, 10
sbb al, 69h
das
DAS latency is reported as 8 cycles in AMD-docs
Okay. I thought it was even worse than that. Maybe on Intel, it is...
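
[Editor's sketch, not from the thread: why cmp/sbb/das yields a hex digit is easier to see when the flags are tracked by hand. The C below emulates the three instructions for a nibble in AL, following the DAS behaviour as described in the Intel manuals as the editor understands it; the function name is hypothetical.]

```c
#include <stdint.h>

/* Emulates:  cmp al,10 / sbb al,69h / das   for al = 0..15. */
static char nibble_to_hex_das(uint8_t nibble)
{
    uint8_t al = nibble & 0x0F;

    /* cmp al,10 : carry is set when al < 10 */
    int cf = al < 10;

    /* sbb al,69h : al = al - 0x69 - carry; record both borrows */
    unsigned sub = 0x69u + (unsigned)cf;
    int af = (al & 0x0F) < (0x09 + cf);    /* borrow out of the low nibble */
    cf = al < sub;                         /* borrow out of the byte       */
    al = (uint8_t)(al - sub);

    /* das : decimal adjust after subtraction */
    uint8_t old_al = al;
    int old_cf = cf;
    if ((al & 0x0F) > 9 || af)
        al = (uint8_t)(al - 6);
    if (old_al > 0x99 || old_cf)
        al = (uint8_t)(al - 0x60);

    return (char)al;
}
```

For 0..9 the SBB leaves 0x96..0x9F and DAS subtracts 0x66, giving '0'..'9'; for 10..15 it leaves 0xA1..0xA6 and DAS subtracts only 0x60, giving 'A'..'F'.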
[...]
Post by Frank Kotler
Apparently this is "aas". Strong evidence that these instructions are
rarely used!
Yes, Sorry.

[...]
Post by Frank Kotler
Post by Wolfgang Kern
mov edx,3007h
mov ebx,0
cmp al,0a
CMOVNC ebx,edx ;replaces jc
add al,bl
add al,dh
may not suffer from branch-penalties, but you see how awful...
Mmmmm... not *that* awful...
Due to dependencies it will need a full 6 cycles.
Post by Frank Kotler
Wouldn't have run on my recently-deceased
K6... (but it's "dead" - in a meaningful sense! :)
I bought five old K6/K7 mainboards (for US$10 altogether) to have
the kids play around with them just last week.
Three of em still work fine.

[code snipped]
Post by Frank Kotler
Post by Wolfgang Kern
This needs about 45 cycles (incl BSWAP) on AMD,
but is quite long (128 bytes).
I'm curious how long it takes an Intel for it.
P4, in particular, has a reputation of being "really bad" on shifts. I
think of myself as an "AMD guy", but I'm running a P4 right now. I
haven't done any "timing" on it - haven't even confirmed the weird
results Herbert reported. I'll try to "get to it" (if the spirit moves
me). I have an idea it won't be good. May need a conditional jump - "if
Intel, call the other function"...
Perhaps a compiler switch would be better... :)
Post by Frank Kotler
Post by Wolfgang Kern
I played around with xmm-code, but I found the overhead with
load/store in memory eats all the advantage with PUNPCKLB,...,POR.
Xmm is still on my "learn someday" list. This seems a common story,
though. Apparently, xmm is a (big?) win in certain situations where it's
"appropriate", but if you need to "force" your application into it,
(much?) worse.
My new standard is AMD64 on MSI, but I still haven't used SSE/3DNow!
routines in my OS code. I am thinking about making more use of them, though,
even though I neither support nor use the weird IEEE-754 2^n-exponent formats.
A few SSE instructions are really powerful for my integer formats as well.
Post by Frank Kotler
Of course, the most likely reason to convert nibbles to hex ascii is
"human convenience", and the human can't read 'em nearly as fast as our
*slowest* method, so... Still, maybe some other process could use the
cycles...
I'm familiar with it from my NOVA-2 days, but it would be a PITA
for most coders to view code-bytes/address on bit-indicator LEDs.

The invention of HEX made the story somehow more convenient.
Post by Frank Kotler
Randy values "clarity". In some cases, I don't think the "clarity" is
worth the "detour". In *this* case, the "clarity" of the "obvious"
method is probably well worth the three bytes! Sorry I even mentioned
"das" (no I'm not - it's fun to discuss this stuff! :)
Right, we better have some fun beside the daily programming job.

__
wolfgang
Wolfgang Kern
2007-04-06 16:44:25 UTC
Permalink
Herbert Kleebauer wrote:

[...]
Post by Herbert Kleebauer
Post by Betov
What are you talking about? Rejecting the Multi-Instructions
lines concept? If yes... well... write it the way you'd like...
m'I gniklat tuoba gnitirw secnetnes morf tfel ot thgir tub sdrow
morf thgir ot tfel.
:)

But it may depend on who spelt the sentence:

1. Scientists: x=a+b
or
2. groundschool kids: 1+1=2

:)

__
wolfgang
Herbert Kleebauer
2007-04-06 16:51:38 UTC
Permalink
a=a+b a is the result of a+b
add b,a add b to a
or
add a,b add to a b
Wolfgang Kern
2007-04-07 13:45:15 UTC
Permalink
Post by Herbert Kleebauer
a=a+b a is the result of a+b
add b,a add b to a
or
add a,b add to a b
Yes. :)

A few better 'how to spell' examples:

[LEA,sib]
1. a=b+c*f+d
2. b+c*f+d=a

[iMUL]
1. a=b*c
2. 1*1=1 ... reminds this ? :)

[MOV] (I previously used LD and ST instead, now I use Dest=Src)
1. a=b ;load a with b (even old BASIC said: LET a=b)
2. move b to a ;OK, even though b doesn't 'move' at all.

[MOVZX/SX]
1. eax=Zb[mem]
2. store 0 to eax |store b[mem] to al ???

I prefer the scientific notation f(x)=... ;ADD x,...
with the result(destination) at the left side.

Finally it's just a matter of familiarity anyway.
Your first steps may have been with Motorola's,
mine were with Zilog's and Intel's. That's all about it :)

__
wolfgang
cr88192
2007-04-06 10:33:18 UTC
Permalink
Post by Herbert Kleebauer
Post by Betov
I have to second Spook, here. ';' has been the traditional char
for comments, and there is no reason for modifying this. Also,
in order to resist to the natural babelism tendancies, it would
be a good thing to do the same choices, the other Author have
done, when this is without any importance.
The only way to get an improvement is to not make the same
choices as the other have done.
ok.
Post by Herbert Kleebauer
Post by Betov
The only Assembler
i know about, which enables with multi-Instructions-lines is
mov eax ebx | mov edx 0 | div ecx
Now, that surely is an example how a deviation from the
commonly accepted choices makes thing worse: to separate
the parameters by a "," makes the instruction much better
readable. And the "|" as a separator for instructions is
much to dominant. Compare the two lines and you will
see what's the better choice.
mov eax ebx | mov edx eax | mov ecx edx | div ecx
mov eax,ebx; mov edx,eax; mov ecx,edx; div ecx;
I don't know how problematic it is, to use the ; as an
instruction separator and as a start of a comment depending
on a preceding space, but I must say, I most probably
never used a ";" as a start of comment without at least
one space before it.
I agree, this is partly why I did it this way.
Post by Herbert Kleebauer
But much superior to both is the traditional way to only
write one instruction per line and use the src,dest
move.l r3,r0
move.l r0,r1
move.l r1,r2
divu.l r2,r1|r0
r3 -> r0 -> r1 -> r2
I like dst,src personally, but then again this is what I have (mostly)
always seen.


gas does src,dst though.

actually, the main complaints I have about gas syntax are this:
-8(%ebp)
3040(,%ebx,4)

couldn't they have made it anything less horrible?...
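
[Editor's sketch, not from the thread: the two gas operands above follow the AT&T form disp(base,index,scale), which in nasm-style syntax is [base + index*scale + disp]. So -8(%ebp) is [ebp-8] and 3040(,%ebx,4) is [ebx*4+3040]. The helper below, with a hypothetical name, just computes that address.]

```c
#include <stdint.h>

/* Effective address of the AT&T operand  disp(base,index,scale),
   i.e. the Intel-syntax operand  [base + index*scale + disp].
   Pass 0 for a missing base or index. */
static uint32_t effective_address(uint32_t base, uint32_t index,
                                  uint32_t scale, int32_t disp)
{
    return base + index * scale + (uint32_t)disp;
}
```

The arithmetic wraps modulo 2^32, which matches 32-bit address calculation.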
Post by Herbert Kleebauer
But with your format, you have to look even more than twice
mov eax ebx | mov edx eax | mov ecx edx
eax <- ebx
+--------------------+
edx <- eax-+
+--------------------+
ecx <- edx-+
Post by Betov
Unless it would be a problem for your parser to make a difference
with an expression, why not doing the same choice?
No, if the old choices are good, keep them. If the old choices are
bad, change them. But you did exactly the opposite: change
the good and keep the bad. But it really doesn't matter as
long as your product is only for your private use (like an
assembler for writing applications).
yes.


as noted elsewhere, I started this effort ignoring human code, but more
recently I have started to consider it.

initially, this started as a bunch of code-generation utility functions,
which were called directly to emit specific opcodes or constructions
(reasoning that it was too damn tedious to output customized machine code
for everything).
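
[Editor's sketch, not from the thread: "utility functions called directly to emit specific opcodes" might look something like the following. The buffer type and all function names are hypothetical, not cr88192's actual API; the two encodings used (B8 + imm32 for "mov eax, imm32" and C3 for "ret") are standard x86.]

```c
#include <stddef.h>
#include <stdint.h>

/* A small fixed-size buffer that machine code is appended to. */
typedef struct {
    uint8_t bytes[64];
    size_t  len;
} CodeBuf;

static void emit_u8(CodeBuf *cb, uint8_t b)
{
    if (cb->len < sizeof cb->bytes)        /* silently drop on overflow */
        cb->bytes[cb->len++] = b;
}

static void emit_u32le(CodeBuf *cb, uint32_t v)
{
    for (int i = 0; i < 4; ++i)            /* x86 immediates are little-endian */
        emit_u8(cb, (uint8_t)(v >> (8 * i)));
}

/* mov eax, imm32  ->  B8 id */
static void emit_mov_eax_imm32(CodeBuf *cb, uint32_t imm)
{
    emit_u8(cb, 0xB8);
    emit_u32le(cb, imm);
}

/* ret  ->  C3 */
static void emit_ret(CodeBuf *cb)
{
    emit_u8(cb, 0xC3);
}
```

One such function per instruction form gets tedious quickly, which is presumably why a text parser in front of the emitters was the next step.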

later on, these utility functions were themselves inconvenient, so I beat
together the parser (which was, and still is, an ugly mass of tokenizing).
initially this served to wrap the various utility functions.

(this was around late January, the previous state had existed since maybe
about last August...).

after this, the new interface improved productivity enough that I actually
got the JIT compiler working (pretty much completely displacing the old
function-call driven interface), which was cool for a while (but then I
noted that I still had many of the same limitations as before, ie, when
everything was interpreted).


as for using it for human-written code:

actually, one of the first things that prompted this was this:
"wow, I have this assembler I use for JIT output, why not hack over some of
the upper level JIT mechanics and use it for assembling code directly?...".

the original idea was that it could be useful, for example, in the
implementation of vertex shaders and similar features. more ambitious ideas
followed, such as allowing assembler to link directly against the host app,
...


I then did this, and this started prompting ideas that followed:
cleaning up some of the cruftiness;
eventually splitting it off into its own library (basically, originally this
code was all mashed together with code for translating bytecode into machine
code).

in the process, I ripped the JIT mechanics out of the assembler, and
likewise the assembler mechanics out of the JIT. I also stripped out many of
the no-longer-used functions (a large mass of functions for generating
conditional jumps, ...).


so, now, the assembler focuses on assembly, and the JIT compiler(s)
interface with the assembler through an increasingly abstract API.

the next goal may include getting a working C compiler (note: in a different
library).

I already have a new JIT frontend, which implements a kind of vaguely
forth-like stack language (it also has a parser, but this is unlikely to be
used by the compiler, which will more likely directly emit bytecode).

or such...
¬a\\/b
2007-04-08 06:07:12 UTC
Permalink
Post by Herbert Kleebauer
Post by Betov
I have to second Spook, here. ';' has been the traditional char
for comments, and there is no reason for modifying this. Also,
in order to resist to the natural babelism tendancies, it would
be a good thing to do the same choices, the other Author have
done, when this is without any importance.
The only way to get an improvement is to not make the same
choices as the other have done.
Post by Betov
The only Assembler
i know about, which enables with multi-Instructions-lines is
mov eax ebx | mov edx 0 | div ecx
Now, that surely is an example how a deviation from the
commonly accepted choices makes thing worse: to separate
the parameters by a "," makes the instruction much better
readable. And the "|" as a separator for instructions is
much to dominant. Compare the two lines and you will
see what's the better choice.
mov eax ebx | mov edx eax | mov ecx edx | div ecx
mov eax,ebx; mov edx,eax; mov ecx,edx; div ecx;
I don't know how problematic it is, to use the ; as an
instruction separator and as a start of a comment depending
on a preceding space, but I must say, I most probably
never used a ";" as a start of comment without at least
one space before it.
But much superior to both is the traditional way to only
write one instruction per line and use the src,dest
move.l r3,r0
move.l r0,r1
move.l r1,r2
divu.l r2,r1|r0
r3 -> r0 -> r1 -> r2
But with your format, you have to look even more than twice
mov eax ebx | mov edx eax | mov ecx edx
eax <- ebx
+--------------------+
edx <- eax-+
+--------------------+
ecx <- edx-+
Post by Betov
Unless it would be a problem for your parser to make a difference
with an expression, why not doing the same choice?
No, if the old choices are good, keep them. If the old choices are
bad, change them. But you did exactly the opposite: change
the good and keep the bad. But it really doesn't matter as
long as your product is only for your private use (like an
assembler for writing applications).
my choices seem good and right.

1) Indentation and reduction of the length of each instruction

in my way of writing hobby assembly code
there is more chance that I get something wrong writing the C[++] code
than the assembly

a little example:

section _DATA public align=4 class=DATA use32
global _StrToNum

/* 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
tab dd 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x36ff /* 0
dd 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff /* 10
dd 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff /* 20
dd 0xff, 0xff, 0x37ff, 0x1ff, 0xff, 0xff, 0x33ff, 0x35ff, 0xff, 0x34ff /* 30
dd 0x31ff, 0xff, 0xff, 0xff, 0xff, 0x3ff, 0xff, 0xff, 0x0, 0x1 /* 40
dd 0x2, 0x3, 0x4, 0x5, 0x6, 0x7, 0x8, 0x9, 0xff, 0xff /* 50
dd 0xff, 0xff, 0xff, 0xff, 0xff, 0xa, 0xb, 0xc, 0xd, 0xe /* 60
dd 0xf, 0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17, 0x18 /* 70
dd 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f, 0x20, 0x21, 0x22 /* 80
dd 0x23, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xa, 0xb, 0xc /* 90
dd 0xd, 0xe, 0xf, 0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16 /* 100
dd 0x17, 0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f, 0x20 /* 110
dd 0x21, 0x22, 0x23, 0xff, 0xff, 0xff, 0x2ff /* 120
times 132 dd 0xff

section _TEXT public align=1 class=CODE use32

/* num:
/* unsigned32 len_of_num
/* unsigned32* array_num
/* unsigned32 len_of_memory_of_num
/*
/* int
/* StrToNum(num[*|&] res, char* string, char** pos, int base)
/* eg. num res;
/* char buf[256]={0}, *pc; int val=StrToUns( &res, buf, &pc, 10);
/* CF==carry flag
/* format==<spaces><+><digits> where spaces={' ', '\t'}
/* convert the number found in a 0 terminated string
/* "string" in the base digit format to a big number with
/* 2<=base<=36; possible digits are "0123456789[A-Z][a-z]"
/* if there is not a number in the above format then
/* after the call => (string==pos CF==1 and *res=0)
/* if overflow [mem number not enought] return 0 CF==1
/* and *res=0 if there is (or not) overflow it read all the number
/* and save in "pos" the first not digit position
/* if all is ok CF==0 and return 1, if error CF==1, *res=0, and
/* return 0
/* 0k, 4j, 8i, 12r, 16c, 20b, 24ra, 28res, 32string, 36pos, 40base
_StrToNum:
< b, c, r, i, j, k
<< @res=[s+28], @string=[s+32], @pos=[s+36], @base=[s+40]
i=@string; j=@res; j==0#.ce;
D[j+4]==0#.ce; D[j+8]<1#.ce;
i==0#.ce; b^=b;
k=@base; k>36#.ce; k<=1#.ce;
#.c1;
.c0: {++i; .c1: B*i==' '#.c0; B*i==9#.c0;}
B*i=='+'!#.c2 | ++i;
.c2: bl=*i; [tab+4*b]>=k#.ce;
#.c5;
.ce: a=@pos; r=@string; *a=r; a=0; ##.ca;
.c4: {++i; .c5: B*i=='0'#.c4; }
a=0;
D*j=1; k=[j+4]; D*k=0; c=1;
.c6: b^=b; bl=*i; b=[tab+4*b];
b>= D @base#.c8
.a0: a=*k; mul D @base; /* moltiplicazione *10
a+=b; r++=0; jc .c7; b=r;
*k=a; k+=4; --c#.a0;
r!=0!#.a1 /* here the array grow of 1
a=*j; D[j+8]<=a#.c7;
++D*j; *k=r;
.a1: c=*j; k=[j+4];
++i; #.c6;
.c7: D*j=1; k=[j+4]; D*k=0; b^=b;
.a2: {++i; bl=*i; r=[tab+4*b]; r< D @base#.a2;}
r=1; #.c9;
.c8: r=0;
.c9: b=@pos; *b=i;
r==0!#.ca; a=1; clc; #.cf
.ca: a=0; stc;
>> @res, @string, @pos, @base
> b, c, r, i, j, k
ret

isn't that beautiful? :)
should I have used more comments?
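
[Editor's sketch, not from the thread: a much reduced C version of what the header comment of StrToNum describes (skip spaces and tabs, optional '+', digits in base 2..36, report the first non-digit position, fail on overflow). It handles a single 32-bit result instead of a big number, uses a small function instead of the lookup table, and returns a status instead of setting the carry flag; all names are hypothetical.]

```c
#include <stdint.h>

/* Digit value of c, or 0xFF if c is not a digit in any base up to 36.
   Assumes an ASCII-like character set with contiguous letters. */
static int digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'Z') return c - 'A' + 10;
    if (c >= 'a' && c <= 'z') return c - 'a' + 10;
    return 0xFF;
}

/* Returns 1 and stores the value on success. Returns 0 with *res = 0 on
   error. If no number is found, *end is set to s; on overflow all digits
   are still consumed and *end points at the first non-digit. */
static int str_to_u32(uint32_t *res, const char *s, const char **end, int base)
{
    const char *p = s;
    uint64_t acc = 0;
    int overflow = 0;

    *res = 0;
    *end = s;
    if (base < 2 || base > 36)
        return 0;
    while (*p == ' ' || *p == '\t')
        ++p;
    if (*p == '+')
        ++p;
    if (digit_value(*p) >= base)
        return 0;                          /* no digits at all */

    for (; digit_value(*p) < base; ++p) {
        if (!overflow) {
            acc = acc * (uint64_t)base + (uint64_t)digit_value(*p);
            if (acc > UINT32_MAX)
                overflow = 1;
        }
    }
    *end = p;
    if (overflow)
        return 0;
    *res = (uint32_t)acc;
    return 1;
}
```

The 0xFF "not a digit" marker mirrors the 0xff entries in the table above: any such value fails the "less than base" test, so the same comparison both validates the digit and ends the loop.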
Wolfgang Kern
2007-04-08 11:21:48 UTC
Permalink
Hello "¬a\/b",

[..]
Post by ¬a\\/b
my choices seems goods and right.
1) Indentation and redution of len of each instruction
in my way of hobby assembly code
there is more chance that the wrong in write the C[++] code
than assembly
section _DATA public align=4 class=DATA use32
global _StrToNum
/* 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
tab dd 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0x36ff /* 0
dd 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff
/* 10
dd 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff
/* 20
dd 0xff, 0xff,0x37ff,0x1ff, 0xff,
0xff,0x33ff,0x35ff,0xff,0x34ff /* 30
dd 0x31ff,0xff, 0xff, 0xff, 0xff, 0x3ff, 0xff, 0xff, 0x0, 0x1
/* 40
dd 0x2, 0x3, 0x4, 0x5, 0x6, 0x7, 0x8, 0x9, 0xff, 0xff
/* 50
dd 0xff, 0xff, 0xff, 0xff, 0xff, 0xa, 0xb, 0xc, 0xd, 0xe
/* 60
dd 0xf, 0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17, 0x18
/* 70
dd 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f, 0x20, 0x21, 0x22
/* 80
dd 0x23, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xa, 0xb, 0xc
/* 90
dd 0xd, 0xe, 0xf, 0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16
/* 100
dd 0x17, 0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f, 0x20
/* 110
dd 0x21, 0x22, 0x23, 0xff, 0xff, 0xff, 0x2ff
/* 120
times 132 dd 0xff
section _TEXT public align=1 class=CODE use32
/* unsigned32 len_of_num
/* unsigned32* array_num
/* unsigned32 len_of_memory_of_num
/*
/* int
/* StrToNum(num[*|&] res, char* string, char** pos, int base)
/* eg. num res;
/* char buf[256]={0}, *pc; int val=StrToUns( &res, buf, &pc, 10);
/* CF==carry flag
/* format==<spaces><+><digits> where spaces={' ', '\t'}
/* convert the number found in a 0 terminated string
/* "string" in the base digit format to a big number with
/* 2<=base<=36; possible digits are "0123456789[A-Z][a-z]"
/* if there is not a number in the above format then
/* after the call => (string==pos CF==1 and *res=0)
/* if overflow [mem number not enought] return 0 CF==1
/* and *res=0 if there is (or not) overflow it read all the number
/* and save in "pos" the first not digit position
/* if all is ok CF==0 and return 1, if error CF==1, *res=0, and
/* return 0
/* 0k, 4j, 8i, 12r, 16c, 20b, 24ra, 28res, 32string, 36pos, 40base
< b, c, r, i, j, k
D[j+4]==0#.ce; D[j+8]<1#.ce;
i==0#.ce; b^=b;
#.c1;
.c0: {++i; .c1: B*i==' '#.c0; B*i==9#.c0;}
B*i=='+'!#.c2 | ++i;
.c2: bl=*i; [tab+4*b]>=k#.ce;
#.c5;
.c4: {++i; .c5: B*i=='0'#.c4; }
a=0;
D*j=1; k=[j+4]; D*k=0; c=1;
.c6: b^=b; bl=*i; b=[tab+4*b];
a+=b; r++=0; jc .c7; b=r;
*k=a; k+=4; --c#.a0;
r!=0!#.a1 /* here the array grow of 1
a=*j; D[j+8]<=a#.c7;
++D*j; *k=r;
.a1: c=*j; k=[j+4];
++i; #.c6;
.c7: D*j=1; k=[j+4]; D*k=0; b^=b;
r=1; #.c9;
.c8: r=0;
r==0!#.ca; a=1; clc; #.cf
.ca: a=0; stc;
Post by ¬a\\/b
Post by Betov
@res, @string, @pos, @base
b, c, r, i, j, k
ret
is not that beautifull? :)
If you like an answer from me then you better post a traditional
disassembly of it, or support us with a link to the resulting binary.

If I assume the eleven @-Items as memory-references then your
solution may not be very fast, even your filter-table may help a bit.
Post by ¬a\\/b
have i use more comment?
One would have to learn your way of using this syntax,
for me it looks (as Beth used to say) like Alphabet-Soup :)

__
wolfgang
¬a\\/b
2007-04-09 05:22:48 UTC
Permalink
Post by Wolfgang Kern
is not that beautifull? :)>
If you like an answer from me then you better post a traditional
disassembly of it, or support us with a link to the resulting binary.
I already posted it in another thread here; I don't remember where
Post by Wolfgang Kern
solution may not be very fast, even your filter-table may help a bit.
have i use more comment?
One would have to learn your way of using this syntax,
for me it looks (as Beth used to say) like Alphabet-Soup :)
/\\\\o//\\annabee
2007-04-08 21:40:55 UTC
Permalink
Post by ¬a\\/b
Post by Herbert Kleebauer
Post by Betov
I have to second Spook, here. ';' has been the traditional char
for comments, and there is no reason for modifying this. Also,
in order to resist to the natural babelism tendancies, it would
be a good thing to do the same choices, the other Author have
done, when this is without any importance.
The only way to get an improvement is to not make the same
choices as the other have done.
Post by Betov
The only Assembler
i know about, which enables with multi-Instructions-lines is
mov eax ebx | mov edx 0 | div ecx
Now, that surely is an example how a deviation from the
commonly accepted choices makes thing worse: to separate
the parameters by a "," makes the instruction much better
readable. And the "|" as a separator for instructions is
much to dominant. Compare the two lines and you will
see what's the better choice.
mov eax ebx | mov edx eax | mov ecx edx | div ecx
mov eax,ebx; mov edx,eax; mov ecx,edx; div ecx;
I don't know how problematic it is, to use the ; as an
instruction separator and as a start of a comment depending
on a preceding space, but I must say, I most probably
never used a ";" as a start of comment without at least
one space before it.
But much superior to both is the traditional way to only
write one instruction per line and use the src,dest
move.l r3,r0
move.l r0,r1
move.l r1,r2
divu.l r2,r1|r0
r3 -> r0 -> r1 -> r2
But with your format, you have to look even more than twice
mov eax ebx | mov edx eax | mov ecx edx
eax <- ebx
+--------------------+
edx <- eax-+
+--------------------+
ecx <- edx-+
Post by Betov
Unless it would be a problem for your parser to make a difference
with an expression, why not doing the same choice?
No, if the old choices are good, keep them. If the old choices are
bad, change them. But you did exactly the opposite: change
the good and keep the bad. But it really doesn't matter as
long as your product is only for your private use (like an
assembler for writing applications).
my choices seems goods and right.
1) Indentation and redution of len of each instruction
in my way of hobby assembly code
there is more chance that the wrong in write the C[++] code
than assembly
section _DATA public align=4 class=DATA use32
global _StrToNum
/* 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
tab dd 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0x36ff /* 0
dd 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff
/* 10
dd 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff
/* 20
dd 0xff, 0xff,0x37ff,0x1ff, 0xff,
0xff,0x33ff,0x35ff,0xff,0x34ff /* 30
dd 0x31ff,0xff, 0xff, 0xff, 0xff, 0x3ff, 0xff, 0xff, 0x0, 0x1
/* 40
dd 0x2, 0x3, 0x4, 0x5, 0x6, 0x7, 0x8, 0x9, 0xff, 0xff
/* 50
dd 0xff, 0xff, 0xff, 0xff, 0xff, 0xa, 0xb, 0xc, 0xd, 0xe
/* 60
dd 0xf, 0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17, 0x18
/* 70
dd 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f, 0x20, 0x21, 0x22
/* 80
dd 0x23, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xa, 0xb, 0xc
/* 90
dd 0xd, 0xe, 0xf, 0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16
/* 100
dd 0x17, 0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f, 0x20
/* 110
dd 0x21, 0x22, 0x23, 0xff, 0xff, 0xff, 0x2ff
/* 120
times 132 dd 0xff
section _TEXT public align=1 class=CODE use32
/* unsigned32 len_of_num
/* unsigned32* array_num
/* unsigned32 len_of_memory_of_num
/*
/* int
/* StrToNum(num[*|&] res, char* string, char** pos, int base)
/* eg. num res;
/* char buf[256]={0}, *pc; int val=StrToUns( &res, buf, &pc, 10);
/* CF==carry flag
/* format==<spaces><+><digits> where spaces={' ', '\t'}
/* convert the number found in a 0 terminated string
/* "string" in the base digit format to a big number with
/* 2<=base<=36; possible digits are "0123456789[A-Z][a-z]"
/* if there is not a number in the above format then
/* after the call => (string==pos CF==1 and *res=0)
/* if overflow [mem number not enought] return 0 CF==1
/* and *res=0 if there is (or not) overflow it read all the number
/* and save in "pos" the first not digit position
/* if all is ok CF==0 and return 1, if error CF==1, *res=0, and
/* return 0
/* 0k, 4j, 8i, 12r, 16c, 20b, 24ra, 28res, 32string, 36pos, 40base
< b, c, r, i, j, k
D[j+4]==0#.ce; D[j+8]<1#.ce;
i==0#.ce; b^=b;
#.c1;
.c0: {++i; .c1: B*i==' '#.c0; B*i==9#.c0;}
B*i=='+'!#.c2 | ++i;
.c2: bl=*i; [tab+4*b]>=k#.ce;
#.c5;
.c4: {++i; .c5: B*i=='0'#.c4; }
a=0;
D*j=1; k=[j+4]; D*k=0; c=1;
.c6: b^=b; bl=*i; b=[tab+4*b];
a+=b; r++=0; jc .c7; b=r;
*k=a; k+=4; --c#.a0;
r!=0!#.a1 /* here the array grow of 1
a=*j; D[j+8]<=a#.c7;
++D*j; *k=r;
.a1: c=*j; k=[j+4];
++i; #.c6;
.c7: D*j=1; k=[j+4]; D*k=0; b^=b;
r=1; #.c9;
.c8: r=0;
r==0!#.ca; a=1; clc; #.cf
.ca: a=0; stc;
Post by Herbert Kleebauer
Post by Betov
@res, @string, @pos, @base
b, c, r, i, j, k
ret
is that not beautiful? :)
have i use more comment?
:))) :))) Yes. It's awesome.
Evenbit
2007-04-06 01:47:16 UTC
Post by cr88192
for my own projects I have written an assembler (mostly since january).
If you wish that we drag it in here for use as a punching bag, then
give us a download link. ;)

Nathan.
cr88192
2007-04-06 03:43:30 UTC
Post by Evenbit
Post by cr88192
for my own projects I have written an assembler (mostly since january).
If you wish that we drag it in here for use as a punching bag, then
give us a download link. ;)
as noted, it is not online.
actually, right now, I don't even really have a server to put it on
(sourceforge account isn't working afaict, maybe I was gone too long).

tried setting up a server on my main computer (this is why I hadn't
responded yet):
http://cr88192.dyndns.org:8080/bgbasm.zip

if it works (probably won't, can't even get it to work on my end...), then
one can have the file.
otherwise, I can email it.

beyond that, not much luck...


as far as goodness, ...

it is not really all that good of an assembler, technically.
it does something for me that others currently don't, and that is assembling
and running in-memory (and serving as a backend for my other stuff), whereas
most others run and generate object files or similar.

today I beat against it some, and fixed up a few issues (ie: making it
accept more of nasm's syntax, ...).


note: my assembler is not intended to compete with nasm (or any others I am
familiar with), rather I feel it has a different usage domain.

then again, one can see what on/off a few months can accomplish...
Wolfgang Kern
2007-04-06 12:01:34 UTC
Hi "cr88192",
... the group is human moderated (vs. machine moderated),
Be aware, Chuck posts here in ALA in the living form! :)

[..]
First, I'd keep the semicolon as the "old standard" comment
delimiter and use the separator "|" between instructions.

[..]
You should avoid automatic code-creation and offer
a few macros instead of inventing new instructions.

But it would be right for the few instructions which
"really need" a sizecast ie:

INCb/w/q/o[mem] instead of: INC very long word pointer [mem]

[..]
Mmh? Compile in memory?
Where else? :)

You mean immediate compilation with prototype opcode ?
where the programmer can immediately see code-size and format.
This is an interesting attempt as it would help for better
performing coding styles in general.
add
  04,ib        al,i8
  X80/0,ib     rm8,i8
  WX83/0,ib    rm16,i8
  TX83/0,ib    rm32,i8
  X83/0,ib     rm64,i8
  X02/r        r8,rm8
where W/T/X/... tell where prefixes go (Word, DWord, REX).
I've seen it on CLAX, now I think to know its purpose...


__
wolfgang
cr88192
2007-04-06 22:49:38 UTC
Post by Wolfgang Kern
Hi "cr88192",
... the group is human moderated (vs. machine moderated),
Be aware, Chuck posts here in ALA in the living form! :)
ok.
Post by Wolfgang Kern
[..]
First, I'd keep the semicolon as the "old standard" comment
delimiter and use the separator "|" between instructions.
yes, I just don't like '|' this way, and may want to use it for something
else. also, I am more used to it serving as an operator than a separator,
with typical separators being ';', ',', whitespace (often a "very soft"
separator), and newlines (often a "less soft" or "hard" separator). to my
eyes, since ',' was already used, and ';' was not (at least as a separator),
it seemed to make sense.


but, yes, semicolon serves as both comment and separator...

as noted elsewhere, I already have a good mass of code (maybe 5-10 kloc or
so) that depends on this particular feature (2 different JIT backends), and
it would be too much of a hassle to go and modify it.

note: this kind of character overloading is very common in HLLs, which is
what I am most used to.
Post by Wolfgang Kern
[..]
You should avoid automatic code-creation and offer
a few macros instead of inventing new instructions.
But it would be right for the few instructions which
INCb/w/q/o[mem] instead of: INC very long word pointer [mem]
macros:
would be a hassle to implement (sensible, maybe, if there will be a lot of
human-written code, but very optional for machine-written code, where adding
a few synthetic utility ops may make sense).

if you are asking about inc_r and dec_r, these were because originally, the
opcodes had conflicted with the REX prefix, and at the time I was uncertain
as to how I would distinguish between x86-32 and 64.

later on, I added a feature so that writing:
inc reg

is interpreted by the assembler and, if not in x86-64 mode, is
silently converted to the '_r' forms.
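
For readers who want the byte-level picture, here is a minimal C sketch of that mode check. The function name encode_inc_reg32 and the mode64 flag are made up for illustration and are not taken from the assembler itself; the encodings (40+r for the short form, FF /0 for the ModR/M form) are the standard x86 ones, and only registers 0..7 with a 32-bit operand size are handled.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode "inc <32-bit reg>" for registers 0..7
 * (eax..edi). mode64 != 0 selects x86-64 mode, where bytes 0x40..0x4F
 * are REX prefixes and the short form is therefore unavailable.
 * Returns the number of bytes written (1 or 2), or 0 on a bad register. */
size_t encode_inc_reg32(unsigned reg, int mode64, uint8_t out[2])
{
    if (reg > 7)
        return 0;
    if (!mode64) {
        out[0] = (uint8_t)(0x40 + reg);  /* short form: 40+r */
        return 1;
    }
    out[0] = 0xFF;                       /* FF /0 = inc r/m32 */
    out[1] = (uint8_t)(0xC0 | reg);      /* ModR/M: mod=11, reg=/0, rm=reg */
    return 2;
}
```

So "inc ebx" would come out as the single byte 43 in 32-bit mode and as FF C3 in 64-bit mode.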

some other cases are similar.

inc word [esi]
inc dword [esi]

is how it is done at present in my case (ptr is optional/ignored).
note that often duplicating opcodes with different names leads to inflation
in the listing files (they are regarded as completely different opcodes, and
are duplicated accordingly).

some cases have been handled this way though, namely where it was ambiguous
(ie: my current assembler can't figure it out).

thus:
movzx and movsx

now have alternative forms:
movzxw and movsxw

could possibly also add:
movzxb and movsxb
as alternatives to the originals (for clarity).


this was not done out of any aesthetic sense, but rather, because I needed
to actually use them and the assembler had a limitation...
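
A small C sketch of why the split names remove the ambiguity: with the source size in the mnemonic, the second opcode byte can be picked without looking at the memory operand. The enum and function names here are hypothetical, and the sketch only covers a 32-bit destination with a plain [base] operand; the opcode bytes (0F B6, 0F B7, 0F BE, 0F BF) are the standard x86 ones.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mnemonic kinds, named after the forms discussed above. */
enum ExtKind { MOVZXB, MOVZXW, MOVSXB, MOVSXW };

/* Encode "<op> r32, [base]" in 32-bit mode for a plain base register with
 * no displacement. esp (4) and ebp (5) are rejected because those ModR/M
 * slots mean "SIB follows" and "disp32 only". Returns 3, or 0 on bad input. */
size_t encode_ext_load(enum ExtKind kind, unsigned dst, unsigned base,
                       uint8_t out[3])
{
    static const uint8_t second_byte[] = { 0xB6, 0xB7, 0xBE, 0xBF };

    if (dst > 7 || base > 7 || base == 4 || base == 5)
        return 0;
    out[0] = 0x0F;                          /* two-byte opcode escape */
    out[1] = second_byte[kind];             /* chosen by mnemonic alone */
    out[2] = (uint8_t)((dst << 3) | base);  /* mod=00, reg=dst, rm=base */
    return 3;
}
```

With this, "movzxw eax, [esi]" would encode as 0F B7 06 and "movzxb eax, [esi]" as 0F B6 06.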
Post by Wolfgang Kern
[..]
Mmh? Compile in memory?
Where else? :)
You mean immediate compilation with prototype opcode ?
where the programmer can immediately see code-size and format.
This is an interesting attempt as it would help for better
performing coding styles in general.
actually, I mean that, I directly compile/assemble the code, and run it
where it is (vs storing it in object files and passing it off to the
linker). this is why it is needed to auto-link against the host app, so that
eventually I may be able to run dynamically compiled code just like statically
compiled code (apart from the fact that, sadly, anything pruned out by the
traditional linker is not directly usable).

as such, I am considering specialized object file, and possibly library
loading, where it may be possible to pull new code/data from libraries as
needed (or to simply just link the whole big mass into memory). at least if
I am using this with statically compiled versions of the same libs, the
static versions should get precedence (so I am not ending up with mixed
duplicated and non-duplicated state).


if I were doing the traditional thing, likely I would just use nasm (or
gas).
Post by Wolfgang Kern
add
  04,ib        al,i8
  X80/0,ib     rm8,i8
  WX83/0,ib    rm16,i8
  TX83/0,ib    rm32,i8
  X83/0,ib     rm64,i8
  X02/r        r8,rm8
where W/T/X/... tell where prefixes go (Word, DWord, REX).
I've seen it on CLAX, now I think to know its purpose...
the listing is used in my assembler, to autogenerate the tables needed for
doing assembly.

the actual assembler itself knows hardly anything about the instructions,
only a few possible configurations, the registers, and how to encode certain
structures (such as the ModR/M and SIB bytes, ...), and past this point is
driven largely by tables.
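
As a rough illustration of the table-driven idea (not the actual generated tables, whose layout is not shown in this thread), a C sketch might look like the following. The OpForm struct, the OpKind enum and find_form are all hypothetical names, the W/T/X prefix markers from the listing are left out, and operands are matched by exact kind only. The opcode bytes are the 'add' forms from the listing.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical operand kinds, mirroring the listing's al / r8 / rm8 / i8. */
enum OpKind { OP_AL, OP_R8, OP_RM8, OP_RM16, OP_RM32, OP_I8 };

typedef struct {
    const char *name;    /* mnemonic */
    uint8_t     opcode;  /* primary opcode byte */
    int         ext;     /* /digit for the ModR/M reg field, -1 if unused */
    enum OpKind a, b;    /* operand pattern */
} OpForm;

/* Earlier rows win, so more specific forms are listed first. */
static const OpForm op_forms[] = {
    { "add", 0x04, -1, OP_AL,   OP_I8  },  /* 04,ib     al,i8   */
    { "add", 0x80,  0, OP_RM8,  OP_I8  },  /* 80/0,ib   rm8,i8  */
    { "add", 0x83,  0, OP_RM16, OP_I8  },  /* 83/0,ib   rm16,i8 */
    { "add", 0x83,  0, OP_RM32, OP_I8  },  /* 83/0,ib   rm32,i8 */
    { "add", 0x02, -1, OP_R8,   OP_RM8 },  /* 02/r      r8,rm8  */
};

/* Return the first form whose mnemonic and operand kinds match, or NULL. */
const OpForm *find_form(const char *name, enum OpKind a, enum OpKind b)
{
    size_t i;
    for (i = 0; i < sizeof op_forms / sizeof op_forms[0]; i++) {
        const OpForm *f = &op_forms[i];
        if (strcmp(f->name, name) == 0 && f->a == a && f->b == b)
            return f;
    }
    return NULL;
}
```

The encoder proper then only needs to know how to emit the opcode byte, a ModR/M byte (using ext as the reg field when it is not -1) and the immediate.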

potentially, other things, like the registers and encodings, could also be
moved to listing files, but at present this is not needed (could make sense,
ie, if the plan were to support further-reaching and non-x86 targets, but
more likely if it ever came to that it would make more sense just to write a
new assembler).

however, with some more recent changes the address-size prefix byte (0x67) is
now inserted automatically in some cases, so it works differently than the way
it is done in the listings (potentially, some esoteric situations could result
in duplicate address-size prefixes, which would be bad).


then again, this would be cases like:
a16 jmp_w foo

likely rule:
avoid manual overrides if at all sensible.

if one types:
mov eax, [di] ;in 32 bit mode
or:
mov ax, [fs:esi] ;in 16-bit mode

at present the assembler should do something sensible (and in these
particular cases, the a16 or a32 prefix is simply redundant).
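
One way to picture the rule, as a hedged C sketch: derive the need for the 0x67 prefix from the CPU mode and the width of the address registers, and emit it at most once even if an explicit a16/a32 was already written. The function names and the already_emitted flag are assumptions for illustration, not the assembler's real interface.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if an address-size prefix (0x67) is needed, 0 if the address
 * width is the default for the mode, and -1 if the combination cannot be
 * encoded (e.g. 16-bit addressing in 64-bit mode). */
int needs_addr_prefix(int mode_bits, int addr_bits)
{
    switch (mode_bits) {
    case 16: return addr_bits == 16 ? 0 : addr_bits == 32 ? 1 : -1;
    case 32: return addr_bits == 32 ? 0 : addr_bits == 16 ? 1 : -1;
    case 64: return addr_bits == 64 ? 0 : addr_bits == 32 ? 1 : -1;
    default: return -1;
    }
}

/* Writes at most one 0x67 byte to out. already_emitted is nonzero when an
 * explicit override has put the prefix into the stream before this call,
 * which is the duplicate case worried about above.
 * Returns the number of bytes written (0 or 1), or -1 if unencodable. */
int emit_addr_prefix(int mode_bits, int addr_bits, int already_emitted,
                     uint8_t *out)
{
    int need = needs_addr_prefix(mode_bits, addr_bits);
    if (need < 0)
        return -1;
    if (need && !already_emitted) {
        out[0] = 0x67;
        return 1;
    }
    return 0;
}
```

Under these rules "mov eax, [di]" in 32-bit mode gets one 0x67 (giving 67 8B 05), whether or not the source also said a16.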

may clean this up eventually...
Phil Carmody
2007-04-06 23:28:31 UTC
Post by cr88192
Post by Wolfgang Kern
First, I'd keep the semicolon as the "old standard" comment
delimiter and use the separator "|" between instructions.
yes, I just don't like '|' this way, and may want to use it for something
else.
The only time I've seen it in assembly language is as a grouping
operator for VLIW instruction sets:

some load |
some muladd |
some flag test ; do 3 things in one tick.

I think your choice of whitespace sensitivity regarding parsing
of ';'s to be inadvisable.

Phil
--
"Home taping is killing big business profits. We left this side blank
so you can help." -- Dead Kennedys, written upon the B-side of tapes of
/In God We Trust, Inc./.
cr88192
2007-04-06 23:49:38 UTC
Post by Phil Carmody
Post by cr88192
Post by Wolfgang Kern
First, I'd keep the semicolon as the "old standard" comment
delimiter and use the separator "|" between instructions.
yes, I just don't like '|' this way, and may want to use it for something
else.
The only time I've seen it in assembly language is as a grouping
some load |
some muladd |
some flag test ; do 3 things in one tick.
I think your choice of whitespace sensitivity regarding parsing
of ';'s to be inadvisable.
maybe, but it is already fairly well set.

I could also add '|', but I don't think I will remove ';' (lest I go and
have to modify a whole bunch of existing code...).

actually, personally I am not sure why it is such a big deal (it is a minor
optional feature anyways), but oh well.



put a few more misc things on my server here, including a dump of my VM core
(another piece of the project...), and some screenshots of the main project.

not going to put the whole project up, or at least not as a single large
file (then people would actually try to download it, and probably have it
fail partway through).
Betov
2007-04-07 08:10:20 UTC
Post by cr88192
actually, personally I am not sure why it is such a big deal (it is a
minor optional feature anyways), but oh well.
I would not call a facility of enabling/disabling an Instruction
a minor detail. Particularly not when facing difficult debuggings.

Phil, A86, - which was my preferred Assembler in my DOS days -,
also accepts '|' for Multi-Instructions-Lines. OK, this was
undocumented, but, as long as the Pipe is not used in Asm
sources other than for some expressions parsers (easy to make
the difference)...


Betov.

< http://rosasm.org >
cr88192
2007-04-07 08:46:21 UTC
Post by Betov
Post by cr88192
actually, personally I am not sure why it is such a big deal (it is a
minor optional feature anyways), but oh well.
I would not call a facility of enabling/disabling an Instruction
a minor detail. Particularly not when facing difficult debuggings.
well, in any case, I went and added '|' as well. unlike ';', one is free to
put whitespace around it however...
Post by Betov
Phil, A86, - which was my preferred Assembler in my DOS days -,
also accepts '|' for Multi-Instructions-Lines. OK, this was
undocumented, but, as long as the Pipe is not used in Asm
sources other than for some expressions parsers (easy to make
the difference)...
yes, ok.


misc:
I have been adding more content to the site...
Betov
2007-04-07 08:01:17 UTC
Post by cr88192
as noted elsewhere, I already have a good mass of code (maybe 5-10
kloc or so) that depends on this particular feature (2 different JIT
backends), and it would be too much of a hassle to go and modify it.
This is _the_ valid argument.

And this point shows to what extent it is important to take the
required time, for taking a look at what the others have done,
for discussions, for evaluating the implications, and so on...
before taking any syntactic decision. Once too much work has been
done with wrong choices, it is too late for asking advice,
unfortunately.


Betov.

< http://rosasm.org >
cr88192
2007-04-07 09:13:40 UTC
Post by Betov
Post by cr88192
as noted elsewhere, I already have a good mass of code (maybe 5-10
kloc or so) that depends on this particular feature (2 different JIT
backends), and it would be too much of a hassle to go and modify it.
This is _the_ valid argument.
And this point shows to what extent it is important to take the
required time, for taking a look at what the others have done,
for discussions, for evaluating the implications, and so on...
before taking any syntactic decision. Once too much work has been
done with wrong choices, it is too late for asking advice,
unfortunately.
well, I did talk about it once, on comp.lang.misc.
no one really gave much comment at the time (this was before discovering
either c.l.a.x or a.l.a).

well, I have added both syntaxes.
there is no harm in adding new syntax, just I won't remove the old one (and
may well continue using it).


actually, there are many remnants of older syntax around. the syntax has
steadily mutated since the first versions of the assembler, as noted
elsewhere. most of my code thus depends on the mass of intermediate
states...

I usually only notably change things when there is an imo strong reason, ie,
like when going from the function-call interface to the print-style
interface...


have been meaning to get around to adding a dynamic link-loader, haven't
done so yet...

the format I will be using will be COFF (the object format that 'PE' builds
on). if I get around to it, I may also support ELF.


as for symbol lookup:
at present I am using hash-based caching (and not a full hash table).
I figure common symbols should be accessed frequently enough to be in the
hash, and for infrequent/never-used symbols it shouldn't matter (the
overhead of maintaining a full hash table is likely higher in this case
anyways, given the potentially large number of symbols).

as such, right now I am using an 8 bit hash, but could maybe upgrade it to
12 bits.

after this, it falls back to a linear search, and if found the symbol is
hashed (newly added symbols are also hashed by default).

should be ok...
Betov
2007-04-07 09:42:07 UTC
Post by cr88192
at present I am using hash-based caching (and not a full hash table).
I don't know what you call a "hash-based caching"...
Post by cr88192
I figure common symbols should be accessed frequently enough to be in
the hash, and for infrequent/never-used symbols it shouldn't matter
(the overhead of maintaining a full hash table is likely higher in
this case anyways, given the potentially large number of symbols).
as such, right now I am using an 8 bit hash, but could maybe upgrade
it to 12 bits.
after this, it falls back to a linear search, and if found the symbol
is hashed (newly added symbols are also hashed by default).
There are as many hash table organizations as one can imagine,
but i fail to understand how any compiler could make a difference
about "frequently used" and "few used symbols" before doing the
job. If you are talking about the Processor "cache", this does
not make much sense in my mind.

Of course, CheckSums depend on the size of the materials you are
supposed to parse, but, for an Assembly Source, a table of 16 Bits
CheckSum is usually a good granularity. Also, the search time consumed
by such methods is nothing, in a whole encoding process. Therefore,
i can't see where the point is, with relying on any linear search for
part of the symbols. Personally, for speed, i have chosen:

* Divide the Symbols on a kind basis (Equates, Labels, macros,...)

* Make a Checksum64 and a 16 from 64.

* Each record in the 16 Bits Table is:

"CheckSum64 / Pointer / Link".

where "Link" works like in a Linked List, Pointing to a Linear
Table of Records, downward, for cases when a Symbol would have
the same CheckSum64 as another one (which has never been observed),
and where "Pointer" Points to the real registration Table of the
given Symbol.

In my implementation, the first 16-Bits Table is Static. The second
Table is also static, actually, because we have never seen any Asm
Source which would require that many Symbols, but it could
as well be made dynamic for extensions.
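
A rough C sketch of a table with "CheckSum64 / Pointer / Link" records, under stated assumptions: the 64-bit checksum here is FNV-1a as a stand-in (the actual checksum function is not given in this thread), the 16-bit index is made by XOR-folding the 64-bit value, and Link chains any records that land in the same 16-bit slot, which is one reading of the description and may differ from the real implementation. All names (Record, cs_insert, cs_find, ...) are hypothetical, and overflow records are heap-allocated rather than taken from a second static table.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct Record {
    uint64_t       checksum64;  /* full 64-bit checksum of the symbol name */
    void          *pointer;     /* points at the symbol's real registration */
    struct Record *link;        /* next record sharing this 16-bit slot */
    int            used;
} Record;

static Record cs_table[1 << 16];   /* static first-level table */

/* Stand-in checksum: 64-bit FNV-1a (an assumption, see above). */
uint64_t cs_checksum64(const char *s)
{
    uint64_t h = 14695981039346656037ULL;
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 1099511628211ULL;
    }
    return h;
}

/* "A 16 from 64": XOR-fold the four 16-bit quarters together. */
unsigned cs_fold16(uint64_t c)
{
    return (unsigned)((c ^ (c >> 16) ^ (c >> 32) ^ (c >> 48)) & 0xFFFF);
}

/* Returns 1 on success, 0 if the checksum is already present or on OOM. */
int cs_insert(const char *name, void *payload)
{
    uint64_t c = cs_checksum64(name);
    Record *r = &cs_table[cs_fold16(c)];
    Record *n;

    if (!r->used) {
        r->used = 1; r->checksum64 = c; r->pointer = payload;
        return 1;
    }
    for (;;) {
        if (r->checksum64 == c)
            return 0;
        if (!r->link)
            break;
        r = r->link;
    }
    n = calloc(1, sizeof *n);
    if (!n)
        return 0;
    n->used = 1; n->checksum64 = c; n->pointer = payload;
    r->link = n;
    return 1;
}

void *cs_find(const char *name)
{
    uint64_t c = cs_checksum64(name);
    Record *r = &cs_table[cs_fold16(c)];

    if (!r->used)
        return NULL;
    for (; r; r = r->link)
        if (r->checksum64 == c)
            return r->pointer;
    return NULL;
}
```

Note that this sketch compares checksums only, never the names themselves, so it relies on 64-bit collisions not occurring in practice, as the post suggests.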


Betov.

< http://rosasm.org >
cr88192
2007-04-07 16:08:34 UTC
Post by Betov
Post by cr88192
at present I am using hash-based caching (and not a full hash table).
I don't know what you call a "hash-based caching"...
basically, I have a long array of items (say, many thousands of items or
more);
I have a hash table, maybe 4096 entries;
this hash table holds indices into this array.


so, when retrieving an item, I compute its hash value (say, a simple
polynomial hash for the name).

I check if, by some chance, the correct symbol is in that spot, and if
so, I return it.

otherwise, I do a more expensive search (ie, a linear search), and upon
finding the value, I add it to the hash table.

in this way, commonly accessed values can be typically found in O(1) time,
and uncommonly used ones in O(n) time.


imo, this has a few advantages over a single large hash-table:
there is no fixed inflation (say, with a conventional hash keeping it <75%
full or so);
there is no need to have either an initially large hash (> the max number of
items), or to have a resizable hash (with the possible associated cost of
rehashing all the items, which could become expensive with large tables);
it has lower memory overhead than hash-chaining approaches.

however, it assumes a skew, and of all the options is likely the slowest in
sub-optimal cases, with the average worst case being O(n), vs a small constant
(about 2 probes) for a plain hash at 75%, or O(n/m) for a chained hash.

however, the low overhead may be worthwhile (I assume that there will be a
high probability skew).

if not good enough, the next best option is probably a hash-chaining scheme.
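
To make the scheme concrete, here is a minimal C sketch of a linear symbol array with a small hash cache in front of it. The names (symc_add, symc_lookup, ...), the fixed array size and the multiply-by-31 hash are assumptions chosen for illustration; only the overall shape (hash, check one slot, fall back to a linear search, then refill the slot) follows the description above.

```c
#include <string.h>

#define MAX_SYMS   4096
#define CACHE_BITS 8                    /* "8 bit hash"; could be 12 */
#define CACHE_SIZE (1 << CACHE_BITS)

static const char *symc_name[MAX_SYMS];   /* the long linear array */
static void       *symc_addr[MAX_SYMS];
static int         symc_count;
static int         symc_cache[CACHE_SIZE]; /* symbol index + 1; 0 = empty */

/* Simple polynomial hash over the name, reduced to a cache slot. */
static unsigned symc_hash(const char *s)
{
    unsigned h = 0;
    while (*s)
        h = h * 31u + (unsigned char)*s++;
    return h & (CACHE_SIZE - 1);
}

/* Append a symbol; new symbols are cached right away. Returns its index,
 * or -1 if the array is full. The name string must outlive the table. */
int symc_add(const char *name, void *addr)
{
    if (symc_count >= MAX_SYMS)
        return -1;
    symc_name[symc_count] = name;
    symc_addr[symc_count] = addr;
    symc_cache[symc_hash(name)] = symc_count + 1;
    return symc_count++;
}

/* Returns the symbol index, or -1 if the name is unknown. */
int symc_lookup(const char *name)
{
    unsigned h = symc_hash(name);
    int i = symc_cache[h] - 1;

    if (i >= 0 && strcmp(symc_name[i], name) == 0)
        return i;                        /* cache hit: O(1) */
    for (i = 0; i < symc_count; i++) {   /* cache miss: O(n) fallback */
        if (strcmp(symc_name[i], name) == 0) {
            symc_cache[h] = i + 1;       /* evict whatever was there */
            return i;
        }
    }
    return -1;
}

/* Drop all cached entries; lookups still work, just slower at first. */
void symc_flush(void)
{
    memset(symc_cache, 0, sizeof symc_cache);
}
```

A collision simply evicts the previous occupant of the slot, so the cache never needs resizing or chaining; the cost is that a rarely used symbol pays for a full linear scan.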
Post by Betov
Post by cr88192
I figure common symbols should be accessed frequently enough to be in
the hash, and for infrequent/never-used symbols it shouldn't matter
(the overhead of maintaining a full hash table is likely higher in
this case anyways, given the potentially large number of symbols).
as such, right now I am using an 8 bit hash, but could maybe upgrade
it to 12 bits.
after this, it falls back to a linear search, and if found the symbol
is hashed (newly added symbols are also hashed by default).
There are as many hash table organizations as one can imagine,
but i fail to understand how any compiler could make a difference
about "frequently used" and "few used symbols" before doing the
job. If you are talking about the Processor "cache", this does
not make much sense in my mind.
note above.

hash-based caches are a common (and simple) optimization in a lot of my
code. the simplest explanation is that an infrequent item is unlikely to be
in the hash (or gets wiped out by something else), but frequently used
items will tend to already be in the hash.

this allows me to get by with hash tables typically far too small to hold all
the items in question.
Post by Betov
Of course, CheckSums depend on the size of the materials you are
supposed to parse, but, for an Assembly Source, a table of 16 Bits
CheckSum is usually a good granularity. Also, the search time consumed
by such methods is nothing, in a whole encoding process. Therefore,
i can't see where the point is, with relying on any linear search for
* Divide the Symbols on a kind basis (Equates, Labels, macros,...)
* Make a Checksum64 and a 16 from 64.
"CheckSum64 / Pointer / Link".
where "Link" works like in a Linked List, Pointing to a Linear
Table of Records, downward, for cases when a Symbol would have
the same CheckSum64 as another one (which has never been observed),
and where "Pointer" Points to the real registration Table of the
given Symbol.
In my implementation, the first 16-Bits Table is Static. The second
Table is also static, actually, because we have never seen any Asm
Source which would require that many Symbols, but it could
as well be made dynamic for extensions.
well, in my case I may load a few-hundred-kloc of C codes' worth of object
files, and I am not certain, but this is likely to have a teh-huge number of
symbols (luckily I autoprune most of them, ie, compiler-generated symbols
and similar).

usually linear-searches + hash-based caching is fast enough, and very simple
to implement and work with (only a few extra steps over a plain linear
search).


usually, for much of anything past a few hundred items, I tend to use an
indexing structure (typically a hash of some sort or another).

then again, I have a lot of code where raw speed depends on table lookups,
and so a lot of this actually matters.


for example, in my scripting language there was a technique where I had used
the source, destination, and method info to basically hash method lookups
(avoiding a recursive object-graph search to locate the correct method).

this helped somewhat with performance.


the type-core/MM/GC for this VM however, relies on a single large hash table
(it also supports compile-time items as well, ie, so that I could generally
do things like distinguish between various symbols using a C switch
statement and similar).


in my renderer, at one point I ended up also using a hash scheme (similar to
the one described for symbol lookup), for fetching textures by name
(speeding up both model loading, and more importantly, some dynamic CSG code
which re-fetched the texturemaps by name for every frame).


actually, I use things like this fairly often.
Wolfgang Kern
2007-04-07 15:05:33 UTC
Hi "cr88192",
Post by Wolfgang Kern
[..]
First, I'd keep the semicolon as the "old standard" comment
delimiter and use the separator "|" between instructions.
yes, I just don't like '|' this way, and may want to use it for ...
Ok.
but, yes, semicolon serves as both comment and separator...
Would be hard for most programmers to distinguish between
' ;' and '; ' or ';' ...

How if you help it with formatting ?
ie: have comments at a defined TAB-stop and
use an ALT-";"-Key to tell it's a comment.
as noted elsewhere, I already have a good mass of code (maybe 5-10 kloc
or so) that depends on this particular feature (2 different JIT backends),
and it would be too much of a hassle to go and modify it.
Often things gotta change on users demand ... :)
note: this kind of character overloading is very common in HLLs,
which is what I am most used to.
I hope you try to write an assembler!

[..]
would be a hassle to implement (sensible, maybe, if there will be a lot
of human-written code, but very optional for machine-written code,
where adding a few synthetic utility ops may make sense).
I see.

[REX and INC reg]

Just one flag is required to either produce the two or the one byte code.
some other cases are similar.
inc word [esi]
inc dword [esi]
is how it is done at present in my case (ptr is optional/ignored).
Because I'm a lazy typist, I use the shorter INCb INCw INCd INCq
similar to MOVSb/w/d/q, but your way is more portable.
note that often duplicating opcodes with different names leads to
inflation in the listing files (they are regarded as completely
different opcodes, and are duplicated accordingly).
Sure. And also the duplicated instructions like 8B C3 vs. 89 D8.
Here I always recommend to use the Direction-bit as a LOAD/store
indicator, so only MOV [mem],reg should use the '89' form.
some cases have been handled this way though, namely where it was
ambiguous (ie: my current assembler can't figure it out).
movzx and movsx
movzxw and movsxw
movzxb and movsxb
as alternatives to the originals (for clarity).
Yes, the CPU instructions are different for MOVZXb and MOVZXw/d/q
So it sounds logical to add these optional to the syntax.

btw:(64-bit mode)
I optionally use 'Zp' in my disassembler to indicate the inherent
ZeroPage Addressing, where the upper 32 bits are quietly zeroed.
But for an assembler this is a 'just know, don't care' issue.

[..]
Post by Wolfgang Kern
Mmh? Compile in memory?
Where else? :)
You mean immediate compilation with prototype opcode ?
where the programmer can immediately see code-size and format.
This is an interesting attempt as it would help for better
performing coding styles in general.
actually, I mean that, I directly compile/assemble the code, and run
it where it is (vs storing it in object files and passing it off to
the linker). this is why it is needed to auto-link against the host
app, so that eventually I may be able to run dynamically compiled code
just like statically compiled code (apart from the fact that, sadly,
anything pruned out by the traditional linker is not directly usable).
Yeah, good for immediate test and debug, but will be tricky to avoid
run-time compiled delays.
as such, I am considering specialized object file, and possibly library
loading, where it may be possible to pull new code/data from libraries
as needed (or to simply just link the whole big mass into memory).
at least if I am using this with statically compiled versions of the
same libs, the static versions should get precedence (so I am not ending
up with mixed duplicated and non-duplicated state).
I wrote my 'libs' as self-relocating modules, so they will run
anywhere in memory without linking-tools and relocating.
The address given by mem_alloc for loading it is already the
link address and there are no delaying relocate needs at all.

[about your list]
Post by Wolfgang Kern
I've seen it on CLAX, now I think to know its purpose...
the listing is used in my assembler, to autogenerate the tables needed
for doing assembly.
[..]
Yes, I wrote my disassembler in a similar way, just swap the
tables to work with another CPU-family...
however, with some more recent changes now the address-size byte is
inserted automatically in some cases, so it works differently than
the way it is done in the listings (potentially, some esoteric
situations could result in duplicate address bytes, which would be bad).
Oh yes, the '67'-override was always a problem for assemblers,
so not too many work it out the correct way or support it at all.

Have you planned to allow mixed code assembly?
ie: use16(32,64)
Here you'll need all allowed mix of prefix bytes available.

[..]
at present the assembler should do something sensible (and in these
particular cases, the a16 or a32 prefix is simply redundant).
may clean this up eventually...
I'd keep it alive, just in case...

__
wolfgang
cr88192
2007-04-07 23:07:26 UTC
Post by Wolfgang Kern
Hi "cr88192",
Post by Wolfgang Kern
[..]
First, I'd keep the semicolon as the "old standard" comment
delimiter and use the separator "|" between instructions.
yes, I just don't like '|' this way, and may want to use it for ...
Ok.
but, yes, semicolon serves as both comment and separator...
Would be hard for most programmers to distinguish between
' ;' and '; ' or ';' ...
How if you help it with formatting ?
ie: have comments at a defined TAB-stop and
use an ALT-";"-Key to tell it's a comment.
errm, I have no intent at this point for any kind of specialized editor. I
tend to use notepad for everything externally, and even internal to my apps
(I use custom gui rendering code), I use a general-purpose text-editor
widget (has an interface very similar to notepad, albeit lacking any menus
or 'specific' features).

in time this may be used for inline/runtime code editing, but I don't know
(don't exactly do much coding inside my apps).

some people have before complained and demeaned notepad, but personally I
feel it is one of the best general purpose text editors (or at least, when
set to use a nice fixed-width font, presently I use fixedsys).

most of the time, it is fine, if a few times some features would have been
nice (ie: a more capable/intelligent find/replace feature).

I also like how it is fairly light on memory and windows resources, so I can
have a good number of them running (nice though would be a notepad like
editor where each window did not use any windows resources, and was capable
of partly sorting open windows by category).

dunno though, good enough (many others I have looked at typically seem
either overly specialized or wonky).

better find/replace would be nice, syntax highlighting maybe (don't find
much need personally), heavy resource/mem use or some wonky/incapable
interface, no...

notepad also comes by default with windows, which is worth something by
itself.

I dislike though when other people had used, ie, my laptop, and then went
and resized the window or changed the font (I also like having each window
at a fixed 80x25 layout, odd as that is, before thinking it would be nice if
I could lock the size...).

for some tasks though I will use other editors (such as vi or emacs). on
linux for gui-based editing, I have usually used gedit or kedit (and often
vi from the shell for small tasks).

reminds me of at one time, my mom was taking some linux-certification
course, and needed to be assisted with using VI. seems many people have a
surprisingly difficult time with this. I just don't like using vi for any
larger tasks (a linux clone of the old dos 'edit' would have been nice...).
Post by Wolfgang Kern
as noted elsewhere, I already have a good mass of code (maybe 5-10 kloc
or so) that depends on this particular feature (2 different JIT backends),
and it would be too much of a hassle to go and modify it.
Often things gotta change on users demand ... :)
note, I still intend this primarily for backend/autogenerated code,
reasoning mostly that assembler now exists in a state of great decline
anymore, ie, where everyone and their dog knows Java, many know C, and only
some know assembler...
Post by Wolfgang Kern
note: this kind of character overloading is very common in HLLs,
which is what I am most used to.
I hope you try to write an assembler!
I write it as I write it, with whatever seems to IMO make sense.
an assembler exists as an assembler, since it is the lowest reasonable level
for code generation (one moves up to the level of bytecode, and is limited
to whatever the JIT implements, and one moves lower than assembler, well
then they have an ugly mess...).
Post by Wolfgang Kern
[..]
would be a hassle to implement (sensible, maybe, if there will be a lot
of human-written code, but very optional for machine-written code,
where adding a few synthetic utility ops may make sense).
I see.
[REX and INC reg]
Just one flag is required to either produce the two or the one byte code.
yes, that is what was done eventually (actually, it involved a flag check
and a special case in the encode-'op <reg>' function).

it involves actually changing the opcode mnemonic index (internally, each
mnemonic is given a number used prior to locating the correct form of the
opcode during assembly).
Post by Wolfgang Kern
some other cases are similar.
inc word [esi]
inc dword [esi]
is how it is done at present in my case (ptr is optional/ignored).
Because I'm a lazy typist, I use the shorter INCb INCw INCd INCq
similar to MOVSb/w/d/q, but your way is more portable.
yeah.
Post by Wolfgang Kern
note that often duplicating opcodes with different names leads to
inflation in the listing files (they are regarded as completely
different opcodes, and are duplicated accordingly).
Sure. And also the duplicated instructions like 8B C3 vs. 89 D8.
Here I always recommend to use the Direction-bit as a LOAD/store
indicator, so only MOV [mem],reg should use the '89' form.
yes, in my case I have reg,rm forms generally precede rm,reg forms, so they
are higher precedence.
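
For reference, the two encodings of a register-to-register mov can be sketched in C as below. The function name and the use_load_form flag are hypothetical; the opcodes are the standard 8B /r (mov r32, r/m32) and 89 /r (mov r/m32, r32). For "mov eax, ebx" the first gives 8B C3 and the second 89 D8.

```c
#include <stdint.h>

/* Registers are numbered 0..7 = eax, ecx, edx, ebx, esp, ebp, esi, edi.
 * use_load_form != 0 picks the reg,rm opcode (8B), which is what gets
 * chosen when reg,rm forms come first in the table; otherwise the rm,reg
 * opcode (89) is used. Both produce the same effect for reg-to-reg moves. */
void encode_mov_r32_r32(unsigned dst, unsigned src, int use_load_form,
                        uint8_t out[2])
{
    dst &= 7;
    src &= 7;
    if (use_load_form) {
        out[0] = 0x8B;                               /* mov r32, r/m32 */
        out[1] = (uint8_t)(0xC0 | (dst << 3) | src); /* reg=dst, rm=src */
    } else {
        out[0] = 0x89;                               /* mov r/m32, r32 */
        out[1] = (uint8_t)(0xC0 | (src << 3) | dst); /* reg=src, rm=dst */
    }
}
```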

similarities like this are not exploited (or exploitable) by the assembler,
so each is completely separate wrt the listing (apart from being listed
under the same mnemonic).
Post by Wolfgang Kern
some cases have been handled this way though, namely where it was
ambiguous (ie: my current assembler can't figure it out).
movzx and movsx
movzxw and movsxw
movzxb and movsxb
as alternatives to the originals (for clarity).
Yes, the CPU instructions are different for MOVZXb and MOVZXw/d/q
So it sounds logical to add these optional to the syntax.
yeah.
dunno what others have done here, I just noticed that, "oh crap", the only
distinction was a difference in the size of the right-hand memory operand. my
assembler can't handle this one, so I split it off...
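A short Python sketch of why the split mnemonics remove the ambiguity: for a memory source, the byte and word forms differ only in the second opcode byte (0F B6 vs 0F B7), and nothing in '[esi]' itself says which one is meant. The tables and function name are hypothetical, and only plain '[base]' addressing in 32-bit mode is covered.

```python
DEST32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
# esp and ebp are left out: with mod=00 they mean SIB and disp32 respectively.
BASE32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3, "esi": 6, "edi": 7}
MOVZX_OPCODE = {"movzxb": 0xB6, "movzxw": 0xB7}  # byte source / word source

def encode_movzx_mem(mnemonic: str, dst: str, base: str) -> bytes:
    """Encode `movzxb/movzxw dst32, [base]` in 32-bit mode."""
    opcode = MOVZX_OPCODE[mnemonic]
    # mod=00 (no displacement), reg = destination, rm = base register.
    modrm = (DEST32.index(dst) << 3) | BASE32[base]
    return bytes([0x0F, opcode, modrm])
```

`encode_movzx_mem("movzxb", "eax", "esi")` and `encode_movzx_mem("movzxw", "eax", "esi")` differ in exactly one byte.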

actually, as it is my assembler also can't handle 3-arg opcodes either (so
they have been generally omitted).

I have at times considered partly rewriting this part of my assembler (both
the listing-translation tool and the opcode matching), so that each argument
is fully qualified (size and type), vs as it is where they are only partly
qualified (a single 'size' field is used for the whole opcode).

in this case, the current size field or similar would probably be reused as
an argument (allowing some 3-operand forms and funky combinations of fixed
regs and sizes, as found in some opcodes).
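One way the "fully qualified arguments" idea could look, sketched in Python under the assumption that every operand pattern carries its own kind and size. The table layout, the `Operand` type and `find_form` are invented for illustration and say nothing about how the actual listing format works. Only the opcode bytes are returned; ModR/M and immediates are left out.

```python
from typing import NamedTuple, Optional, Sequence

class Operand(NamedTuple):
    kind: str  # "reg", "mem" or "imm"
    size: int  # operand size in bits

# (mnemonic, per-operand patterns, opcode bytes); "rm" accepts reg or mem.
FORMS = [
    ("movzx", (("reg", 32), ("rm", 8)), b"\x0f\xb6"),
    ("movzx", (("reg", 32), ("rm", 16)), b"\x0f\xb7"),
    ("imul", (("reg", 32), ("rm", 32), ("imm", 8)), b"\x6b"),
    ("imul", (("reg", 32), ("rm", 32), ("imm", 32)), b"\x69"),
]

def _fits(pattern, op: Operand) -> bool:
    kind, size = pattern
    kind_ok = op.kind in ("reg", "mem") if kind == "rm" else op.kind == kind
    return kind_ok and op.size == size

def find_form(mnemonic: str, operands: Sequence[Operand]) -> Optional[bytes]:
    """Return the opcode bytes of the first form matching every operand."""
    for name, patterns, opcode in FORMS:
        if (name == mnemonic and len(patterns) == len(operands)
                and all(_fits(p, o) for p, o in zip(patterns, operands))):
            return opcode
    return None
```

With per-operand sizes, plain 'movzx' with a byte or word memory source resolves without separate mnemonics, and 3-operand forms such as 'imul r, rm, imm' fit the same table.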
Post by Wolfgang Kern
btw:(64-bit mode)
I optionally use 'Zp' in my disassembler to indicate the inherent
ZeroPage Addressing, where the upper 32 bits are quietly zeroed.
But for an assembler this is a 'just know, don't care' issue.
I am not sure, I am not familiar with this one...
Post by Wolfgang Kern
[..]
Post by Wolfgang Kern
Mmh? Compile in memory?
Where else? :)
You mean immediate compilation with prototype opcode ?
where the programmer immediate can see code-size and format.
This is an interesting attempt as it would help for better
performing coding styles in general.
actually, I mean that, I directly compile/assemble the code, and run
it where it is (vs storing it in object files and passing it off to
the linker). this is why it is needed to auto-link against the host
app, so that eventually I may be able to run dynamically compiled code
just like statically compiled code (apart from the fact that, sadly,
anything pruned out by the traditional linker is not directly usable).
Yeah, good for immediate test and debug, but will be tricky to avoid
run-time compiled delays.
yeah.

then again, I have some experience writing dynamic compilers for script
languages. doing all this magic for mixed dynamic and statically compiled
code is in a way a reasonable next step.

at present, once code is assembled it is more or less frozen in place, so
should work about the same as normal statically-compiled code (except that
on windows I am currently running in memory grabbed from malloc of all
places...).

a linux port may need to use mmap, so that I can explicitly get
read/write/execute memory...
Post by Wolfgang Kern
as such, I am considering specialized object file, and possibly library
loading, where it may be possible to pull new code/data from libraries
as needed (or to simply just link the whole big mass into memory).
at least if I am using this with statically compiled versions of the
same libs, the static versions should get precedence (so I am not ending
up with mixed duplicated and non-duplicated state).
I wrote my 'libs' as self-relocating modules, so they will run
anywhere in memory without linking-tools and relocating.
The address given by mem_alloc for loading it is already the
link address and there are no delaying relocate needs at all.
yes.

however, these libs may well be masses of code compiled by GCC, and at least
for PE/COFF it refuses to generate PIC (actually, it claims that PIC is the
default, but from disassembly, obviously not). can't see why PIC is
supposedly not possible with COFF (one is only lacking a GOT). of course,
this could be argued to not be truly PIC, but only a hybrid (since, static
linking would still be needed in producing the lib).


oh well, I am currently thinking I will have to go to using a hash-chaining
scheme rather than a hash-caching scheme, because otherwise linking will
become an O(n^2) operation, and with the current size of the libs I am
considering, that could become horrible...
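A minimal Python sketch of a chained symbol table, to show what the hash-chaining option amounts to. The exact caching scheme is not described in the thread, so this only illustrates the chained alternative; the class name, hash function and bucket count are assumptions. Each lookup walks one short chain instead of falling back to a scan of all symbols, which is what keeps linking n symbols from degrading toward O(n^2).

```python
class ChainedSymbolTable:
    """Symbol name -> address map using separate chaining."""

    def __init__(self, nbuckets: int = 1024):
        self.buckets = [[] for _ in range(nbuckets)]

    def _index(self, name: str) -> int:
        h = 0
        for ch in name.encode():
            h = (h * 31 + ch) & 0xFFFFFFFF  # simple multiplicative string hash
        return h % len(self.buckets)

    def define(self, name: str, addr: int) -> None:
        chain = self.buckets[self._index(name)]
        for i, (n, _) in enumerate(chain):
            if n == name:
                chain[i] = (name, addr)  # redefinition replaces the old entry
                return
        chain.append((name, addr))

    def lookup(self, name: str):
        for n, addr in self.buckets[self._index(name)]:
            if n == name:
                return addr
        return None  # undefined symbol
```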
Post by Wolfgang Kern
[about your list]
Post by Wolfgang Kern
I've seen it on CLAX, now I think to know its purpose...
the listing is used in my assembler, to autogenerate the tables needed
for doing assembly.
[..]
Yes, I wrote my disassembler in a similar way, just swap the
tables to work with another CPU-family...
yes, except I use a single table for everything (all archs, all modes).

that is why certain things are represented with letters, namely so that they
can be handled in a mode-specific way.
Post by Wolfgang Kern
however, with some more recent changes now the address-size byte is
inserted automatically in some cases, so it works differently than
the way it is done in the listings (potentially, some esoteric
situations could result in duplicate address bytes, which would be bad).
Oh yes, the '67'-override was always a problem for assemblers,
so not too many work it out the correct way or support it at all.
Have you planned to allow mixed code assembly?
ie: use16(32,64)
Here you'll need all allowed mix of prefix bytes available.
mixed 16/32 bit code should work, at least in theory (beat against this
recently, but I am unsure as to whether or not I will ever have much reason
to use this).

past this, I am much less certain (mixing 64-bit long-mode code with 16 or
32 bit code, could be horrible, dunno what the hell the CPU does here).

of course, it is always possible to simply tell the assembler to use a
different size:

section .text

bits 32 (or .a32)
...

bits 64
...

but this is different...
Post by Wolfgang Kern
[..]
at present the assembler should do something sensible (and in these
particular cases, the a16 or a32 prefix is simply redundant).
may clean this up eventually...
I'd keep it alive, just in case...
yeah.
this prefix was borrowed from nasm anyways.
Post by Wolfgang Kern
__
wolfgang
Wolfgang Kern
2007-04-08 13:53:22 UTC
Hello "cr88192",

[..]
Post by cr88192
Post by Wolfgang Kern
How if you help it with formatting ?
ie: have comments at a defined TAB-stop and
use an ALT-";"-Key to tell it's a comment.
errm, I have no intent at this point for any kind of specialized editor.
I see, for plain text-source import this isn't usable,
so your decision for the alternative "|" is just fine.

[...about editor]
But if you think into future, you might once add an integrated
debugger and/or other displaying help-tools, hotkeys, menus or whatsoever...
Then you'll need your very own GUI displays and controls anyway.

Be aware of cursing yourself for your first decisions later in
the game (talking from own experience yet) when you have to break
your nails just to merge in additional functionality.

So my advice here is:
have your final target in mind from the very start.


How many bytes are one "kloc" ?
Post by cr88192
Post by Wolfgang Kern
Often things gotta change on users demand ... :)
note, I still intend this primarily for backend/autogenerated code,
reasoning mostly that assembler now exists in a state of great decline
anymore, ie, where everyone and their dog knows Java, many know C,
and only some know assembler...
You write an assembler? don't expect Java speaking dogs to use it :)
Post by cr88192
Post by Wolfgang Kern
I hope you try to write an assembler!
I write it as I write it, with whatever seems to IMO make sense.
an assembler exists as an assembler, since it is the lowest reasonable
level for code generation (one moves up to the level of bytecode,
and is limited to whatever the JIT implements, and one moves lower
than assembler, well then they have an ugly mess...).
Yes, I know.
My short-fast-smart code isn't 'beautiful' in the eyes of HL-coders.

[...]
Post by cr88192
Post by Wolfgang Kern
Yes, the CPU instructions are different for MOVZXb and MOVZXw/d/q
So it sounds logical to add these optional to the syntax.
yeah.
dunno what others have done here, I just noticed that, "oh crap",
the only distinction was a difference in the size of the right-hand
memory operand. my assembler can't handle this one, so I split it off...
I found two redundant (NOP) forms:
.use32
66 0f b7 .. MOVZXw ax,..
.use16
0f b7 .. MOVZXw ax,..
Post by cr88192
actually, as it is my assembler also can't handle 3-arg opcodes either (so
they have been generally omitted).
??? I couldn't do without:

IMUL r,rm,imm
SHLD r,rm,imm
and all other 'dest,src,imm' instructions.
Post by cr88192
I have at times considered partly rewriting this part of my assembler
(both the listing-translation tool and the opcode matching),
so that each argument is fully qualified (size and type),
vs as it is where they are only partly qualified (a single 'size' field
is used for the whole opcode).
in this case, the current size field or similar would probably be reused
as an argument (allowing some 3-operand forms and funky combinations of
fixed regs and sizes, as found in some opcodes).
I split the whole opcode list into 'one/two/three/block' operator parts
(the inherent CL on shift-double is also a third) and then grouped
it a second time into function-blocks
[ie: |add|..|xor|cmp|-group: 00..05,...,38..3d,80,81,83
|test|neg|not|inc|dec|...|: F6,F7,FE,FF
and so on]

This helps for detailed (value tracking) disassembling and for
immediate verbose comment/help as well.
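The regularity behind the |add|..|cmp| group can be sketched in Python as follows. For primary opcodes below 40 whose low three bits are 0..5, the operation is simply the opcode shifted right by three; for 80/81/83 the same table is indexed by the reg field of the ModR/M byte. The function names and the textual form labels are illustrative only and are not taken from the disassembler described above.

```python
ALU_OPS = ["add", "or", "adc", "sbb", "and", "sub", "xor", "cmp"]
ALU_FORMS = ["rm8, r8", "rm, r", "r8, rm8", "r, rm", "al, imm8", "eax, imm"]

def decode_alu(opcode: int):
    """Map primary opcodes 00..05, 08..0D, ..., 38..3D to (op, form)."""
    if opcode < 0x40 and (opcode & 7) < 6:
        return ALU_OPS[opcode >> 3], ALU_FORMS[opcode & 7]
    return None  # not part of this group (e.g. 0F, 27, 2F)

def decode_alu_imm_group(opcode: int, modrm: int):
    """For 80/81/83, the ModR/M reg field selects the same eight operations."""
    if opcode in (0x80, 0x81, 0x83):
        return ALU_OPS[(modrm >> 3) & 7]
    return None
```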
Post by cr88192
Post by Wolfgang Kern
btw:(64-bit mode)
[ZeroPage Addressing,...]
Post by cr88192
I am not sure, I am not familiar with this one...
It's just a detail of how many 64-bit instructions work.

[about libs...]
Post by cr88192
linking would still be needed in producing the lib.
Yes, if the libs contain OS-specific API calls.
My (lib)modules are just OS-extenders which add functions to it
without using other API functions.
But right, in the windoze world this may not work at all.

[about hash]

Can't tell anything as I work the other way around :)
(I type my code on an editable disassembler)

[about '67']
Post by cr88192
mixed 16/32 bit code should work, at least in theory (beat against this
recently, but I am unsure as to whether or not I will ever have much
reason to use this).
The reason can be 'boot-code' and 'BIOS-calls'.
Post by cr88192
past this, I am much less certain (mixing 64-bit long-mode code with 16 or
32 bit code, could be horrible, dunno what the hell the CPU does here).
As an Author of an assembler you don't need to care :)
this will be the (final user) programmers headache only.
Post by cr88192
of course, it is always possible to simply tell the assembler to use
section .text
bits 32 (or .a32)
Don't mix up '.a32'(address override) with 'bits 32'
Post by cr88192
Post by Wolfgang Kern
I'd keep it alive, just in case...
yeah. this prefix was borrowed from nasm anyways.
Ok, even the 67 came with 386 :)

__
wolfgang

I uploaded a new (not topmost) variant of HEXTUTOR

http://web.utanet.at/schw1285/KESYS/HEXTUTOR.zip
cr88192
2007-04-08 14:04:26 UTC
Post by Wolfgang Kern
Hello "cr88192",
[..]
Post by cr88192
Post by Wolfgang Kern
How if you help it with formatting ?
ie: have comments at a defined TAB-stop and
use an ALT-";"-Key to tell it's a comment.
errm, I have no intent at this point for any kind of specialized editor.
I see, for plain text-source import this isn't usable,
so your decision for the alternative "|" is just fine.
yes, I added this as well, either will work.
Post by Wolfgang Kern
[...about editor]
But if you think into future, you might once add an integrated
debugger and/or other displaying help-tools, hotkeys, menus or
whatsoever...
Then you'll need your very own GUI displays and controls anyway.
Be aware of cursing yourself for your first decisions later in
the game (talking from own experience yet) when you have to break
your nails just to merge in additional functionality.
for asm?...

people take asm that seriously?...

I guess this is a different perspective from viewing it as simply a good way
of getting from an HLL to machine code...
Post by Wolfgang Kern
have your final target in mind from the very start.
well, current goal is to get better linking, and maybe working C
compilation.
Post by Wolfgang Kern
How many bytes are one "kloc" ?
depends I guess, I never really measured (I use a linecounter typically).

so, checking with the current VM core:
so, for 33785 loc, 700710 bytes, so, on average, about 20740 bytes/kloc,
which is about 20.74 bytes per line on average.
Post by Wolfgang Kern
Post by cr88192
Post by Wolfgang Kern
Often things gotta change on users demand ... :)
note, I still intend this primarily for backend/autogenerated code,
reasoning mostly that assembler now exists in a state of great decline
anymore, ie, where everyone and their dog knows Java, many know C,
and only some know assembler...
You write an assembler? don't expect Java speaking dogs to use it :)
yes, they won't, and even most C people won't.

my C compiler might, and my JIT compilers do, that was the whole reason for
its existence. otherwise, from my POV, there is not much reason to use
assembler directly anymore (not sufficiently portable, the current baseline
more likely being something like C).
Post by Wolfgang Kern
Post by cr88192
Post by Wolfgang Kern
I hope you try to write an assembler!
I write it as I write it, with whatever seems to IMO make sense.
an assembler exists as an assembler, since it is the lowest reasonable
level for code generation (one moves up to the level of bytecode,
and is limited to whatever the JIT implements, and one moves lower
than assembler, well then they have an ugly mess...).
Yes, I know.
My short-fast-smart code isn't 'beautiful' in the eyes of HL-coders.
maybe, dunno.
Post by Wolfgang Kern
[...]
Post by cr88192
Post by Wolfgang Kern
Yes, the CPU instructions are different for MOVZXb and MOVZXw/d/q
So it sounds logical to add these optional to the syntax.
yeah.
dunno what others have done here, I just noticed that, "oh crap",
the only distinction was a difference in the size of the right-hand
memory operand. my assembler can't handle this one, so I split it off...
.use32
66 0f b7 .. MOVZXw ax,..
.use16
0f b7 .. MOVZXw ax,..
ok.
Post by Wolfgang Kern
Post by cr88192
actually, as it is my assembler also can't handle 3-arg opcodes either (so
they have been generally omitted).
IMUL r,rm,imm
SHLD r,rm,imm
and all other 'dest,src,imm' instructions.
sadly, my assembler at present can't parse/match them.
as it is, making it do so would be a bit of a change...
Post by Wolfgang Kern
Post by cr88192
I have at times considered partly rewriting this part of my assembler
(both the listing-translation tool and the opcode matching),
so that each argument is fully qualified (size and type),
vs as it is where they are only partly qualified (a single 'size' field
is used for the whole opcode).
in this case, the current size field or similar would probably be reused
as an argument (allowing some 3-operand forms and funky combinations of
fixed regs and sizes, as found in some opcodes).
I split the whole opcode list into 'one/two/three/block' operator parts
(the inherent CL on shift-double is also a third) and then grouped
it a second time into function-blocks
[ie: |add|..|xor|cmp|-group: 00..05,...,38..3d,80,81,83
|test|neg|not|inc|dec|...|: F6,F7,FE,FF
and so on]
This helps for detailed (value tracking) disassembling and for
immediate verbose comment/help as well.
ok.
Post by Wolfgang Kern
Post by cr88192
Post by Wolfgang Kern
btw:(64-bit mode)
[ZeroPage Addressing,...]
Post by cr88192
I am not sure, I am not familiar with this one...
It's just a detail of how many 64-bit instructions work.
[about libs...]
ok.
Post by Wolfgang Kern
Post by cr88192
linking would still be needed in producing the lib.
Yes, if the libs contain OS-specific API calls.
My (lib)modules are just OS-extenders which add functions to it
without using other API functions.
But right, in the windoze world this may not work at all.
yeah.

I have run into a few ugly problems here...
earlier had started a post about it.

realized that there are a few possibly ugly beasts lurking in dynamic
linking, and I am at present unsure how I will deal with them exactly.

as is, they could make things become inflexible and brittle (with annoying
and likely versioning issues, ...). somehow I had failed to take into
account questions like:
what if some code changes in some non-trivial way?
what if a struct changes somewhere?
...

if they crop up, the whole app could break (and I could end up with many of
the same restrictions as with static compilation/linking).

something like this may involve a kind of systematic separation of concerns
(the same lib is not at the same time both statically and dynamically linked,
struct changes are a serious matter, ...).


another issue was related to lack of clear info on the gnu version of the
'ar' fileformat...
Post by Wolfgang Kern
[about hash]
Can't tell anything as I work the other way around :)
(I type my code on an editable disassembler)
ok.
Post by Wolfgang Kern
[about '67']
?... '67'?
Post by Wolfgang Kern
Post by cr88192
mixed 16/32 bit code should work, at least in theory (beat against this
recently, but I am unsure as to whether or not I will ever have much
reason to use this).
The reason can be 'boot-code' and 'BIOS-calls'.
yes, but these are unlikely at present in my current apps (typically
operating purely within the confines of windows, and maybe eventually
linux).
Post by Wolfgang Kern
Post by cr88192
past this, I am much less certain (mixing 64-bit long-mode code with 16 or
32 bit code, could be horrible, dunno what the hell the CPU does here).
As an Author of an assembler you don't need to care :)
this will be the (final user) programmers headache only.
well, yes, to a degree.
Post by Wolfgang Kern
Post by cr88192
of course, it is always possible to simply tell the assembler to use
section .text
bits 32 (or .a32)
Don't mix up '.a32'(address override) with 'bits 32'
a32 and .a32 are different directives, the former overriding a single
instruction, and the latter changing the assembler mode (working the same as
'bits 32', since 'bits 32' was actually a more recent addition for possible
improved nasm compatibility).
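The difference between the mode directive and the per-instruction prefix can be sketched in Python like this. The assembler mode ('bits 32' / '.a32') only sets a default; an 'a16' or 'a32' prefix on one instruction emits the 0x67 address-size override only when it disagrees with that default, and is redundant otherwise. The function name and the table of allowed combinations are assumptions for illustration.

```python
# Address sizes reachable through the 0x67 override in each CPU mode.
OVERRIDE_TARGET = {16: 32, 32: 16, 64: 32}

def address_prefix(mode_bits: int, addr_bits: int) -> bytes:
    """Return the prefix bytes needed for one instruction's address size."""
    if mode_bits not in OVERRIDE_TARGET:
        raise ValueError("mode must be 16, 32 or 64")
    if addr_bits == mode_bits:
        return b""  # a16 in 16-bit code or a32 in 32-bit code is redundant
    if addr_bits != OVERRIDE_TARGET[mode_bits]:
        raise ValueError("address size not encodable in this mode")
    return b"\x67"
```

So 'a32' inside a 'bits 32' section emits nothing, while 'a16' there emits a single 0x67.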
Post by Wolfgang Kern
Post by cr88192
Post by Wolfgang Kern
I'd keep it alive, just in case...
yeah. this prefix was borrowed from nasm anyways.
Ok, even the 67 came with 386 :)
ok.
Post by Wolfgang Kern
__
wolfgang
I uploaded a new (not topmost) variant of HEXTUTOR
http://web.utanet.at/schw1285/KESYS/HEXTUTOR.zip
added an assorted docs (and a few links) section to my server...
Wolfgang Kern
2007-04-09 13:26:42 UTC
Hi "cr88192",
Post by cr88192
Post by Wolfgang Kern
[...about editor]
[]
Post by cr88192
Post by Wolfgang Kern
Then you'll need your very own GUI displays and controls anyway.
Be aware of cursing yourself for your first decisions later in
the game (talking from own experience yet) when you have to break
your nails just to merge in additional functionality.
for asm?...
people take asm that seriously?...
I guess this is a different perspective from viewing it as simply
a good way of getting from an HLL to machine code...
Maybe I just talk for me yet (I wrote a whole OS in machine code),
but the problem with later on changed structures applies to all
programming methods and often a rewrite from scratch is the only
solution then, also for VHLL-source (as you mentioned this below).
Post by cr88192
Post by Wolfgang Kern
have your final target in mind from the very start.
well, current goal is to get better linking, and maybe working C
compilation.
Ok, what I wanted to say is 'reserve future idea items' in
all your structs in time, so you can avoid rewrite or detouring
adaption needs.
Post by cr88192
Post by Wolfgang Kern
How many bytes are one "kloc" ?
... which is about on average 20.74 bytes per-line.
I see, KiloLinesOfSourceCode :)
Thanks, I wasn't sure about it, as the French use 'octets' for 'bytes'.
Post by cr88192
...., that was the whole reason for its existence.
otherwise, from my POV, there is not much reason to use
assembler directly anymore (not sufficiently portable, the current
baseline more likely being something like C).
I don't think ASM is dead, it just fell asleep for a while :)
CPU-specific code or source isn't too portable anyway.


[3-arg opcodes..]
Post by cr88192
sadly, my assembler at present can't parse/match them.
as it is, making it do so would be a bit of a change...
I'm afraid an assembler without it won't be accepted much.
Post by cr88192
I have run into a few ugly problems here...
earlier had started a post about it.
realized that there are a few possibly ugly beasts lurking in dynamic
linking, and I am at present unsure how I will deal with them exactly.
as is, they could make things become inflexible and brittle (with annoying
and likely versioning issues, ...). somehow I had failed to take into
what if some code changes in some non-trivial way?
what if a struct changes somewhere?
...
Unfortunately this problem exists (as said above).
The only solution I see would be a LIB-conversion tool,
manually converting will end up as complete rewrite.
But it's rumored that a few converters already exist.
Post by cr88192
Post by Wolfgang Kern
[about '67']
?... '67'?
0x67 , the address size override prefix byte :)
Post by cr88192
Post by Wolfgang Kern
mixed 16/32 bit code should work, at least in theory ...
The reason can be 'boot-code' and 'BIOS-calls'.
yes, but these are unlikely at present in my current apps (typically
operating purely within the confines of windows, and maybe eventually
linux).
Right, Linux seem to 'go BIOS' quite often.
Even many windoze core function rely on the BIOS
(the obligatory reboot needs after any hardware-fart).
Post by cr88192
Post by Wolfgang Kern
Don't mix up '.a32'(address override) with 'bits 32'
a32 and .a32 are different directives,...
I see.

__
wolfgang
r***@cs.ucr.edu
2007-04-09 13:46:40 UTC
Post by Wolfgang Kern
Maybe I just talk for me yet (I wrote a whole OS in machine code),
but the problem with later on changed structures applies to all
programming methods and often a rewrite from scratch is the only
solution then, also for VHLL-source (as you mentioned this below).
You really *do* talk for yourself here. Being able to change the
composition of a structure without rewriting all your code is a
*fundamental* thing to be able to do. Even if your assembler doesn't
support structs/records that make this process trivial, you can still
achieve almost the same thing using equates (at least, as far as being
able to change your structures around without rewriting everything).
You are discovering one of the main reasons why almost *no one* works
in binary machine code as a matter of course. There are a *few* times
when working in machine code is acceptable, but the time this is
needed is very rare. Rewriting one's code to do simple things like
move objects around in memory, relocate subroutines, or change the
definition of some structure is a non-starter.
Cheers,
Randy Hyde
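The equates point can be illustrated with a small Python sketch in the code-generation setting of this thread: field offsets are computed from a named layout, and the emitted instruction looks the offset up by name instead of hard-coding it. The layout format and both function names are hypothetical. The encoding 8B 46 disp8 is 'mov eax, [esi+disp8]'.

```python
def field_offsets(fields):
    """Turn [(name, size_in_bytes), ...] into ({name: offset}, total_size)."""
    offsets, pos = {}, 0
    for name, size in fields:
        offsets[name] = pos
        pos += size
    return offsets, pos

def load_field(offsets, field: str) -> bytes:
    """Emit `mov eax, [esi + offset_of(field)]` using a disp8 displacement."""
    off = offsets[field]
    if not 0 <= off <= 127:
        raise ValueError("offset does not fit in a positive disp8")
    return bytes([0x8B, 0x46, off])  # ModR/M 46 = mod 01, reg eax, rm esi

# Version 1 of a struct, then version 2 with a new field in front.
v1, _ = field_offsets([("x", 4), ("y", 4)])
v2, _ = field_offsets([("flags", 4), ("x", 4), ("y", 4)])
```

`load_field(v1, "y")` and `load_field(v2, "y")` differ only in the displacement byte; the code that asks for "y" does not change when the structure does.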
Wolfgang Kern
2007-04-09 15:38:41 UTC
Post by r***@cs.ucr.edu
Post by Wolfgang Kern
Maybe I just talk for me yet (I wrote a whole OS in machine code),
but the problem with later on changed structures applies to all
programming methods and often a rewrite from scratch is the only
solution then, also for VHLL-source (as you mentioned this below).
You really *do* talk for yourself here. Being able to change the
composition of a structure without rewriting all your code is a
*fundamental* thing to be able to do. Even if your assembler doesn't
support structs/records that make this process trivial, you can still
achieve almost the same thing using equates (at least, as far as being
able to change your structures around without rewriting everything).
You are discovering one of the main reasons why almost *no one* works
in binary machine code as a matter of course. There are a *few* times
when working in machine code is acceptable, but the time this is
needed is very rare. Rewriting one's code to do simple things like
move objects around in memory, relocate subroutines, or change the
definition of some structure is a non-starter.
So please let me know how you react when vista's new windoze API
need more and/or other arguments than what's in your library.

__
wolfgang
r***@cs.ucr.edu
2007-04-09 20:15:07 UTC
Post by Wolfgang Kern
Post by r***@cs.ucr.edu
Post by Wolfgang Kern
Maybe I just talk for me yet (I wrote a whole OS in machine code),
but the problem with later on changed structures applies to all
programming methods and often a rewrite from scratch is the only
solution then, also for VHLL-source (as you mentioned this below).
You really *do* talk for yourself here. Being able to change the
composition of a structure without rewriting all your code is a
*fundamental* thing to be able to do. Even if your assembler doesn't
support structs/records that make this process trivial, you can still
achieve almost the same thing using equates (at least, as far as being
able to change your structures around without rewriting everything).
You are discovering one of the main reasons why almost *no one* works
in binary machine code as a matter of course. There are a *few* times
when working in machine code is acceptable, but the time this is
needed is very rare. Rewriting one's code to do simple things like
move objects around in memory, relocate subroutines, or change the
definition of some structure is a non-starter.
So please let me know how you react when vista's new windoze API
need more and/or other arguments than what's in your library.
How is this any different than when a new Linux or FreeBSD API needs
more or different arguments than the APIs I originally called? No big
deal -- it's easy enough to write a wrapper function to fill in
default parameter values, adjust data types, stuff like that. Indeed,
it's really no different than calling Windows or Linux primitives from
the HLA stdlib. And the HLA stdlib is the perfect place to put this,
as it insulates applications that are written using the stdlib from
such details. That is the beauty of such abstractions.
Cheers,
Randy Hyde
Wolfgang Kern
2007-04-10 13:04:45 UTC
Post by r***@cs.ucr.edu
Post by Wolfgang Kern
So please let me know how you react when vista's new windoze API
need more and/or other arguments than what's in your library.
How is this any different than when a new Linux or FreeBSD API needs
more or different arguments than the APIs I originally called? No big
deal -- it's easy enough to write a wrapper function to fill in
default parameter values, adjust data types, stuff like that. Indeed,
it's really no different than calling Windows or Linux primitives from
the HLA stdlib. And the HLA stdlib is the perfect place to put this,
as it insulates applications that are written using the stdlib from
such details. That is the beauty of such abstractions.
I see, all your clients have to replace all their apps with
new compiled code. Fine.

__
wolfgang
r***@cs.ucr.edu
2007-04-10 22:25:29 UTC
Post by Wolfgang Kern
Post by r***@cs.ucr.edu
Post by Wolfgang Kern
So please let me know how you react when vista's new windoze API
need more and/or other arguments than what's in your library.
How is this any different than when a new Linux or FreeBSD API needs
more or different arguments than the APIs I originally called? No big
deal -- it's easy enough to write a wrapper function to fill in
default parameter values, adjust data types, stuff like that. Indeed,
it's really no different than calling Windows or Linux primitives from
the HLA stdlib. And the HLA stdlib is the perfect place to put this,
as it insulates applications that are written using the stdlib from
such details. That is the beauty of such abstractions.
I see, all your clients have to replace all their apps with
new compiled code. Fine.
Well, that's a good spot better than having them wait for me to
completely rewrite the code. That's assuming, of course, that they
want to take advantage of the new APIs for some reason. In reality,
almost every OS upgrade I've seen (including Vista, thus far) tends to
preserve the old APIs except in rare cases. E.g., every HLA program
I've written seems to run just fine under Vista, without as much as a
recompile.
Cheers,
Randy Hyde
Jim Carlock
2007-04-10 23:37:22 UTC
<***@cs.ucr.edu> wrote:
: Well, that's a good spot better than having them wait for me to
: completely rewrite the code. That's assuming, of course, that they
: want to take advantage of the new APIs for some reason. In reality,
: almost every OS upgrade I've seen (including Vista, thus far) tends
: to preserve the old APIs except in rare cases. E.g., every HLA
: program I've written seems to run just fine under Vista, without as
: much as a recompile.

Vista supposedly drops 16-bit support altogether.

When XP came out, while it's not API's per se for the applications,
the APIs(?) for the device drivers changed. I've got some devices
that work on Windows 3.1, W9x, NT3.5, and NT4 BUT fail on
W-XP (perhaps they are 8/16-bit drivers or perhaps they employed
a different device driver API?). Two of the devices are external disk
drives that hook up through a parallel port.
--
Jim Carlock
Post replies to the group.
Evenbit
2007-04-11 00:27:11 UTC
Post by Jim Carlock
: Well, that's a good spot better than having them wait for me to
: completely rewrite the code. That's assuming, of course, that they
: want to take advantage of the new APIs for some reason. In reality,
: almost every OS upgrade I've seen (including Vista, thus far) tends
: to preserve the old APIs except in rare cases. E.g., every HLA
: program I've written seems to run just fine under Vista, without as
: much as a recompile.
Vista supposedly drops 16-bit support altogether.
Progress! No compelling reason to support something that is rarely
used.
Post by Jim Carlock
When XP came out, while it's not API's per se for the applications,
the APIs(?) for the device drivers changed. I've got some devices
that work on Windows 3.1, W9x, NT3.5, and NT4 BUT fail on
W-XP (perhaps they are 8/16-bit drivers or perhaps they employed
a different device driver API?). Two of the devices are external disk
drives that hook up through a parallel port.
Difficult to find XP-style drivers for anything that connects to the
parallel port (flat-bed scanners, printers, etc). The vendors assumed
that if your rig is outfitted with XP, then you most likely also have
USB-connected devices.

Nathan.
Jim Carlock
2007-04-11 00:47:49 UTC
Nathan stated...
: Difficult to find XP-style drivers for anything that connects to the
: parallel port (flat-bed scanners, printers, etc). The vendors assumed
: that if your rig is outfitted with XP, then you most likely also have
: USB-connected devices.

I'm not sure about the printer thing, Nathan. About the only thing that
Microsoft supports IS a printer, and Microsoft even provided
support for scanning and fax machines.

Microsoft failed to support disk drives that hook up through a
parallel port, that's for sure.
--
Jim Carlock
r***@yahoo.com
2007-04-11 04:32:38 UTC
Post by Jim Carlock
Vista supposedly drops 16-bit support altogether.
Only 64 bit Vista drops 16 bit support. As did 64 bit XP. I think MS
is going to find that a sticking point with more than a few users in
the corporate world.
Post by Jim Carlock
When XP came out, while it's not API's per se for the applications,
the APIs(?) for the device drivers changed. I've got some devices
that work on Windows 3.1, W9x, NT3.5, and NT4 BUT fail on
W-XP (perhaps they are 8/16-bit drivers or perhaps they employed
a different device driver API?). Two of the devices are external disk
drives that hook up through a parallel port.
Not that there aren't a few incompatibilities floating around, but
Vista implements essentially the same driver model (with extensions,
of course) as XP/2K/NT. The major incompatibility is that Vista in
general refuses to allow you to load an unsigned driver. In 32 bit
Vista an admin can bypass that, but not in 64 bit. And there are
situations where a driver must be signed (anything that loads into the
DRM data path, some of the boot time stuff, for example).

And of course you can't get a driver "really" signed without going
through MS's certification program. In case you're wondering, a
developer can sign their own driver, but you have to configure the
target system to trust your private certificate source.

And just as a reminder, 64 bit Vista cannot load *any* 32 bit drivers
(same as 64 bit XP, of course).
Wolfgang Kern
2007-04-11 15:46:03 UTC
Permalink
Post by Jim Carlock
: Well, that's a good spot better than having them wait for me to
: completely rewrite the code. That's assuming, of course, that they
: want to take advantage of the new APIs for some reason. In reality,
: almost every OS upgrade I've seen (including Vista, thus far) tends
: to preserve the old APIs except in rare cases. E.g., every HLA
: program I've written seems to run just fine under Vista, without as
: much as a recompile.
Vista supposedly drops 16-bit support altogether.
When XP came out, while it's not API's per se for the applications,
the APIs(?) for the device drivers changed. I've got some devices
that work on Windows 3.1, W9x, NT3.5, and NT4 BUT fail on
W-XP (perhaps they are 8/16-bit drivers or perhaps they employed
a different device driver API?). Two of the devices are external disk
drives that hook up through a parallel port.
The main driver problem with XP seems to be the 'new' M$-strategy.
All drivers must have an M$ approved 'certificate' otherwise they
may not be accepted by the installer.
For win98se it was enough to say 'chicago'...

__
wolfgang
/\\\\o//\\annabee
2007-04-11 16:33:39 UTC
Permalink
On Wed, 11 Apr 2007 17:46:03 +0200, Wolfgang Kern wrote
Post by Wolfgang Kern
Post by Jim Carlock
: Well, that's a good spot better than having them wait for me to
: completely rewrite the code. That's assuming, of course, that they
: want to take advantage of the new APIs for some reason. In reality,
: almost every OS upgrade I've seen (including Vista, thus far) tends
: to preserve the old APIs except in rare cases. E.g., every HLA
: program I've written seems to run just fine under Vista, without as
: much as a recompile.
Vista supposedly drops 16-bit support altogether.
When XP came out, while it's not API's per se for the applications,
the APIs(?) for the device drivers changed. I've got some devices
that work on Windows 3.1, W9x, NT3.5, and NT4 BUT fail on
W-XP (perhaps they are 8/16-bit drivers or perhaps they employed
a different device driver API?). Two of the devices are external disk
drives that hook up through a parallel port.
The main driver problem with XP seems to be the 'new' M$-strategy.
All drivers must have an M$ approved 'certificate' otherwise they
may not be accepted by the installer.
For win98se it was enough to say 'chicago'...
:)))

It would be very nice if you could create a driver example for RosAsm,
that worked on XP. Please?
Just an example. Just some code that can install as a driver, and
virtually do nothing. Something we can build on?

Please???
Post by Wolfgang Kern
__
wolfgang
--

Wolfgang Kern
2007-04-11 15:38:53 UTC
Permalink
Post by r***@cs.ucr.edu
... That is the beauty of such abstractions.
I see, all your clients have to replace all their apps with
new compiled code. Fine.
Well, that's a good spot better than having them wait for me to
completely rewrite the code. That's assuming, of course, that they
want to take advantage of the new APIs for some reason. In reality,
almost every OS upgrade I've seen (including Vista, thus far) tends to
preserve the old APIs except in rare cases. E.g., every HLA program
I've written seems to run just fine under Vista, without as much as a
recompile.
My way of doing OS upgrades is quite different:
I just mail my clients a tiny file with a few code bytes
and the system adds or replaces it in its core without any need
to recompile or buy a new OS version.


__
wolfgang
/\\\\o//\\annabee
2007-04-11 16:30:41 UTC
Permalink
On Wed, 11 Apr 2007 17:38:53 +0200, Wolfgang Kern wrote
Post by Wolfgang Kern
Post by r***@cs.ucr.edu
... That is the beauty of such abstractions.
I see, all your clients have to replace all their apps with
new compiled code. Fine.
Well, that's a good spot better than having them wait for me to
completely rewrite the code. That's assuming, of course, that they
want to take advantage of the new APIs for some reason. In reality,
almost every OS upgrade I've seen (including Vista, thus far) tends to
preserve the old APIs except in rare cases. E.g., every HLA program
I've written seems to run just fine under Vista, without as much as a
recompile.
My way of doing OS upgrades is quite different:
I just mail my clients a tiny file with a few code bytes
and the system adds or replaces it in its core without any need
to recompile or buy a new OS version.
That's awesome. You just feed it code, sort of? :)
So, can you give an example of how it does this?
Say you want to replace one of your functions.

How do you supply the new function, in what format
and where and how does the OS do the upgrade?
Post by Wolfgang Kern
__
wolfgang
--
cr88192
2007-04-10 10:35:58 UTC
Permalink
Post by Wolfgang Kern
Hi "cr88192",
Post by cr88192
Post by Wolfgang Kern
[...about editor]
[]
Post by cr88192
Post by Wolfgang Kern
Then you'll need your very own GUI displays and controls anyway.
Be aware of cursing yourself for your first decisions later in
the game (talking from own experience yet) when you have to break
your nails just to merge in additional functionality.
for asm?...
people take asm that seriously?...
I guess this is a different perspective from viewing it as simply
a good way of getting from an HLL to machine code...
Maybe I only speak for myself here (I wrote a whole OS in machine code),
but the problem with structures that change later on applies to all
programming methods, and often a rewrite from scratch is the only
solution then, also for VHLL source (as you mentioned below).
yes.

I wrote an OS in C, and eventually gave up because it had started down the
path of becoming "yet another unix".
Post by Wolfgang Kern
Post by cr88192
Post by Wolfgang Kern
have your final target in mind from the very start.
well, current goal is to get better linking, and maybe working C
compilation.
Ok, what I wanted to say is 'reserve future idea items' in
all your structs in time, so you can avoid rewrite or detouring
adaption needs.
oh, ok.
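
For what it's worth, the "reserve future idea items" advice could look roughly like this in C. This is only a sketch under assumptions: the struct, its field names and the number of reserved slots are all made up for illustration and are not taken from either poster's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical module header; every field name here is illustrative. */
typedef struct {
    uint32_t size;          /* sizeof the struct as the writer knew it */
    uint32_t version;       /* bumped when a reserved slot gets a meaning */
    uint32_t flags;
    uint32_t entry_offset;
    uint32_t reserved[4];   /* spare slots, must stay zero until assigned */
} ModuleHeader;

/* Initialise a header with all reserved slots cleared. */
void module_header_init(ModuleHeader *h, uint32_t entry_offset)
{
    assert(h != NULL);
    memset(h, 0, sizeof *h);
    h->size = (uint32_t)sizeof *h;
    h->version = 1;
    h->entry_offset = entry_offset;
}

/* Returns 1 if every reserved slot is still zero, 0 otherwise. */
int module_header_reserved_clear(const ModuleHeader *h)
{
    size_t i;
    for (i = 0; i < sizeof h->reserved / sizeof h->reserved[0]; i++)
        if (h->reserved[i] != 0)
            return 0;
    return 1;
}
```

The point is that a later version can give one of the reserved slots a meaning without changing the struct size or the offsets of the existing fields, so older code that only checks `size` and `version` keeps working.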
Post by Wolfgang Kern
Post by cr88192
Post by Wolfgang Kern
How many bytes are one "kloc" ?
... which is about on average 20.74 bytes per-line.
I see, KiloLinesOfSourceCode :)
Thanks, i wasn't sure about as the French use 'octets' for 'bytes'.
yes, a few too many of these and even minor changes become a horrible
pain...
Post by Wolfgang Kern
Post by cr88192
...., that was the whole reason for its existance.
otherwise, from my POV, there is not much reason to use
assembler directly anymore (not sufficiently portable, the current
baseline more likely being something like C).
I don't think ASM is dead, it just fell asleep for a while :)
CPU-specific code or source isn't too portable anyway.
yeah.
Post by Wolfgang Kern
[3-arg opcodes..]
Post by cr88192
sadly, my assembler at present can't parse/match them.
as it is, making it do so would be a bit of a change...
I'm afraid an assembler without it wont be accepted much.
yes.
luckily I have crufted over the problem in a few cases, but yes, I will have
to add them at some point.
Post by Wolfgang Kern
Post by cr88192
I have run into a few ugly problems here...
earlier had started a post about it.
realized that there are a few possibly ugly beasts lurking in dynamic
linking, and I am at present unsure how I will deal with them exactly.
as is, they could make things become inflexible and brittle (with annoying
and likely versioning issues, ...). somehow I had failed to take into account:
what if some code changes in some non-trivial way?
what if a struct changes somewhere?
...
Unfortunately this problem exists (as said above).
The only solution I see would be a LIB-conversion tool,
manually converting will end up as complete rewrite.
But it's rumored that a few converters already exist.
one can always recompile...

but, yes, this makes for code versioning problems, at least for statically
compiled code.
Post by Wolfgang Kern
Post by cr88192
Post by Wolfgang Kern
[about '67']
?... '67'?
0x67 , the address size override prefix byte :)
somehow I missed this...
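
As a reminder of what the 0x67 prefix does, here is a small C sketch that names the memory operand selected by a ModR/M byte with mod=00, depending on the default address size and whether a 0x67 prefix is present. The function name and the placeholder strings "[sib]", "[disp16]" and "[disp32]" are made up for this example; the r/m tables themselves are the standard 16-bit and 32-bit addressing forms.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* mod=00 memory operand forms, indexed by the r/m field (bits 0..2). */
static const char *const rm16[8] = {
    "[bx+si]", "[bx+di]", "[bp+si]", "[bp+di]",
    "[si]", "[di]", "[disp16]", "[bx]"
};
static const char *const rm32[8] = {
    "[eax]", "[ecx]", "[edx]", "[ebx]",
    "[sib]", "[disp32]", "[esi]", "[edi]"
};

/* Hypothetical helper: the 0x67 prefix toggles between 16-bit and
 * 32-bit addressing relative to the default of the code segment. */
const char *decode_rm_mod0(uint8_t modrm, int default_addr_bits,
                           int has_67_prefix)
{
    int addr_bits = default_addr_bits;
    assert((modrm >> 6) == 0);   /* only mod=00 is handled here */
    assert(default_addr_bits == 16 || default_addr_bits == 32);
    if (has_67_prefix)
        addr_bits = (default_addr_bits == 32) ? 16 : 32;
    return (addr_bits == 16) ? rm16[modrm & 7] : rm32[modrm & 7];
}
```

So in a 32-bit code segment the bytes `8B 07` decode as `mov eax, [edi]`, while `67 8B 07` decode as `mov eax, [bx]`: same opcode and ModR/M byte, different interpretation of the r/m field.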
Post by Wolfgang Kern
Post by cr88192
Post by Wolfgang Kern
mixed 16/32 bit code should work, at least in theory ...
The reason can be 'boot-code' and 'BIOS-calls'.
yes, but these are unlikely at present in my current apps (typically
operating purely within the confines of windows, and maybe eventually
linux).
Right, Linux seems to 'go BIOS' quite often.
Even many windoze core functions rely on the BIOS
(the obligatory reboot needed after any hardware-fart).
ok.

I guess, this is for usermode stuff, mostly...
then again, I have all these features that are hardly usable in userspace as
well.

hmm...
Post by Wolfgang Kern
Post by cr88192
Post by Wolfgang Kern
Don't mixup '.a32'(address override) with 'bits 32'
a32 and .a32 are different directives,...
I see.
yes, a result of the kind of funkiness that can crop up in my code...
Post by Wolfgang Kern
__
wolfgang
Wolfgang Kern
2007-04-10 13:55:07 UTC
Permalink
Hello "cr88192",

[big snip] seems we found almost an overall agree :)
Post by cr88192
... LIB-conversion tool
one can always recompile...
but, yes, this makes for code versioning problems,
at least for statically compiled code.
_________________ OT
btw:(LIB-linking)
HLLs usually sacrifice EBP for frames and locals.
I don't use stack frames, but I also abuse one register (mostly EDI)
to cover all dynamic linking and to point to the linked
data-structure block at the same time.

ie: all data access in the module's memory uses EDI+...
so EDI is
1. the pointer (+known offset) to the module code entry point
2. the pointer to the module's data (always in front of code)
3. a unique handle (as it's a physical address here)

so the returned EDI still points to the called routines data-struct
and there's no need to copy and pass parameters over the stack.

This is not C/C+-/and the like/ compatible, but quite faster
and easier.
___________________/OT
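
A rough C analogue of this one-register convention, for readers who think in HLL terms. This is a sketch under assumptions, not the actual KESYS layout: the struct and function names are hypothetical, and a function pointer stands in for "code at a known offset behind the data", since portable C cannot place code inside a data block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct Module Module;

/* Hypothetical module block: the data comes first, and one pointer to
 * it serves as data base, handle and route to the entry point. */
struct Module {
    int32_t arg_a;                /* inputs, written by the caller     */
    int32_t arg_b;
    int32_t result;               /* output, written by the routine    */
    void (*entry)(Module *self);  /* stands in for code at a known offset */
};

/* Example routine: reads its inputs from the block, writes the result back. */
void add_entry(Module *self)
{
    self->result = self->arg_a + self->arg_b;
}

/* Call through the handle; the same pointer comes back, still pointing
 * at the called routine's data (the role EDI plays in the post). */
Module *module_call(Module *m)
{
    assert(m != NULL && m->entry != NULL);
    m->entry(m);
    return m;
}
```

Only the module pointer crosses the call boundary; arguments and results live in the block itself, which is the part of the idea that avoids copying parameters over the stack.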

[16-bit BIOS access]
Post by cr88192
I guess, this is for usermode stuff, mostly...
then again, I have all these features that are hardly usable
in userspace as well.
hmm...
Perhaps as option for Linux coders.

__
wolfgang
cr88192
2007-04-11 04:59:00 UTC
Permalink
Post by Wolfgang Kern
Hello "cr88192",
[big snip] seems we found almost an overall agree :)
and I am behind again as spring break has ended...
Post by Wolfgang Kern
Post by cr88192
... LIB-conversion tool
one can always recompile...
but, yes, this makes for code versioning problems,
at least for statically compiled code.
_________________ OT
btw:(LIB-linking)
HLLs usually sacrifies EBP for frames and locals.
I don't use stack-frames, but I also abuse one register (most EDI)
to cover all dynamic linking and to point to the linked data-
structure-block at the same time.
ie: all data access in the modules memory use EDI+...
so EDI is
1. the pointer(+known offset) to the module code entry point
2 the pointer to the modules data (always in front of code)
3. a unique handle (as it's physical address here)
so the returned EDI still points to the called routines data-struct
and there's no need to copy and pass parameters over the stack.
This is not C/C+-/and the like/ compatible, but quite faster
and easier.
___________________/OT
yes.

I am mostly using C in my case, and any dynamically compiled code, if
possible, will likely use the C convention.

actually, even my dynamic language's JIT-compiled code was designed to
interface with C code, however, via use of thunking and trampolines.

almost no state was maintained on the C stack, the JIT code largely
emulating the behavior of the bytecode interpreter.
Post by Wolfgang Kern
[16-bit BIOS access]
Post by cr88192
I guess, this is for usermode stuff, mostly...
then again, I have all these features that are hardly usable
in userspace as well.
hmm...
Perhaps as option for Linux coders.
dunno.

in my case I develop on winxp...

actually, I never really used 95 or 98 much, because:
during the high point of 95, one could get by ok just using dos most of the
time;
by the time 98 started coming around, I jumped to linux, and used that for
a while for the most part. eventually when I moved back into windows land, I
was using 2k, and later xp...
Post by Wolfgang Kern
__
wolfgang
Wolfgang Kern
2007-04-11 15:56:05 UTC
Permalink
Hi "cr88192",
Post by cr88192
Post by Wolfgang Kern
[big snip] seems we found almost an overall agree :)
and I am behind again as spring break has ended...
Right, let's go back to work.

Looking forward to seeing the new "elsewhereASM" some day :)

Good Luck! and may Lord Logic protect you.

__
wolfgang
/\\\\o//\\annabee
2007-04-08 22:12:55 UTC
Permalink
On Sun, 08 Apr 2007 15:53:22 +0200, Wolfgang Kern wrote
Post by Wolfgang Kern
__
wolfgang
I uploaded a new (not topmost) variant of HEXTUTOR
http://web.utanet.at/schw1285/KESYS/HEXTUTOR.zip
thanks. :)

Is it allowed to use this code and build on it?



--
Wolfgang Kern
2007-04-09 13:53:07 UTC
Permalink
Post by /\\\\o//\\annabee
Post by Wolfgang Kern
I uploaded a new (not topmost) variant of HEXTUTOR
http://web.utanet.at/schw1285/KESYS/HEXTUTOR.zip
thanks. :)
Is it allowed to use this code and build on it?
Sure, as long you don't use the overlay file (disass.com)
in any commercial way, you don't need to buy a licence.
Just mention the copyright:

HEXEDIT64 disassembler
(C)Wolfgang Kern "kesys-development @utanet.at"
disass.com is not freeware nor shareware and
any commercial use of it is prohibited.

And if you mail me where and how you like to use it,
I'll add your name to the granted users list, that's all.

But be aware of a few bugs in the DEMO-version.

You may find LOAD/SAVE routines partially in the source already.

__
wolfgang
/\\\\o//\\annabee
2007-04-09 14:57:52 UTC
Permalink
On Mon, 09 Apr 2007 15:53:07 +0200, Wolfgang Kern wrote
Post by Wolfgang Kern
Post by /\\\\o//\\annabee
Post by Wolfgang Kern
I uploaded a new (not topmost) variant of HEXTUTOR
http://web.utanet.at/schw1285/KESYS/HEXTUTOR.zip
thanks. :)
Is it allowed to use this code and build on it?
Sure, as long you don't use the overlay file (disass.com)
in any commercial way, you don't need to buy a licence.
HEXEDIT64 disassembler
disass.com is not freeware nor shareware and
any commercial use of it is prohibited.
And if you mail me where and how you like to use it,
I'll add your name to the granted users list, that's all.
But be aware of a few bugs in the DEMO-version.
You may find LOAD/SAVE routines partially in the source already.
I meant for study. I have no immediate use for it. But I suppose
it already contains all the encoding transformations in an easily
accessible way, so maybe I can find a use for it later on.
What are the "bugs" to be aware of then?

I see that you call D$KeSys. I suppose this is the image of DisAsm.com that
you call into? I mean, if you can do it, so can I, right?
I might want to use it just as an encoder/decoder, but for experiments
only, in the foreseeable future. What other things can it do??

Disass.com is 64K. That's half of kesys??? So, tell us, what other nice
things can it do?

I have sometimes wanted to do the same thing: write code that could be loaded
into memory and called just like you do with Kesys. (Scripting)

How about work derived from using it? Is that not allowed for commercial
use either?
I have no immediate plans for any commercial use. But say, who knows what
the future holds. You know, I will always be publishing GPL code. But there
is no restriction that stops GPL code from being commercial at the same time.
And in case I ever found a use for it that was "commercial", I
could pay the licence fee then, right?
I don't even know for certain if the two could legally mix. But I guess
that a derivation, the information extracted from the use of disass.com,
say for instance an encoding table, could be included legally in a GPL
distribution.

Don't know. Don't know of any immediate use. I just ask because I am sure
that if I started with it... then soon I would like to be able to include
it...
Post by Wolfgang Kern
__
wolfgang
--
Wolfgang Kern
2007-04-09 18:21:13 UTC
Permalink
Post by /\\\\o//\\annabee
I meant for study. I have no immediate use for it. But I suppose
it already contains all the encoding transformations in an easily
accessible way, so maybe I can find a use for it later on.
What are the "bugs" to be aware of then?
I'm not sure whether the last upload already has the corrected
MOV CRn,reg | MOV reg,CRn (it was reversed here), besides a
few wrong syntax-related size-cast terms and some missing
extended disass-info flags.
Post by /\\\\o//\\annabee
I see that you call D$KeSys. I suppose this is the image of DisAsm.com
that you call to?
Yes, but with an offset.
Do not forget/remove the initial byte after load disass !
This is the return to windoze code, otherwise you mess up your stack.
Post by /\\\\o//\\annabee
I mean, if you can do it, so can I, right?
Sure, but you'll need the details on the returned structure.
Even though the struct spans just 0180h bytes (it overwrites the DOS header),
the list is long because many single bits have their own dedicated function.
Post by /\\\\o//\\annabee
I might want to use it just as an encoder/decoder,
but for experiments only, in the foreseeable future.
What other things can it do??
Disass.com is 64K. That's half of kesys??? So, tell us, what other nice
things can it do?
Actually it is only 20KB (05000h exactly) 'large'; the rest contains
just test patterns and NOPs (so its zip becomes very small).

It can return all value-tracking and codeflow info required for
automated function analysis when configured that way.
But 'my long break' interrupted the work on the analyser, so neither
the DEMO nor my current working copy is finished in this aspect yet.

in the current form it returns nothing, it just fills its own buffer:
* ASCII address field
* ASCII opcode hexdump (space padded)
* ASCII-source line (with many options)
* operands and operations involved, incl.regNrs,addresses and values
* and opcode size of course

So I can use the ASCII-part as one single disassembler line,
or just the parts I'm interested in. Optionally it skips the
TEXT-job to work just as info for exceptions, debugger or analyser.

And what you may see anyway:

* flags affected and/or tested
* codeflow changes (incl. reason, condition-quest and target address)
* stack change (altered ESP by value)
* CPU-type and SSE validity (not for all instructions yet)
* privilege dependencies (PL, IOPL)
* redundancy (may indicate bugs or 'no code')

* Many options for output behaviour
* like four size-cast sets (I couldn't refuse the joke for MASM)
* four condition naming sets
* four default syntax choices
* and more
Post by /\\\\o//\\annabee
I have sometimes wanted to do the same thing: write code that could be loaded
into memory and called just like you do with Kesys. (Scripting)
My script language produces nothing other than token/parameter strings
and doesn't contain any binary code (almost like BASIC or windoze).
So I can make sure that no user will ever crash my system.
But this is another story anyway (G-style drag&drop work in progress).
Post by /\\\\o//\\annabee
How about work derived from using it?
Is that not allowed for commercial use either?
If you write yourself a tool which uses my disass and you can make
money out of it, then I wont care. But you are not allowed to sell
or distribute this tool w/o a signed contract/license then.
Post by /\\\\o//\\annabee
I have no immediate plans for any commercial use. But say, who knows what
the future holds. You know, I will always be publishing GPL code. But there
is no restriction that stops GPL code from being commercial at the same time.
And in case I ever found a use for it that was "commercial", I
could pay the licence fee then, right?
Yes, I'm looking forward to (feed) participate on your work :):)
Post by /\\\\o//\\annabee
I don't even know for certain if the two could legally mix. But I guess
that a derivation, the information extracted from the use of disass.com,
say for instance an encoding table, could be included legally in a GPL
distribution.
Don't know. Don't know of any immediate use.
I just ask because I am sure that if I started with it...
then soon I would like to be able to include it...
Let's see how far you gonna step into the shit :):):)
And then we ask Guga how to proceed best ...

__
wolfgang
T***@terra.es
2007-04-06 11:40:08 UTC
Permalink
Post by cr88192
this one primarily targets in-memory compilation.
at present, I am not aware of other particularly similar projects.
You may not be aware of them, but there are dozens of assemblers with many
kinds of features. I know two that can do in-memory compilation:
Octasm for OCTA OS and DOS
Fasm version for the OCTA operating system.
And this can be done for other open source assemblers on other OSes
too.
http://www.programmersheaven.com/zone5/mh2.htm
r***@cs.ucr.edu
2007-04-07 20:29:49 UTC
Permalink
Post by cr88192
any comments?...
The problem with your semicolons is *not* that you use them to end
statements. You're on the right track there. Most modern HLLs use
semicolons for this purpose and anyone coming from a HLL will
immediately be comfortable with such usage.

The problem is that you use semicolons for *both* statement
terminators and for comments. That's going to make the code hard to
read. As you're obviously adapting things from HLLs, I suggest that
you use C-style comments (as, say, Gas and HLA do) to avoid the
ambiguity.
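
To illustrate the suggestion, here is a minimal C sketch of a line splitter in which ';' only ever separates statements and "//" starts a comment. It is not how any of the assemblers mentioned here parse their input; the function name is made up, and it deliberately ignores /* ... */ block comments and string literals.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: cuts a trailing "//" comment from line, then
 * splits the rest on ';' into trimmed, non-empty statements.
 * The line is modified in place; out[] receives pointers into it.
 * Returns the number of statements stored (at most max_out). */
size_t split_statements(char *line, char *out[], size_t max_out)
{
    size_t n = 0;
    char *tok;
    char *comment;

    assert(line != NULL && out != NULL);
    comment = strstr(line, "//");
    if (comment != NULL)
        *comment = '\0';          /* drop the comment */

    for (tok = strtok(line, ";"); tok != NULL; tok = strtok(NULL, ";")) {
        char *end;
        while (*tok == ' ' || *tok == '\t')
            tok++;                /* trim leading blanks */
        end = tok + strlen(tok);
        while (end > tok && (end[-1] == ' ' || end[-1] == '\t'))
            *--end = '\0';        /* trim trailing blanks */
        if (*tok != '\0' && n < max_out)
            out[n++] = tok;
    }
    return n;
}
```

With that split of roles, a line such as `push ebp; mov ebp, esp //comment` yields exactly two statements, and nothing after the "//" can be mistaken for a third.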

Oh, and don't listen to a single word from other people telling you
*not* to use semicolons or how you're reinventing the wheel, etc.
Most of the people commenting strongly against your solution
have their own assembler and a vested interest in having you do it
*their* way rather than whatever way you want. None of them will
*ever* use your product, so why try and take the time to please them?
This applies to me as well. Don't take my suggestions seriously; I do
have a vested interest in having you make your product look like mine
and I will never switch to your product. So please yourself first. And
once you have an actual user base, concentrate on pleasing them. As
for opinions around here (and, to a lesser extent, CLAX), I'd warn you
that the advice you're typically going to get is "make it look like
the assembler that I'm using." There is no need to make your product
look like someone else's -- those other products already exist. Follow
your heart and do what you think is right. After all, if RosAsm can
attract a few users, whatever you come up with will certainly do at
least as well.
hLater,
Randy Hyde
cr88192
2007-04-07 23:45:29 UTC
Permalink
Post by r***@cs.ucr.edu
Post by cr88192
any comments?...
The problem with your semicolons is *not* that you use them to end
statements. You're on the right track there. Most modern HLLs use
semicolons for this purpose and anyone coming from a HLL will
immediately be comfortable with such usage.
The problem is that you use semicolons for *both* statement
terminators and for comments. That's going to make the code hard to
read. As you're obviously adapting things from HLLs, I suggest that
you use C-style comments (as, say, Gas and HLA do) to avoid the
ambiguity.
well, I used C-style comments initially.
either style will work in my parser, so I guess it is up to the coder which
to use.


as for semicolons, they are optional (unlike, say, in C).
actually, I view using them all the time (like in C) to be bad style
anyways.
Post by r***@cs.ucr.edu
Oh, and don't listen to a single word from other people telling you
*not* to use semicolons or how you're reinventing the wheel, etc.
Most of the people commenting strongly against your solution
have their own assembler and a vested interest in having you do it
*their* way rather than whatever way you want. None of them will
*ever* use your product, so why try and take the time to please them?
This applies to me as well. Don't take my suggestions seriously; I do
have a vested interest in having you make your product look like mine
and I will never switch to your product. So please yourself first. And
once you have an actual user base, concentrate on pleasing them. As
for opinions around here (and, to a lesser extent, CLAX), I'd warn you
that the advice you're typically going to get is "make it look like
the assembler that I'm using." There is no need to make your product
look like someone else's -- those other products already exist. Follow
your heart and do what you think is right. After all, if RosAsm can
attract a few users, whatever you come up with will certainly do at
least as well.
hLater,
Randy Hyde
yes.

in my case, I am ending up supporting a lot of variation (some of my older
partial gas-hybrid syntax, variations on nasm-style syntax, ...).

my parser is a little ugly, oh well...


ie, closer to how it would have looked earlier on:

/*
C style comments
some file-related comments, ...
*/

.text .a32

_foo: //yeah
push ebp; mov ebp, esp //this was always this way
...
mov ecx, [ebp+8]
dec_r ecx
jz_b .l0
...

.l0:
mov esp, ebp; pop ebp; ret


actually, initially I used ';' a lot more heavily for the first JITer as well,
and would tend to 'gensym' jump targets (produces much less nice looking
labels).

currently, gensym produces symbols looking like 'BASM$<seq>'. so, like
"BASM$15", "BASM$131", ...

if I ever support generating object files or there is possibility that code
from different runs will be mixed, I will probably have to rip out such
symbols.

in some other places in my projects (for example in file naming), I have
dealt with problems by usually using a timestamp (represented in hex) as a
qualifier. beyond this, I have typically used GUIDS/UUIDS (sometimes
compacted some, ie, via modified base-64 or similar).
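
a minimal C sketch of the gensym scheme described above, for illustration only. the function names, the counter starting at 0, and the "qualified" variant are assumptions: the second function simply borrows the hex-timestamp idea from the file-naming case to show one way symbols from different runs could be kept apart, which is not something the assembler is stated to do.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sequence counter shared by both generators (assumed to start at 0). */
static unsigned gensym_seq = 0;

/* Writes the next "BASM$<seq>" label into buf.
 * Returns buf, or NULL (without consuming a number) if buf is too small. */
char *basm_gensym(char *buf, size_t size)
{
    int n;
    assert(buf != NULL);
    n = snprintf(buf, size, "BASM$%u", gensym_seq);
    if (n < 0 || (size_t)n >= size)
        return NULL;
    gensym_seq++;
    return buf;
}

/* Hypothetical variant: "BASM$<stamp in hex>$<seq>", where stamp is
 * some per-run value such as a timestamp supplied by the caller. */
char *basm_gensym_qualified(char *buf, size_t size, unsigned long stamp)
{
    int n;
    assert(buf != NULL);
    n = snprintf(buf, size, "BASM$%lX$%u", stamp, gensym_seq);
    if (n < 0 || (size_t)n >= size)
        return NULL;
    gensym_seq++;
    return buf;
}
```

the qualified names are longer, which is the trade-off mentioned in the post: the extra qualifier only pays for itself once labels from separate runs can actually meet.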

just yeah, one does not need a name longer than is needed (needlessly
wasting memory and similar).