I am in accord with the philosophy of
Unicode. The highest priority for what a character means and
how it looks is given to how it has been used in print since
long before the days of computer programming. Relatively
little consideration is given to how programmers may have used
it more recently. For the most part, I am on board with this
philosophy. This even affects how some ASCII characters are
(re)interpreted, to the likely dismay of many modern-day
programmers. For example, the asterisk ‘*’ is not the multiply operator
nor does it sit on the text's baseline like the ‘+’ does.
Instead it is used to indicate footnotes and is raised above
the baseline like a superscript should be. The proper
multiply operator (which most mathematicians unfortunately
omit in their expressions) is what we all learned in
elementary school. The times ‘×’ operator looks something like
the Roman letter ‘x’. My first programming experience was with
the Radio Shack TRS-80. I was immediately puzzled why they
used the asterisk for multiply instead of the correct one. It
took me years to realize why: the team that created ASCII
simply left it out. Programmers just recruited the
asterisk for the purpose because it was what they had. A
similar thing can be said about the slash ‘/’ and the divide
symbol ‘÷’. Many of my readers don't realize that Unicode
names the ASCII ‘-’ HYPHEN-MINUS, treating it primarily as a
hyphen rather than as the subtraction operator ‘−’ (MINUS
SIGN), which is not in ASCII either. But the
hyphen is what virtually all modern programming languages use
today as the subtraction operator.
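The distinctions above can be checked directly. As a sketch in Python (an illustration alongside the text, not part of ϕText itself), the standard unicodedata module reports the formal name Unicode assigns to each character:

```python
import unicodedata

# The ASCII characters programmers recruited as operators:
for ch in "*-/":
    print(f"U+{ord(ch):04X} {ch!r}: {unicodedata.name(ch)}")

# The characters Unicode actually provides for those operations:
for ch in "×÷−":
    print(f"U+{ord(ch):04X} {ch!r}: {unicodedata.name(ch)}")
```

Note that ‘-’ comes back as HYPHEN-MINUS while the dedicated operators have unambiguous names such as MULTIPLICATION SIGN and MINUS SIGN.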
My intent is to make extensive use of ASCII
characters in places where they make sense, with a strong
preference for what is familiar. But changes had to be made
for practices that I thought were bad. Those include some of the
ones I discussed above. I didn't want to get carried away with
using new symbols willy-nilly. I only wanted to define a new
character when I thought it would really be beneficial. And
new symbols must look distinct enough from all the others used
in the system that they cannot be confused with them. It is
preferable to use a symbol that Unicode has
already defined rather than creating a new one. There is a
procedure for defining new Unicode symbols but requests are
likely to be rejected or put off if it is not a symbol that
has already been in use in one of the hundreds of existing
languages. Unicode provides a Private Use Area for defining
new symbols for anyone's private use. But these will not
display properly when the viewer's computer does not have a
font installed that covers them. Many web pages store text
through forms that do not support user-defined fonts. So it is greatly
preferable to use Unicode symbols that are already in
existence so that any reader will see something that bears
some resemblance to the glyph that I intend for them to see.
An example pair of characters that I have adopted are the
Record Braces ‘⸢’ and ‘⸣’. The common glyphs used for these
two characters are typically close enough to know what I am
referring to when displayed with the default fonts on
third-party web pages. But they look precisely as intended
when displayed using the PhiBASIC font, which can be
downloaded from the home page of this website.
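The difference between a Private Use Area code point and an ordinary assigned character like the Record Braces can be seen programmatically. As a sketch in Python (again, an aside from ϕText), the unicodedata module marks private-use code points with the general category "Co", while assigned characters carry real names:

```python
import unicodedata

def is_private_use(ch: str) -> bool:
    """True when a code point's meaning is left to private agreement."""
    return unicodedata.category(ch) == "Co"

# A code point from the Basic Multilingual Plane PUA (U+E000..U+F8FF):
print(is_private_use("\uE000"))   # no standard meaning; the font decides

# The Record Braces are ordinary assigned characters, not PUA:
for ch in "⸢⸣":
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Because the braces are assigned characters, any reasonably complete fallback font will render at least an approximation of the intended glyph, which is exactly the property argued for above.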
I avoid using some of the features of
Unicode because of their ill side effects. Some code points
are not characters in themselves; instead they modify the base
character that comes before them. These are called “Combining
Characters” and they are never used in ϕText source code
syntax. They can
be enclosed within string literals but their behavior is not
guaranteed to be what you would expect. Even though they do
not take up a display position of their own, they still
require storage space, so they reduce the number of characters
that can be stored in a line. ϕEdit limits line length to
255 characters. The number of characters that will be shown on
a line will be reduced by one for every combining character on
that line. When using them, ϕEdit, in its current form, will
fail to place the cursor where it belongs on the line so they
should just be avoided in source code. Another problem feature
is scripts that are written from right to left instead of left
to right, such as Hebrew and Arabic. Even Windows'
GDI seems to get confused when drawing strings that mix
Hebrew with Roman text, for example. One day the technology may
become more dependable and my understanding may become good
enough that the two can be intermixed. But until that day
comes, I will just avoid using them in any of my source code.
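The storage cost of combining characters described above is easy to demonstrate. As a sketch in Python (the 255-character line budget is ϕEdit's, carried over from the text), the accented letter ‘é’ can be written as one precomposed code point or as a base letter plus U+0301 COMBINING ACUTE ACCENT:

```python
import unicodedata

precomposed = "\u00E9"    # é as a single code point
combining   = "e\u0301"   # 'e' + combining accent; displays the same

print(len(precomposed))   # 1
print(len(combining))     # 2 — the accent costs a storage slot

# Against a fixed budget such as ϕEdit's 255 code points per line,
# every combining mark reduces the visible characters by one:
line = combining * 100    # 100 visible characters...
print(len(line))          # ...but 200 code points of storage

# Normalization (NFC) folds the pair back into the single code point:
print(unicodedata.normalize("NFC", combining) == precomposed)
```

This is also why cursor placement goes wrong when an editor counts code points instead of displayed positions: the two lengths disagree as soon as a combining mark appears.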