Math for Engineers
This file documents the Math for Engineers work I've been doing with Squeak.
To jump right in, file in the following change sets IN THE FOLLOWING ORDER:
MathDD1.cs
MathDD2.cs
PointDD1.cs
PointDD2.cs
CollectionDD.cs
CollectionStatistics.cs
CollectionReindexed.cs
StringsAsNumbers.cs
Background
My education is actually as a Mechanical Engineer. I was fortunate enough,
however, to get involved with Smalltalk early in my undergraduate years,
and found that the ST environment was ideal for an engineer. I had done
quite a bit of Fortran and a little C, and was just amazed at how easy
it was to transcribe engineering ideas into simulation and problem solving
using the Smalltalk environment. One of the most powerful things was being
released from the doldrums of typed number systems, especially for the
everyday little engineering tasks. It is not every day that engineers sit
down and simultaneously solve systems of 47 equations, or other scary matrix
stuff, yet this is all most "math libraries" ever seem to do for you. It
was so nice to sit down in Smalltalk with a little workspace, open a file,
parse out some numbers, and do some simple calculation (like computing
the standard deviation) without having to compile and link a program. Where
I really began to fall in love with ST was when I realized how easily
I could extend the mathematical framework.
The work contained in these change sets is a more refined version of
snippets I have been carrying around with me at school and two companies
in VisualWorks. In fact, much of it was ported from VisualWorks (yes, I
will make a VW version of the Squeak polish available soon).
Double Dispatching
The first change I had to make to the Squeak environment was to make it
use double dispatching for the four basic mathematical operations (+, -, *, /).
Double dispatching is a pattern or mechanism by which an object receiving
a message can further refine the semantics of the message by turning around
and sending a message back to the argument with type information embodied
in the message. For example, using double dispatching, the method Float>>+
will look something like:
+ aNumeric
    ^aNumeric sumFromFloat: self
Different kinds of numerics (Integer, Fraction, Float, etc.) can then implement
the sumFromFloat: message in a way that is appropriate for that combination
of numeric types.
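As a rough sketch of the receiving side, the methods below show how two argument classes might answer sumFromFloat:. The method bodies are assumptions made for illustration, not the contents of MathDD1.cs. In particular, the use of primitive 41 (the VM's Float addition) in Float>>sumFromFloat: is only a guess at how the chain bottoms out.

```smalltalk
"Integer>>sumFromFloat: (hypothetical body)"
sumFromFloat: aFloat
    "The sender was Float>>+, so aFloat is known to be a Float.
     Convert the receiver and add again; the second send of +
     ends up in Float>>sumFromFloat: below."
    ^aFloat + self asFloat

"Float>>sumFromFloat: (hypothetical body)"
sumFromFloat: aFloat
    "Both operands are Floats, so the primitive can add them directly.
     Assumption: primitive 41 is the VM's Float addition."
    <primitive: 41>
    ^self primitiveFailed
```

Under these assumptions, 3.0 + 4 would travel through Float>>+, then Integer>>sumFromFloat:, then Float>>+ again (now with a Float argument), and finally Float>>sumFromFloat:. No method in the chain has to test the class of its argument.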
Two change sets actually make the change to the system: MathDD1.cs
and MathDD2.cs. These two need to be filed in before anything else,
and in that order. I had to do two change sets because the
first one puts in all of the double dispatching methods, and the second one
actually changes the core math methods to use them. Changing the core methods
before the first set was complete would create a situation where adding two
integers raised a message not understood, which is not a good thing.
Points
The next thing I did was to change Points to take advantage of the double
dispatching methods. The next two change sets are PointDD1.cs and
PointDD2.cs. In addition to making points interface with single
numerics using DD, I also cleaned up many of the derivative methods, such
as //, \\, quo:, rem:, etc. The original implementations kind
of assume that Point is at the top of some sort of coercion hierarchy,
and that was not what I had in mind.
Collections
Up till now, nothing in your system has really changed. Now the fun begins.
When doing a lot of engineering work, I noticed that it was a very common
pattern that I would take two like-sized collections and add (or multiply,
subtract, divide) their respective elements. At first, in my naivety, I
created a subclass of Array called NumericArray. I wrote the appropriate
asNumericArray and asArray messages and proceeded to litter my code with
typecast-like messages. When I discovered that it's quite inefficient to
copy large arrays just so you can send a different message to them, I got
smart and made NumericArray a wrapper-like object (this got rid of
NumericCollection, and numerous Numeric* thingies that had followed as well).
Still I found myself putting these cast-like methods everywhere. So I said
heck with it, added the numeric protocol to collections and sequenceable
collections, and made them DD with other numeric objects. Here are some examples:
#(1 2 3) + 4.0 -> #(5.0 6.0 7.0)
#(3 2 1) * #(4 5 6) -> #(12 10 6)
4@1 / #(2 4) -> #((2@(1/2)) (1@(1/4))) "not really legal syntax"
2 - #(-2 0 2) asSet -> #(4 2 0) asSet
#(5 6 7) - #(1 2 3) asSet -> Error "cannot mix unordereds with anything but single numbers"
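For comparison, here is how the first two results above can be produced in a stock image, without the change set. This is only a sketch of the explicit enumeration that the numeric protocol replaces. It uses nothing beyond the standard collect: and with:collect: messages.

```smalltalk
"Collection plus a single number: visit each element explicitly."
#(1 2 3) collect: [:each | each + 4.0].

"Two like-sized sequences: pair the elements up by index."
#(3 2 1) with: #(4 5 6) collect: [:a :b | a * b].
```

Printing these in a workspace gives #(5.0 6.0 7.0) and #(12 10 6), the same answers as the one-line forms above.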
To get this stuff, file in the CollectionDD.cs change set. At least
at my places of employ and school, I have found this to be immensely beneficial,
in particular because you can build more complex algorithms on them. Which
brings us to...
Statistic Type Stuff
Which is very easy to build up and nice to have in conjunction with the
numerical stuff. The CollectionStatistics.cs change set adds things
like max, min, and sum to collections. Once you have these, then methods
like range, average, deviation, variance, etc. get easy to add. Take for
example plotting collections of x and y values in a given box on the
screen. Sooner or later, you have to compute the actual pixel points from
the x and y values. Using the collection stuff, it might look something
like this:
(xVals - xVals min / xVals range) @
(1 - (yVals - yVals min / yVals range))
* box extent + box origin
Which is much more concise than the loops and vars one might expect otherwise.
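The derived methods mentioned above (range, average, variance, deviation) might be layered on max, min, and sum roughly as follows. These bodies are hypothetical and are not copied from CollectionStatistics.cs. The variance shown is the population form (dividing by the size), which is an assumption. It also relies on the collection arithmetic from CollectionDD.cs, so it is placed on sequenceable collections only.

```smalltalk
"Collection>>range (hypothetical body)"
range
    ^self max - self min

"Collection>>average (hypothetical body)"
average
    ^self sum / self size

"SequenceableCollection>>variance (hypothetical body, population form)"
variance
    | deviations |
    "collection - number answers a collection (CollectionDD.cs)"
    deviations := self - self average.
    "collection * collection multiplies element by element"
    ^(deviations * deviations) sum / self size

"SequenceableCollection>>deviation (hypothetical body)"
deviation
    ^self variance sqrt
```

With bodies like these, #(2 4 4 4 5 5 7 9) would have an average of 5, a variance of 4, and a deviation of 2.0.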
Reindexing Collections
One of the things that we've found ourselves doing a lot of lately is wanting
to enumerate over every nth element of a sequence. This is particularly
applicable in image processing where colors are usually stored as repeating
RGB values. For example, we may want to compute the standard deviation
or average of the red content of an image. ReindexedCollections are collection
wrappers which change the indexing scheme. They actually abstract and generalize
the concept of reverseDo:. We don't send that message any more, we just
send something like:
(sequence by: -1) do: [:each | ...work, work, work...]
The change set CollectionReindexed.cs has this stuff in it.
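For reference, the strided enumeration that a reindexed wrapper hides can be written by hand in a stock image. The sketch below averages the red values of a small, made-up RGB sequence using only to:by:do: and at:. It does not use anything from CollectionReindexed.cs.

```smalltalk
| rgb redSum redCount |
"Three pixels stored as repeating R G B values (made-up data)."
rgb := #(10 200 30  20 210 40  30 220 50).
redSum := 0.
redCount := 0.
"Visit indices 1, 4, 7: every third element, starting at the first."
1 to: rgb size by: 3 do: [:i |
    redSum := redSum + (rgb at: i).
    redCount := redCount + 1].
"Average red value."
redSum / redCount
```

Evaluating this in a workspace answers 20. The index arithmetic and the two accumulators are the bookkeeping that the wrapper approach is meant to remove.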
Perl Too (OK, not completely)
It has been my experience that when you add rather generic behaviors to
rather standard objects, some people freak out and start ranting
that errors become harder to detect because they are "one off" type errors.
In the seven years of evolving this simple stuff, I've only had such an error
happen once. It occurred because I had read a number string from an input field
and forgotten to convert it to a number before mathematically combining
it with another number, which resulted in the characters being modified
because Strings are SequenceableCollections.
I remember when someone was showing me Perl once, I thought it was really
cool that you could add a string representation of a number to a normal
number and get the resulting number. It seemed ideal for numeric parsing/scripting.
So I've done something like that in StringsAsNumbers.cs to avoid
ever having the above problem again. Double dispatching is used to
treat the string as the number it represents and mix it with other numeric
types (or other strings!). It's actually kind of cool, because I don't really
worry about converting strings I've read if all that I'm going to do is start
crunching numbers with them; that will happen automatically.
#('3' 4 '5.0') + '1' -> #('4' 5 '6.0')
The only shortcoming of the StringsAsNumbers change set is that
it uses the default asNumber message for Strings to extract the number.
Unfortunately, 'abc' asNumber will return zero rather than raising a conversion
error, which in this case I think it should. Eventually, I will get around
to writing an asNumberOnError: message to use, but I was kind of waiting
to see what would come in the way of exception handling in the future.
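One possible shape for such a message is sketched below. Everything here is hypothetical: the text above gives only the selector asNumberOnError:, and the single block argument and the validation rules are assumptions. The sketch accepts only an optional leading minus sign followed by digits with at most one interior decimal point. That means it deliberately rejects exponent and radix notations that asNumber itself may understand.

```smalltalk
"String>>asNumberOnError: (hypothetical)"
asNumberOnError: aBlock
    "Answer the number the receiver represents, or the value of
     aBlock if the receiver does not look like a plain decimal number."
    | body |
    "Strip one leading minus sign, if present."
    body := (self notEmpty and: [self first = $-])
        ifTrue: [self copyFrom: 2 to: self size]
        ifFalse: [self].
    body isEmpty ifTrue: [^aBlock value].
    "Must start and end with a digit, so '.5' and '5.' are rejected."
    (body first isDigit and: [body last isDigit]) ifFalse: [^aBlock value].
    "At most one decimal point."
    (body occurrencesOf: $.) > 1 ifTrue: [^aBlock value].
    "Only digits and the decimal point are allowed."
    body do: [:c | (c isDigit or: [c = $.]) ifFalse: [^aBlock value]].
    ^self asNumber
```

With this in place, 'abc' asNumberOnError: [nil] would answer nil instead of a number, while '5.0' asNumberOnError: [nil] would go through the normal asNumber conversion.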
Future Work
This is not the end. I do have quite a few other things I plan on doing
in the future, such as:
- Getting the VW version polished up and released as well.
- Using the DD framework to implement typed arrays (FloatArray, IntegerArray,
  etc.) and then writing primitives for some of the common combinations (e.g.
  FloatArray + FloatArray). For Squeak, I am waiting on Andreas Raab's work
  on pluggable primitives.
- MeasuredValues, a project that Ken Greene and I worked on at Siemens:
  numeric objects with units of measure attached to them.
  The underpinning mechanics were kind of elegant. Every known unit type
  was expressed with about 7 unique classes, and MVs auto-magically reduced
  themselves under multiplication/division.
- I may do some matrix stuff, but I do really heavy matrix stuff so much less
  often than I do 1-dimensional vectors that it's not a high priority.