The semantic db project.

Some proof of concept examples using the BKO scheme to represent semantic knowledge:
|content> => |sw-url: http://semantic-db.org/frog.sw>
  + |sw-url: http://semantic-db.org/george.sw>
  + |sw-url: http://semantic-db.org/early-us-presidents.sw>
  + |sw-url: http://semantic-db.org/top-5-longest-rivers.sw>
  + |sw-url: http://semantic-db.org/UK-to-US-spelling.sw>
  + |sw-url: http://semantic-db.org/currency-exchange.sw>
  + |sw-url: http://semantic-db.org/temperature-conversion.sw>
  + |sw-url: http://semantic-db.org/train-of-thought.swc>
  + |sw-url: http://semantic-db.org/basic-numbers.sw>
  + |sw-url: http://semantic-db.org/small-primes.sw>
  + |sw-url: http://semantic-db.org/common-cliche.sw>
  + |sw-url: http://semantic-db.org/share-example.sw>
  + |sw-url: http://semantic-db.org/common-internet-acronyms.sw>
  + |sw-url: http://semantic-db.org/basic-alphabet.sw>
  + |sw-url: http://semantic-db.org/the-doors.sw>
  + |sw-url: http://semantic-db.org/xml-to-bko-breakfast-menu-example.sw>
  + |sw-url: http://semantic-db.org/loebner-bots.sw>


6/5/2014 update: I have a whole bunch of .sw files here.
Some simple examples (combining the semantic data and slightly modified list comprehension/set builder notation):
(a hint of things to come, really. At the moment the English is hand-translated to set-builder, but eventually it would be nice to get a bot to do that too!)

"What was the party of the third president?"
|answer> => party |x> for |x> in |early US Presidents: _list> such that <number: 3|number|x> == 1
-- using inverse:
party inverse-president-number |number: 3>
-- inverse is the way to go, sure, but here is the other way too:
(using more recent notation, and dropping the "|x> for |x>" bit which we don't need because of the linearity of operators)
|answer> => party (|x> in "" |early US Presidents: _list> such that <number: 3|number|x> == 1)

"Who was the president in 1825?"
|answer> => |x> in |early US Presidents: _list> such that <year: 1825|era|x> > 0.5
-- using inverse:
inverse-president-era |year: 1825>

"Which long river does Burundi drain into?"
|answer> => |x> in |river: long: _list> such that <country: Burundi|drainage-basin-countries|x> > 0.8

"What was the dissolution date of the party of the sixth president?"
|answer> => dissolved: party |x> for |x> in |early US Presidents: _list> such that <number: 6|number|x> == 1
-- using inverse:
dissolved party inverse-president-number |number: 6>
-- again, the long version is:
|answer> => dissolved party (|x> in "" |early US Presidents: _list> such that <number: 6|number|x> == 1)
-- another version, which is closer to the English version, is:
|answer> => dissolution-date party sixth |president>

"Which early presidents were members of the Democratic-Replublican party?"
|answer> => |x> in |early US Presidents: _list> such that <political party: Democratic-Republican|party|x> > 0.9
-- using inverse:
inverse-party |party: Democratic-Republican>

Alternatively, if we have a full list of members of the Democratic-Republican party, we could instead do:
|answer> => |_self><early US Presidents: _list||political party: Democratic-Republican: _list>

Or, if we make use of the set intersection function (which is a useful thing indeed):
|answer> => intersection(|early US Presidents: _list>, |political party: Democratic-Republican: _list>)
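As a rough sketch of what a set intersection function might look like (plain Python, with a dict standing in for the superposition class, and taking the minimum coefficient for each common ket as one natural choice):
def intersection(*sps):
  # each sp is a dict mapping ket-label -> coefficient;
  # keep only the labels common to all of them,
  # with the minimum coefficient for each
  common = set(sps[0])
  for sp in sps[1:]:
    common &= set(sp)
  return {label: min(sp[label] for sp in sps) for label in common}

early_presidents = {"Jefferson": 1, "Madison": 1, "Adams": 1}
dem_republicans = {"Jefferson": 1, "Madison": 1, "Monroe": 1}
print(intersection(early_presidents, dem_republicans))
# {'Jefferson': 1, 'Madison': 1}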

"How many siblings does George have?"
|answer> => count: siblings |person: George>

If we have data on George, Ed and Travis's friends we can do:
"Which friends do George, Ed and Travis have in common?"
|answer> => intersection(friends|person: George>, friends|person: Ed>, friends|person: Travis>)

-- which BTW is a common pattern:
|answer> => intersection(op|U>, op|V>, op|X>, op|Y>)

"Which actors do movie name-a and movie name-b have in common?"
|answer> => intersection(actors|movie: name-a>,actors|movie: name-b>)

"What can I get for breakfast that is under $6?"
|answer> => |x> in |menu: breakfast> such that <*|value: price|x> < 6

"I want something with bacon in it!"
|answer> => |x> in |menu: breakfast> such that <food: bacon|read:description|x> > 0.9

"I want something with fruit in it!"
|answer> => |x> in |menu: breakfast> such that <fruit: *|read:description|x> > 0.9

"What can I get for breakfast that is under 700 calories?"
|answer> => |x> in |menu: breakfast> such that <*|value: calories|x> < 700

"I want waffles that are under $8."
|answer> => |x> in |menu: breakfast> such that <food: waffles|read:name|x> > 0.9 and <*|value:price|x> < 8

"Who was US president during WWII?"
|answer> => |x> in "" |US Presidents: _list> such that float-count intersection(era|World War II>,era|x>) > 0


Related: mindpixels. (a hint really of how much trivia is stacked into the average adult human brain)
First basic implementation.
Work on a second implementation.
Third.
Next.
Slowly getting there.
OK. Really starting to get some results now!
the code
a bunch of test cases (scroll right down to the bottom)
friends example
US presidents example
beginnings of a train of thought example
breakfast menu

Next iteration:
the mechanics
the functions
some test cases
play with context
quick test of learning indirectly
tidied up friends example
quick test of apply_op and switching context
tidied up early US presidents example
first play with inverse rule code
US presidents data before creating the inverses, and after.
Applying the newly written apply_op_multi() to a fairly simple cyclic network structure creates some interesting patterns! one, two, three.
Simple molecule data before creating the inverses, and after.
I tweaked my code a bit, and now I have a couple more almost fractal like patterns.
Say I started with: O |x> => |y>
Well, the code ignored the coefficient, so that O 3|x> returns |y> instead of 3|y>
So that generated one family of patterns, and now that O 7|x> returns 7|y>, we have four and five.
Here is the graph of the (slow) convergence of O^k |a> with respect to simm(), of the above cyclic network (see the very bottom of the page).
Updated the train of thought example. Now that I have the inverse generator, the train of thought code no longer runs into dead-ends, especially for small data-sets, since now everything links to something. Instead it just goes round and round in circles. I suspect that happens to humans sometimes too :) Here is the train of thought code applied to early US presidents.
Tidied up breakfast menu example.
Now to upscale it! For a start, let's feed in the Moby thesaurus. The code that processes this (after \r characters have been deleted). Dump universe before the inverse, and after. And the small test thesaurus made from the first 1000 lines of the full Moby thesaurus.
First play with a simple form of pattern recognition, using simm().
Using the train-of-thought function to do a random-walk of a grid (defined using BKO, of course). Very slow though! But speed is not the point, the point is it is a very general way to step randomly through some linked structure. Either a convoluted network such as the brain, or a simple rectangular grid. Examples: one, two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen.
New version of the grid example code.
A simple test of Boolean set union, intersection, complement, and delete.
Very early attempt at explaining the code.
First beginnings of a semantic agent console.
A play with the cliche machine (making use of the pattern recognition code). Cliche train of thought (good). Cliche dump universe.
A quick demonstration of how quickly coefficients can decay (and hence an advert for formal logic, which does not suffer coefficient decay no matter how far you get from the initial premises).
Slightly tidied up version of the thesaurus processing code. Here is the dump universe (before creating inverse) of the test thesaurus.
Next, I decided to write code to build random profiles for bots. So with this in mind I needed some pieces.
Code to save a lot of typing for learn rules for lists of objects (of the same type).
Here is a dump universe resulting from that.
Code to generate personality profiles. eg, one, two, three, four.
First beginnings of code (the_semantic_db_processor.py) to map the BKO language (yet to find a name for it!), to python. We use this little thing to quickly test it.
First attempt at a grammar for the BKO language.
19/1/2014: WOOT!! I now have a basic console (at the moment tacked on to the bottom of early US presidents, because it has a large data-set we can play with), that can interactively process and run the basic BKO language. Though, so far only simple operators are supported. The key code for all of this is in the semantic db processor. Which in turn was written using the rules of the game.
Here is a sample transcript where I play with the interactive console.
Quick play with building knowledge of small numbers. I apply is_prime(x), factor_number(x), strange_int(x), near_number(x) to integers below 500. Dump universe before creating inverses, and after.
It is actually interesting to see what create-inverse can learn on its own.
eg: inverse-is-prime |yes> => |number: 2> + |number: 3> + |number: 5> + |number: 7> + |number: 11> + |number: 13> + |number: 17> + ...
inverse-is-prime |no> => |number: 1> + |number: 4> + |number: 6> + |number: 8> + |number: 9> + |number: 10> + |number: 12> + ...
inverse-factor |number: 61> => |number: 61> + |number: 122> + |number: 183> + |number: 244> + |number: 305> + ...
inverse-strange-int |number: 47> => |number: 47> + |number: 172> + |number: 328> + |number: 369>

23/1/2014: OK. I now have multi operators, eg: op^n |x>, whether op is a literal or a function operator. And I have compound operators, eg: similar[president-era] |Adams>, or rescale[100] |x>. And I have the mixed case working too, eg: mult[5]^3 |x>.
Here is the semantic db processor.
BTW, thinking of calling the BKO language Feynman.
Another play with knowledge of small numbers. This time using is_prime(x), factor_number(x), strange_int(x), strange_int_prime(x), strange_int_depth(x), strange_int_delta(x) on integers less than 2000.
Dump universe before creating inverses, and after. BTW, it took 6s vs 71m to generate these two data-sets!
And once again, inverse has some interesting results. eg:
inverse-strange-int-depth |number: 8> => |number: 1257> + |number: 1556> + |number: 1774> + |number: 1982>
inverse-strange-int-prime |number: 7> => |number: 7> + |number: 10> + |number: 12> + |number: 21> + |number: 25> + |number: 30> + |number: 32> + ...
inverse-strange-int-prime |number: 643> => |number: 643> + |number: 1282>

And again, this time with strange_int_list(x) too.
Dump universe before creating inverses, and after.
And now we have that data, we can filter out individual facts.
eg: $ grep "^strange-int-list " dump_universe_2000_small_numbers_v2.txt produces this interesting beast.
Now, integers below 10000.
And filter this to the strange-int-list lines.
And using a console we can do things like: reverse coeff-sort similar[strange-int-list] |x>
First play with the frequency class equation. See WP.
The code for this is at the bottom of the semantic-db-functions.
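A minimal sketch, assuming the usual definition N = floor(0.5 - log2(f/f_max)), where f is a word's frequency and f_max is the frequency of the most common word:
import math

def frequency_class(f, f_max):
  # class 0 is the most common word; each higher class is
  # roughly half as frequent as the one before it
  return math.floor(0.5 - math.log2(f / f_max))

print(frequency_class(100, 100))   # 0
print(frequency_class(25, 100))    # 2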

30/1/2014: Finally implemented the almost mythical "Map to Topic". I've been intending to write a python version for so long now (previous version was shell/awk). Good to see I finally have it.
Once again, the code is tacked on to the bottom of the semantic-db-functions.
Here is the code to play with MtT.
It is interesting that structure wise, it looks remarkably like the context.pattern_recognition() code.
Indeed, the core of pattern_recognition is simm() and the core of MtT is nfc(), and these have the property:
simm: superposition + superposition -> float in [0,1]
nfc: ket + superposition -> float in [0,1]
In both cases, 0 for no match, 1 for exact match, and values in between otherwise.
(Note, any metric that has known lower and upper bounds can be easily tweaked to become float in [0,1])
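eg, a one-line rescale, assuming the metric m has known bounds [lower, upper]:
def to_unit_interval(m, lower, upper):
  # linearly rescale a bounded metric into [0,1]
  # (if 0 means exact match for your metric, use 1 - result instead)
  return (m - lower) / (upper - lower)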
Here is a sample transcript where I play with MtT.
Code to map short phrases to topics.
Here is a sample transcript (though could probably do with some tweaking yet).

OK. Creating a frequency list just using read_text() works, but is very slow. Some terrible big-O in fact. The problem is that each time you add a new ket the code checks the entire superposition to see if it is already there. Over and over for each new word you add!
So I fixed that by creating a short shell script that converts text into frequency lists once. No need to do it over and over at each run-time. Then I wrote some python to load up the frequency list (easy). Usefully, we can later use this as a way of loading up any kind of superposition. Just have the text file in format ket_label[tab]ket_coeff (one per line).
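The loader really is easy. A minimal sketch, again with a plain dict in place of the superposition class:
def load_frequency_list(filename):
  # file format: ket_label[tab]ket_coeff, one per line
  sp = {}
  with open(filename) as f:
    for line in f:
      try:
        label, coeff = line.rstrip("\n").split("\t")
        sp[label] = sp.get(label, 0) + float(coeff)
      except ValueError:
        continue                  # skip malformed lines
  return sp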
New and faster short-phrases-to-topics (and it really is dramatically faster).
Sample transcript.
Here is another. I really think it is spitting out some amazing results!
Here is the same again, this time just showing the input words and the output results.

Decided to use the Rambler algo (which I call the ngram-stitch) to create novel, almost-English-sounding words (and names).
Examples: Shakespeare, female names, male names, I Robot, Tom Sawyer, Frankenstein, Sherlock Holmes, Ulysses, Moby Dick.
The results aren't super great, though the female names are probably the best of the bunch.
However, the results are better than throwing random letters together.
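Assuming the ngram-stitch is roughly a character-level Markov chain over letters (I'm not reproducing the Rambler details here), a minimal sketch looks like:
import random
from collections import defaultdict

def ngram_stitch(words, n=3, max_len=10):
  # learn which character can follow each (n-1)-gram,
  # then stitch a new word together by random-walking those continuations
  follows = defaultdict(list)
  for w in words:
    w = "^" + w + "$"                      # start/end markers
    for i in range(len(w) - n + 1):
      follows[w[i:i + n - 1]].append(w[i + n - 1])
  out = random.choice([k for k in follows if k.startswith("^")])
  while len(out) < max_len + 1:
    options = follows.get(out[-(n - 1):])
    if not options:
      break
    c = random.choice(options)
    if c == "$":
      break
    out += c
  return out[1:]

print(ngram_stitch(["mary", "maria", "marlene", "martha", "carla"]))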
Then the next thing we can do is try and create inverses (of a sort). Given a gibberish word, try and guess its meaning.
Method 1, use a tweak of the Map-to-Topic. Results.
Method 2, use simm/pattern_recognition. Results.

Quick look at extracting unique words from text. ie, words that are in one text, but not in any of the other texts you are considering. For now just a handful of ebooks.
$ wc -l *.txt
   316 Alice in Wonderland.txt
   828 Frankenstein.txt
  1889 I Robot.txt
  7910 Moby Dick.txt
 38287 Shakespeare.txt
  1619 Sherlock Holmes.txt
  1815 Tom Sawyer.txt
 52664 total
I think there is a bug in not handling dashes neatly enough. Also would be nice to look at unique word pairs, and maybe triples too.
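At its core, unique-word extraction is just set difference. A minimal sketch, with tiny hypothetical word-sets standing in for the real ebooks:
def unique_words(texts):
  # texts: dict mapping text-name -> set of words in that text.
  # a word is unique if it appears in that text and no other.
  result = {}
  for name, words in texts.items():
    others = set().union(*(w for n, w in texts.items() if n != name))
    result[name] = words - others
  return result

texts = {"Alice in Wonderland": {"alice", "gryphon", "whale"},
         "Moby Dick": {"whale", "harpoon", "ahab"}}
print(unique_words(texts))
# {'Alice in Wonderland': {'alice', 'gryphon'}, 'Moby Dick': {'harpoon', 'ahab'}}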
Improved the code. Now it extracts unique n-grams.
$ wc -l *
271 Alice in Wonderland.1.txt
27857 Alice in Wonderland.10.txt
6793 Alice in Wonderland.2.txt
18837 Alice in Wonderland.3.txt
25345 Alice in Wonderland.4.txt
27202 Alice in Wonderland.5.txt
27651 Alice in Wonderland.6.txt
27763 Alice in Wonderland.7.txt
27810 Alice in Wonderland.8.txt
27836 Alice in Wonderland.9.txt
697 Frankenstein.1.txt
75341 Frankenstein.10.txt
20938 Frankenstein.2.txt
56251 Frankenstein.3.txt
71791 Frankenstein.4.txt
74860 Frankenstein.5.txt
75252 Frankenstein.6.txt
75308 Frankenstein.7.txt
75326 Frankenstein.8.txt
75334 Frankenstein.9.txt
1695 I Robot.1.txt
70307 I Robot.10.txt
25501 I Robot.2.txt
55359 I Robot.3.txt
67637 I Robot.4.txt
69818 I Robot.5.txt
70170 I Robot.6.txt
70257 I Robot.7.txt
70284 I Robot.8.txt
70297 I Robot.9.txt
5461 Moby Dick.1.txt
216522 Moby Dick.10.txt
79483 Moby Dick.2.txt
171928 Moby Dick.3.txt
208439 Moby Dick.4.txt
215056 Moby Dick.5.txt
216193 Moby Dick.6.txt
216416 Moby Dick.7.txt
216478 Moby Dick.8.txt
216506 Moby Dick.9.txt
35723 Shakespeare.1.txt
922531 Shakespeare.10.txt
347982 Shakespeare.2.txt
748365 Shakespeare.3.txt
892318 Shakespeare.4.txt
915498 Shakespeare.5.txt
919727 Shakespeare.6.txt
921198 Shakespeare.7.txt
921963 Shakespeare.8.txt
922355 Shakespeare.9.txt
1234 Sherlock Holmes.1.txt
105912 Sherlock Holmes.10.txt
26212 Sherlock Holmes.2.txt
72755 Sherlock Holmes.3.txt
98566 Sherlock Holmes.4.txt
104534 Sherlock Holmes.5.txt
105584 Sherlock Holmes.6.txt
105814 Sherlock Holmes.7.txt
105879 Sherlock Holmes.8.txt
105901 Sherlock Holmes.9.txt
1011 Tom Sawyer.1.txt
71496 Tom Sawyer.10.txt
21822 Tom Sawyer.2.txt
54309 Tom Sawyer.3.txt
68334 Tom Sawyer.4.txt
70931 Tom Sawyer.5.txt
71354 Tom Sawyer.6.txt
71444 Tom Sawyer.7.txt
71477 Tom Sawyer.8.txt
71489 Tom Sawyer.9.txt
12105988 total

9/2/2014: Moved Map-to-Topic to the new_context class, just below c.pattern_recognition. They are brothers after all.
Wrote an n-gram version of map to topic. So instead of looking up a frequency list of words, look up a frequency list of ngrams.
Results are not that great. If you don't get the quote from a text exactly right, you get an empty ket as a result! But if you do get it right, you get a higher percentage match than from the 1gram (single word) version. If you had the computing power you might merge them, and do something like: 80% 1-gram + 18% 2gram + 2% 3gram
Some sample comparisons here.
I guess another tweak on 2-grams is this. Say you have the sequence A B C D E.
Normally 2grams return (A,B) (B,C) (C,D) (D,E)
A variation would be: (A,C) (B,D) (C,E).
So if you mixed this in with normal 2grams the result would be a little less "brittle".
So perhaps: 80% 1gram + 10% 2gram + 8% 2gram' + 2% 3gram.
But currently I don't think it is worth the effort to implement.
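To make the two kinds of 2-gram concrete, here is a quick sketch in plain Python:
def two_grams(tokens):
  # normal 2-grams: (A,B) (B,C) (C,D) (D,E)
  return list(zip(tokens, tokens[1:]))

def skip_two_grams(tokens):
  # the shifted variation: (A,C) (B,D) (C,E)
  return list(zip(tokens, tokens[2:]))

seq = ["A", "B", "C", "D", "E"]
print(two_grams(seq))        # [('A', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'E')]
print(skip_two_grams(seq))   # [('A', 'C'), ('B', 'D'), ('C', 'E')]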
OK. I implemented the 80% 1gram + 20% 2gram idea. Heh, results aren't that great. Not much of an improvement on just 1grams, it seems. Though maybe there are cases where it gives better results.
Here is a variation where we look for a full ngram match first.
13/2/2014: So far I haven't explained any of the theory of the underlying model of the above code.
I'm not yet going to go into full details but I will say a little.
The Mat-Sum-Sig model (matrix-sum-sigmoid) is a simplified model of neurons.
Then the BKO model (bra-ket-operator) is in turn a simplification of the MSS model.
My proposition is that all relatively static knowledge can be represented using the BKO model.
And since at its heart BKO is a representation for networks, all relatively static knowledge can be represented using networks.
Here are a couple of simple examples:
1) a quite literal network:
O |a1> => |a2>
O |a2> => |a3>
O |a3> => |a4>
O |a4> => |a5>
O |a5> => |a6>
O |a6> => |a7>
O |a7> => |a8>
O |a8> => |a9>
O |a9> => |a10>
O |a10> => |a1> + |b1>

O |b1> => |b2>
O |b2> => |b3>
O |b3> => |b4>
O |b4> => |b5>
O |b5> => |b6>
O |b6> => |b7>
O |b7> => |b1>
Here is a diagram of that network.
Using O to propagate around the network: "currency" not preserved, "currency" preserved.
Using train-of-thought to step through the network.
2) the methanol molecule:
molecular-pieces |molecule: methanol> => |methanol: 1> + |methanol: 2> + |methanol: 3> + |methanol: 4> + |methanol: 5> + |methanol: 6>

atom-type |methanol: 1> => |atom: H>
bonds-to |methanol: 1> => |methanol: 4>

atom-type |methanol: 2> => |atom: H>
bonds-to |methanol: 2> => |methanol: 4>

atom-type |methanol: 3> => |atom: H>
bonds-to |methanol: 3> => |methanol: 4>

atom-type |methanol: 4> => |atom: C>
bonds-to |methanol: 4> => |methanol: 1> + |methanol: 2> + |methanol: 3> + |methanol: 5>

atom-type |methanol: 5> => |atom: O>
bonds-to |methanol: 5> => |methanol: 4> + |methanol: 6>

atom-type |methanol: 6> => |atom: H>
bonds-to |methanol: 6> => |methanol: 5>
Here is a diagram of the methanol "network".
17/2/2014: Made some major progress in the last couple of days.
For a start, I now have the context_list class all working nicely. So instead of doing:
c = new_context("a context") -- create a new context
c.learn(...)                 -- learn something in that context
print(c.dump_universe())     -- print everything we know about the current context
we do:
C = context_list("global")
C.set("first context")     -- switch to context if it already exists in the context_list, else create it.
C.learn(...)               -- learn something in that context
C.set("another context")   -- switch to context if it already exists in the context_list, else create it. 
C.learn(...)               -- learn something in that context
C.set("first context")     -- switch back to previous context.
print(C.dump_universe())   -- dump the current context
print(C.dump_multiverse()) -- dump all known context's in the context_list.
The other thing is I finally wrote the code to parse a "molecule of knowledge".
eg: some-op |x> => 3|a> + |b> + 19.24|c>
This now completes the puzzle. Using c.dump_universe() I have been able to save data to a file for a long time now. Now with the parse_rule_line() code (at the bottom of the processor, BTW), I can load the data from a file too!
And now we can start to write a better semantic agent console.

18/2/14: Cool. I now have all the basics in place! (though the code could do with some tidying)
The mechanics. This is all the code to handle our classes: ket, bra, superposition, sigmoids, new_context and context_list.
The functions. This is where we have all the functions that act on kets and superpositions. The intention is to add more of these with time.
The processor. This is the code that parses and processes all the code, including the BKO language, provisionally called Feynman.
The console. This is the code for interacting with the semantic agent. (dream: it would be nice to have a GUI version of this some day!)
The rules of the game. This is a text file giving a minimalist description of the code.
20/2/14: making some progress towards defining a grammar.
25/2/14: the parser code is really coming along! But I did find a bug that needs fixing.
fish|x> + |a> + |b>
works.
fish|x> + (|a> + |b>)
is broken.
OK. Wrote some code to test out the parser.
To filter out the noise, use:
$ ./check_the_parser.py | grep "#"
To check just for pass/fail, use:
$ ./check_the_parser.py | grep "##"
Decided to write a shell script wrapper for this.
Decided to implement simple algebra in BKO:
27/6/2014 update: I wonder how to implement derivatives?
# a|x> + b|y> => a|x> + b|y>
def algebra_add(one,two):
  return one + two

# a|x> * b|y> => a*b |x*y>
def algebra_mult(one,two,Abelian=True):
  one = superposition() + one  # hack so one and two are definitely sp, not ket
  two = superposition() + two

  result = superposition()
  for x in one.data:
    for y in two.data:
      labels = x.label.split('*') + y.label.split('*')
      if Abelian:
        labels.sort()
      label = "*".join(labels)
      result += ket(label,x.value * y.value)
  return result

# (a|x> + b|y> + c|z>)^|n>
# eg: (|a> + |b> + |c>)^|2> = |a*a> + 2.000|a*b> + 2.000|a*c> + |b*b> + 2.000|b*c> + |c*c>
def algebra_power(one,two):
  one = superposition() + one
  two_label = two.ket().label
  null, power = extract_category_value(two_label)
  try:
    n = int(power)
  except:
    return ket("",0)

  if n <= 0:
    return ket("1")

  result = one
  for k in range(n - 1):
    result = algebra_mult(result,one)
  return result

# implement basic algebra:
def algebra(one,operator,two):    # assume Abelian for now.
  op_label = operator if type(operator) == str else operator.the_label()
  null, op = extract_category_value(op_label)

  if op not in ['+','*','^']:
    return ket("",0)

  if op == '+':
    return algebra_add(one,two)
  elif op == '*':
    return algebra_mult(one,two)
  elif op == '^':
    return algebra_power(one,two)  
  else:
    return ket("",0)

# simple complex number mult:
def complex_algebra_mult(one,two):
  one = superposition() + one  # hack so one and two are definitely sp, not ket
  two = superposition() + two

  result = superposition()
  for x in one.data:
    for y in two.data:
      if x.label == 'real' and y.label == 'real':
        result += ket("real",x.value * y.value)

      if x.label == 'real' and y.label == 'imag':
        result += ket("imag",x.value * y.value)

      if x.label == 'imag' and y.label == 'real':
        result += ket("imag",x.value * y.value)

      if x.label == 'imag' and y.label == 'imag':
        result += ket("real",-1 * x.value * y.value)
  return result   
Works pretty well, considering how little work it took. But will almost certainly tweak/improve it in the future.
28/2/2014: A mapping between BKO and a simplified model of neurons.
In this case:
supported-ops |x> => |op: op1> + |op: op2> + |op: op3>
op1 |x> => |a> + |b> + |c>
op2 |x> => |d> + |e>
op3 |x> => |f> + |g> + |h> + |i>

2/3/2014: Improved my elementary algebra code (can be found at the bottom of the functions code, BTW):
# maps ket -> ket
# 3|x> => 3|x>
# |number: 7.2> => 7.2| >  # NB: the space in the ket label.
# 2|number: 3> => 6| >     # We can't use just |> because it is dropped all over the place!
# 8|number: text> => 0| >  # so the maths eqn: 3a + 7
# |3.7> => 3.7| >          # in my notation is 3|a> + 7| >
# 3|5> => 15| >
def category_number_to_number(one):         # find better name!
  one = one.ket()
  cat, value = extract_category_value(one.label)
  try:
    n = float(value)
  except:
    if cat == 'number':                     # not 100% sure I want to keep these two lines
      return ket(" ",0)
    return one
  return ket(" ",one.value * n)

# a|x> * b|y> => a*b |x*y>
def algebra_mult(one,two,Abelian=True):
  one = superposition() + one  # hack so one and two are definitely sp, not ket
  two = superposition() + two

  result = superposition()
  for x in one.data:
    x = category_number_to_number(x)  
    for y in two.data:
      y = category_number_to_number(y)
      print("x*y",x,"*",y)
      labels = [ L for L in x.label.split('*') + y.label.split('*') if L.strip() != '' ]
      if Abelian:  
        labels.sort()
      label = "*".join(labels)
      if label == '':         # we can't have ket("",value), since it will be dropped.
        label = " "
      result += ket(label,x.value * y.value)
  return result

# (a|x> + b|y>)^|n>
def algebra_power(one,two,Abelian=True):
  one = superposition() + one
  two = category_number_to_number(two)
  try:
    n = int(two.value)
  except:
    return ket(" ",0)

  if n <= 0:
    return ket(" ",1)

  result = one
  for k in range(n - 1):
    result = algebra_mult(result,one,Abelian)
  return result

Maybe a quick mention of feedback loops, currency conservation and inverse of operators.
Say we start with these 4 rules:
op1 |x> => |a> + 2.000|b> + 5.000|c> + 2.000|d>
op2 |x> => 0.200|a> + 0.300|b> + 0.500|e>
op3 |x> => 0.100|b> + 0.100|a> + 0.100|c> + 0.200|d>
op1 |y> => 3.000|m> + |n> + 7.000|o> + 2.000|p> + |q>
And note that we have:
count-sum op1 |x> = |number: 10.0>
count-sum op2 |x> = |number: 1.0>          # op2 has currency conservation.
count-sum op3 |x> = |number: 0.5>
count-sum op1 |y> = |number: 14.0>
Well, if we create inverse, we now observe:
inverse-op1 op1 |x> = 10.000|x>
inverse-op2 op2 |x> = |x>
inverse-op3 op3 |x> = 0.500|x>
inverse-op1 op1 |y> = 14.000|y>
So if the rule has currency conservation, as in the op2 case (where count-sum op2|x> == |number: 1>), then the inverse is an exact inverse (surprised I haven't tested this till now).
If the rule does not have currency conservation, then it is either an increasing or decreasing feedback loop.
eg:
inverse-op1 op1 inverse-op1 op1 |x> = 100.000|x>
inverse-op2 op2 inverse-op2 op2 |x> = |x>
inverse-op3 op3 inverse-op3 op3 |x> = 0.250|x>
inverse-op1 op1 inverse-op1 op1 |y> = 196.000|y>
or mixed case:
inverse-op3 op3 inverse-op1 op1 inverse-op1 op1 |x> = 50.000|x>
inverse-op3 op3 inverse-op1 op1 inverse-op1 op1 inverse-op3 op3 |x> = 25.000|x>
Next, provided the operators are disjoint. eg:
intersection(op1|x>,op1|y>) = |>
Then we have this (ie, |x> and |y> are independent and do not "interact"):
inverse-op1 op1 (|x> + |y>) = 10.000|x> + 14.000|y>
inverse-op1 op1 (3|x> + 2|y>) = 30.000|x> + 28.000|y> 
I guess just showing that these things are well behaved if you are careful.
26/6/2014 update: I guess what I mean by "well behaved" is:
inverse-op1 op1 (3|x> + 2|y>) = 30.000|x> + 28.000|y> == inverse-op1 op1 3|x> + inverse-op1 op1 2|y>
If |x> and |y> interacted, we would not have this equality.
21/6/2014 update: count-sum can be considered an operator that returns the amount of currency in a superposition.
If sp is a superposition, and op is an operator applied to it, then we have currency conservation if:
count-sum op sp == count-sum sp
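In plain Python (a dict standing in for the superposition, and a toy op2 along the lines of the rules above):
import math

def count_sum(sp):
  # the amount of "currency" in a superposition
  return sum(sp.values())

def conserves_currency(op, sp):
  # currency conservation: applying op leaves count-sum unchanged
  return math.isclose(count_sum(op(sp)), count_sum(sp))

def op2(sp):
  # toy version of: op2 |x> => 0.2|a> + 0.3|b> + 0.5|e>
  c = sp.get("x", 0)
  return {"a": 0.2 * c, "b": 0.3 * c, "e": 0.5 * c}

print(conserves_currency(op2, {"x": 1}))    # True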
Now, here is a little piece towards mapping English to BKO.
Note this common structure:
no X
some X
any X
every X
where X is usually one of one, thing, place, where, how, body.
In BKO these (provisionally) become:
"no one" => <person: *|op|x> == 0
"no thing" => <object: *|op|x> == 0
"no place" => <location: *|op|x> == 0
"no where" => <location: *|op|x> == 0
"no how" => <...  > == 0
"no body" => <person: *|op|x> == 0

"some one" => <person: *|op|x> > 0
"some thing" => <object: *|op|x> > 0
"some place" => <location: *|op|x> > 0
"some where" => <location: *|op|x> > 0
"some how" => <...  > > 0
"some body" => <person: *|op|x> > 0

Not sure, but maybe this:
"any one" => pick-elt |_self><person: *|op|x>
"any thing" => pick-elt |_self><object: *|op|x>
"any place" => pick-elt |_self><location: *|op|x>
"any where" => pick-elt |_self><location: *|op|x>
"any how" => pick-elt |_self><...  >
"any body" => pick-elt |_self><person: *|op|x>

"every one" => |_self><person: *|op|x>
"every thing" => |_self><object: *|op|x>
"every place" => |_self><location: *|op|x>
"every where" => |_self><location: *|op|x>
"every how" => |_self><...  >
"every body" => |_self><person: *|op|x>

8/3/2014: So, to state them once again, my (bold) hypotheses are:
1) The Mat-Sum-Sig (matrix-sum-sigmoid) model is a simplified model of neurons.
2) The BKO (bra-ket-op) model is a simplification of the MatSumSig model.
3) All relatively static knowledge can be represented using the BKO model.
3.1) BKO is a general representation for networks, so all relatively static knowledge can be represented using networks.
4) All output calculations of the brain or arbitrary neural network can be represented as a superposition of kets.

Today, I used this code to dump frequency lists of names (male, female, last) into .sw format.
Resulting in names.sw.
Allowing us to do things like this (a tidied up transcript asking about names in the console).

19/3/2014: Implemented stored rules today! This is very cool. Took some careful code tweaks, but I got there in the end. Maybe explain later.
Too hard to use words ATM, let's just give a worked example (noting that #=> is the notation for stored rules):
sa: bah |z> => |it worked!>
sa: foo |A> #=> shout bah |z>
sa: bah |A> #=> shout read |text: a couple of words>
sa: dump
----------------------------------------
|context> => |context: sw console>

supported-ops |z> => |op: bah>
bah |z> => |it worked!>

supported-ops |A> => |op: foo> + |op: bah>
foo |A> #=> shout bah |z>
bah |A> #=> shout read |text: a couple of words>
----------------------------------------

sa: foo |A>
IT WORKED!
|IT WORKED!>

sa: bah |A>
WORD: A
WORD: COUPLE
WORD: OF
WORD: WORDS
|WORD: A> + |WORD: COUPLE> + |WORD: OF> + |WORD: WORDS>
I guess a simple description is that you store rules that don't get processed until they get activated. So even running dump does not activate the rules. Only when invoked inside x.apply_op() are they activated.
It also means we can implement "active memories". ie, when you recall a specific memory some side-effect kicks in.
OK. Here is an example. Take a look at the active-network-propagation.sw file:
bah |x> => |it worked!>
o |a> => |b>
o |b> => |c>
o |c> #=> |d> + shout bah |x>
o |d> => |e>
o |e> => |f>
Load that into the console, then observe:
sa: o |a>
|b>

sa: o^2 |a>
|c>

sa: o^3 |a>
IT WORKED!
|d> + |IT WORKED!>

sa: o^4 |a>
IT WORKED!
|e>

sa: o^5 |a>
IT WORKED!
|f>
Where I am using "shout bah |x>" as an example side-effect. But really there are no limits on what it could be.
In this case, when we apply the op "o" to |c> the code shouts "IT WORKED!".
And of course, we have to step on |c> to get to |e> and |f>, hence they also shout "IT WORKED!".
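As a toy sketch of the stored-rule idea itself (the real version lives in the context classes; here rules sit in a plain dict, and a stored rule is just a zero-argument function that is not run until recall):
rules = {}

def learn(op, label, rule):
  # "=>"  : rule is a plain value, fixed at learn time
  # "#=>" : rule is a callable, evaluated only when recalled
  rules[(op, label)] = rule

def recall(op, label):
  rule = rules.get((op, label))
  return rule() if callable(rule) else rule

def shout(x):
  print(str(x).upper())           # the side-effect
  return x

learn("bah", "z", "it worked!")                          # bah |z> => |it worked!>
learn("foo", "A", lambda: shout(recall("bah", "z")))     # foo |A> #=> shout bah |z>

recall("bah", "z")    # quiet: a plain rule has no side-effect
recall("foo", "A")    # prints IT WORKED! only now, at recall time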
A little more on mapping English to BKO.
"Y tastes exactly like X" => taste |Y> => taste |X>
"Y smells a lot like X" => smell |Y> => 0.9 smell |X>
"Y sounds a little like X" => sound |Y> => 0.2 sound |X>
"Y feels a tiny bit like X" => feel |Y> => 0.05 feel |X> 
"Y looks not at all like X" => looks |Y> => NOT looks |X>  -- this one is provisional.
Note that they all correspond to the senses. I am not currently sure if there are similar constructions for non-sensory input.
Here is a short example in the console:
$ ./the_semantic_db_console.py
Welcome!
sa: smell |X> => |roses>
sa: -- Y smells a lot like X
sa: smell |Y> => 0.85 smell |X>

sa: dump
----------------------------------------
|context> => |context: sw console>

supported-ops |X> => |op: smell>
smell |X> => |roses>

supported-ops |Y> => |op: smell>
smell |Y> => 0.850|roses>
----------------------------------------

sa: similar[smell] |Y>
0.850|X>

sa: smell |Z> => 0.4|roses>
sa: dump
----------------------------------------
|context> => |context: sw console>

supported-ops |X> => |op: smell>
smell |X> => |roses>

supported-ops |Y> => |op: smell>
smell |Y> => 0.850|roses>

supported-ops |Z> => |op: smell>
smell |Z> => 0.400|roses>
----------------------------------------

sa: similar[smell] |Y>
0.850|X> + 0.471|Z>

sa: similar[smell] |X>
0.850|Y> + 0.400|Z>

sa: similar[smell] |Z>
0.471|Y> + 0.400|X>

Here is something I have been meaning to mention for some time now.
There is an interesting similarity between Feynman diagrams + path-integrals and I guess you could call it "brain-space".
... flesh out details ...
Still having trouble finding words ...
I guess simply enough, a Feynman diagram represents a path-integral/sum-over-histories through space-time.
R = <b|op5 op4 op3 op2 op1|a>
In terms of the brain, R is a sum over pathways through brain-space.
Inject a signal at |a> (where BTW |a> corresponds to a particular neuron we have given the label "a")
Propagate through brain-space using op5 op4 op3 op2 op1.
Measure the value at |b> using <b|
And the physics bit is that likely it did not take a single pathway from |a> to |b>.
Instead, it took all possible brain-pathways between |a> and |b>
(given time constraints; eg, if spikes on a particular pathway arrive an hour later, then we probably won't see them).
I should mention the biggest difference between QM and brain-space is that the coeffs of kets in QM are complex numbers. In brain-space they are positive reals.
Also, propagation through brain-space is often "non-local", depending on how long the underlying axons are.

Also, this:
We can say, given |a>, |b>, |c>, |d> and:
R1 = <b|op2 op1|a>
R2 = <d|op2 op1|c>
R1' = <b|op2 op1(|a> + |c>)
R2' = <d|op2 op1(|a> + |c>)
If R1 == R1' and R2 == R2' then |a> and |c> do not interact.
If R1 != R1' or R2 != R2' then |a> and |c> do interact.
Also, a quick comment:
If <b|some-op-sequence|a> > 0, then we can say there exists a brain-space pathway between |a> and |b>.

And a math proof/chain of logic bears some similarity to a brain-space pathway. Details later!
16/5/2014 update: maybe it goes something like this:
You start with a list of truths, then you apply an operator (eg a line in a proof) and you have a new list of truths.
Then repeat until your desired object is in the list of truths.
eg:
sa: first-op (|truth-a> + |truth-b> + |truth-c>)
|truth-d> + |truth-e>

sa: second-op first-op (|truth-a> + |truth-b> + |truth-c>)
|truth-f> + |truth-g> + |truth-h> + |truth-i>

sa: third-op second-op first-op (|truth-a> + |truth-b> + |truth-c>)
|truth-j>

sa: fourth-op third-op second-op first-op (|truth-a> + |truth-b> + |truth-c>)
|truth-k> + |truth-l>

sa: fifth-op fourth-op third-op second-op first-op (|truth-a> + |truth-b> + |truth-c>)
|truth-m> + |desired-result> + |truth-n> + |truth-o> + |truth-p>

ie:
<desired-result|fifth-op fourth-op third-op second-op first-op (|truth-a> + |truth-b> + |truth-c>) == 1
Of course, in general, finding the right operators to apply at each level is non-trivial.
A little more on mapping English to BKO:
$ ./the_semantic_db_console.py
Welcome!
sa: context friends
sa: friends |Fred> => |Jack> + |Harry> + |Ed> + |Mary> + |Rob> + |Patrick> + |Emma> + |Charlie>
sa: friends |Sam> => |Charlie> + |George> + |Emma> + |Jack> + |Robert> + |Frank> + |Julie>

sa: dump
----------------------------------------
|context> => |context: friends>

supported-ops |Fred> => |op: friends>
friends |Fred> => |Jack> + |Harry> + |Ed> + |Mary> + |Rob> + |Patrick> + |Emma> + |Charlie>

supported-ops |Sam> => |op: friends>
friends |Sam> => |Charlie> + |George> + |Emma> + |Jack> + |Robert> + |Frank> + |Julie>
----------------------------------------

sa: -- what friends does Fred have?
sa: friends |Fred>
|Jack> + |Harry> + |Ed> + |Mary> + |Rob> + |Patrick> + |Emma> + |Charlie>

sa: -- what friends does Sam have?
sa: friends |Sam>
|Charlie> + |George> + |Emma> + |Jack> + |Robert> + |Frank> + |Julie>

sa: -- how many friends does Fred have?
sa: count friends |Fred>
|number: 8>

sa: -- how many friends does Sam have?
sa: count friends |Sam>
|number: 7>

sa: -- what friends do Fred and Sam have?
sa: union (friends |Fred>,friends|Sam>)
|Jack> + |Harry> + |Ed> + |Mary> + |Rob> + |Patrick> + |Emma> + |Charlie> + |George> + |Robert> + |Frank> + |Julie>

sa: -- how many friends do Fred and Sam have?
sa: count union (friends |Fred>,friends|Sam>)
|number: 12>

sa: -- what friends do Fred and Sam have in common?
sa: common (friends |Fred>,friends|Sam>)                       -- common is an alias for intersection.
|Jack> + |Emma> + |Charlie>

sa: -- how many friends do Fred and Sam have in common?
sa: count common (friends|Fred>,friends|Sam>)                 
|number: 3>

"Mary is roughly 40 years old"
sa: age |Mary> => rescale[1] smooth[1]^2 |age: 40>
sa: dump
----------------------------------------
|context> => |context: sw console>

supported-ops |Mary> => |op: age>
age |Mary> => 0.167|age: 38.0> + 0.667|age: 39.0> + |age: 40.0> + 0.667|age: 41.0> + 0.167|age: 42.0>
----------------------------------------

25/3/2014: Seriously big WOOOT!! today.
I finally implemented general rules. I thought doing that was a long, long way away. But it seems to be working great! (discovered a minor bug, but that is easy to fix).
One common usage (but there are plenty of others) is defining general family structure:
siblings |person: *> #=> brothers |_self> + sisters |_self>
children |person: *> #=> sons |_self> + daughters |_self>
parents |person: *> #=> mother |_self> + father |_self>
uncles |person: *> #=> brothers parents |_self>
aunts |person: *> #=> sisters parents |_self>
aunts-and-uncles |person: *> #=> siblings parents |_self>
cousins |person: *> #=> children siblings parents |_self>
grand-fathers |person: *> #=> father parents |_self>
grand-mothers |person: *> #=> mother parents |_self>
grand-parents |person: *> #=> parents parents |_self>
grand-children |person: *> #=> children children |_self>
great-grand-parents |person: *> #=> parents parents parents |_self>
great-grand-children |person: *> #=> children children children |_self>

BTW, general rules are implemented by putting label_decent() into the context.recall() function.
def label_decent(x):
  print("x:",x)
  result = [x]
  if x == "*":
    return result
  if x.endswith(": *"):
    x = x[:-3]
  while True:
    try:
      x,null = x.rsplit(": ",1)
      result.append(x + ": *")
    except:
      result.append("*")
      return result
eg, if you feed in this label "a: b: c: d: fred", it returns these trial labels:
a: b: c: d: fred
a: b: c: d: *
a: b: c: *
a: b: *
a: *
*
And the key code in context.recall() is:
    match = False
    for trial_label in label_decent(label):
      if trial_label in self.known_kets:
        if op in self.rules[trial_label]:
          rule = self.rules[trial_label][op]
          match = True
          break
    if not match:
      print("recall not found")               
      rule = ket("",0)

With general rules, there are a whole bunch of things we can do now!
For example, we can implement aliases.
Celcius |*> #=> C |_self>
Celsius |*> #=> C |_self>

Ditto for Fahrenheit and Kelvin:
Fahrenheit |*> #=> F |_self>
Kelvin |*> #=> K |_self>

sa: roughly |*> #=> rescale[1] smooth[1]^2 |_self>

sa: roughly |age: 40>
0.167|age: 38.0> + 0.667|age: 39.0> + |age: 40.0> + 0.667|age: 41.0> + 0.167|age: 42.0>
And we can neatly implement the concept of generalisations and stereotypes:
"80% of women take too long to get ready"
take-too-long-to-get-ready |person: female: *> #=> 0.8 |yes>

But we can over-ride with:
"Mary doesn't take long to get ready"
take-too-long-to-get-ready |person: female: Mary> => 0.1 |yes>

"70% of men drink too much"
drink-too-much |person: male: *> #=> 0.7|yes>

But we can over-ride with:
"Fred is a tea-totaler"
drink-too-much |person: male: Fred> => |no>

"60% of British have bad teeth"
have-bad-teeth |person: UK: *> #=> 0.6 |yes>

"10% of Americans have bad teeth"
have-bad-teeth |person: US: *> #=> 0.1 |yes>
And we can implement the idea of general rules for plurals, overridden by more specific rules:
sa: plural |word: *> #=> read-letters(|_self> + |letter: s>)
sa: plural |word: apple>
|word: apples>

sa: plural |word: mouse>
|word: mouses>

sa: plural |word: mouse> => |word: mice>            -- let's store the exception to the rule:

sa: plural |word: mouse>                            -- test it works.
|word: mice>

sa: plural |word: rabbit>                           -- test the general rule again.
|word: rabbits>

sa: plural |rabbit>                                 -- show the importance of the "word: " prefix, since the rule was defined for |word: *>
recall not found
|>
Here is a closely related example:
sa: is-human |person: *> #=> |yes>

sa: is-human |person: Fred>
|yes>

sa: is-human |person: Sam> => |no!>                 -- we don't like Sam.

sa: is-human |person: Eric>
|yes>

sa: is-human |person: Sam>
|no!>

27/3/2014: OK. I have the beginnings of an "active read". The idea is that the code hints at understanding what it is reading.
The current read just converts text into words, with no understanding whatsoever.
sa: read |text: Two of our famous Belgian Waffles with plenty of real maple syrup.>
|word: two> + 2.000|word: of> + |word: our> + |word: famous> + |word: belgian> + |word: waffles> + |word: with> + |word: plenty> + |word: real> + |word: maple> + |word: syrup>
But with active read we have examples such as:
|text: Two of our famous Belgian Waffles with plenty of real maple syrup.>
becomes:
|number: 2> + |country: Belgium> + |food: belgian waffles> + |food: waffles> + |food: maple syrup>

|text: Light Belgian waffles covered with strawberries and whipped cream.>
becomes:
|country: Belgium> + |food: belgian waffles> + |food: waffles> + |food: strawberries> + |fruit: strawberries> + |food: whipped cream> + |food: cream>

|text: Two eggs, bacon or sausage, toast, and our ever-popular hash browns.>
becomes:
|number: 2> + |food: egg> + |food: bacon> + |food: sausage> + |food: toast> + |food: hash browns>
This is done using the pattern recognition code, and this set of rules:
 |food: waffles> => |word: waffles>
 |country: Belgium> => |word: belgian>
 |food: strawberries> => |word: strawberries>
 |fruit: strawberries> => |word: strawberries>
 |food: berries> => |word: berries>
 |fruit: berries> => |word: berries>
 |country: France> => |word: french>
 |food: toast> => |word: toast>
 |meal: breakfast> => |word: breakfast>
 |food: egg> => |word: egg>
 |food: eggs> => |word: eggs>
 |food: bacon> => |word: bacon>
 |food: sausage> => |word: sausage>
 |food: sausages> => |word: sausages>
 |number: 2> => |word: two>
 |food: cream> => |word: cream>
 |food: belgian waffles> => |word: belgian> + |word: waffles>
 |food: maple syrup> => |word: maple> + |word: syrup>
 |food: whipped cream> => |word: whipped> + |word: cream>
 |food: hash browns> => |word: hash> + |word: browns>
BTW, the code for active read is currently at the bottom of the functions code.
Update: decided to put the code here:
def silent_active_read_text(context,one,pattern=""):
  result = superposition()
  data = read_text(one).data                        # later other functions here too instead of just read_text()
  for k in range(len(data)):
    y1 = data[k]
    result += context.pattern_recognition(y1,pattern).drop_below(1)
                                                                 
    if k < len(data) - 1:                                       
      y2 = data[k] + data[k + 1]                    # this line corresponds to my "buffer" idea. Explain later!
      result += context.pattern_recognition(y2,pattern).drop_below(1)

  return result

27/6/2014 update: OK. I finally got around to writing a console version of active read.
But I'm going to call it active-buffer since it is much more general than just reading text.
The idea is that as you input data (from the "outside" world) you hold it in a buffer and try and pattern match it against what you know.
It is a generalisation of the active_read() idea, and I imagine it will be very useful indeed.
N is the number of elements in the buffer, and if we use short-term memory as a guide (eg, 7 ± 2 terms), then maybe N <= 7.
Though that depends on how low or high level we are working at. Lower generally implies larger N.
Now, an example:
sa: load next-breakfast-menu.sw
-- first, without active-buffer:
sa: read |text: two of our famous belgian waffles with plenty of real maple syrup>
|word: two> + 2.000|word: of> + |word: our> + |word: famous> + |word: belgian> + |word: waffles> + |word: with> + |word: plenty> + |word: real> + |word: maple> + |word: syrup>

-- now, apply the active-buffer function:
sa: active-buffer[2,1] read |text: two of our famous belgian waffles with plenty of real maple syrup>
|number: 2> + |country: Belgium> + |food: belgian waffles> + |food: waffles> + |food: maple syrup>

-- another example:
sa: read description |food: Homestyle Breakfast>
|word: Two> + |word: eggs> + |word: bacon> + |word: or> + |word: sausage> + |word: toast> + |word: and> + |word: our> + |word: ever-popular> + |word: hash> + |word: browns>

-- now apply the active-buffer function:
sa: active-buffer[2,0] read description |food: Homestyle Breakfast>
2.000|food: eggs> + 2.000|food: bacon> + 2.000|food: sausage> + 2.000|food: toast> + 2.500|food: hash browns>
-- NB: it missed the |word: Two> since the current version of read does not convert to lower-case, and the pattern is:
 |number: 2> => |word: two>  -- ie, lowercase "two"

30/6/2014 update: It occurred to me we can have more than one layer, getting even closer to the code "understanding" what it is reading.
sa: load next-breakfast-menu.sw
sa: description |food: Homestyle Breakfast>
|text: "Two eggs, bacon or sausage, toast, and our ever-popular hash browns">

sa: read description |food: Homestyle Breakfast>
|word: two> + |word: eggs> + |word: bacon> + |word: or> + |word: sausage> + |word: toast> + |word: and> + |word: our> + |word: ever-popular> + |word: hash> + |word: browns>

sa: active-buffer[2,1] read description |food: Homestyle Breakfast>
|number: 2> + |food: eggs> + |food: bacon> + |food: sausage> + |food: toast> + |food: hash browns>

-- now define a pattern:
sa: |food: homestyle breaky> => |food: eggs> + |food: bacon> + |food: sausage>

-- now another layer of active-buffer:
sa: active-buffer[2,0] active-buffer[2,1] read description |food: Homestyle Breakfast>
3.000|food: homestyle breaky>

-- a slightly tweaked version of the same (just jiggling coeffs really):
sa: active-buffer[3,1] active-buffer[2,1] read description |food: Homestyle Breakfast>
|food: homestyle breaky>
So maybe I should try to explain what is going on here.
First, we look up the description of the Homestyle Breakfast. |text: "Two eggs ..."
Then we apply the read operator, which converts the text into a superposition/sequence of words.
Then the active buffer (size 2 buffer, drop-below threshold of 1) looks up patterns of words with operator "".
So now the code partly "understands" what it is reading. It "knows" "two" is a number, "eggs" is a food, "bacon" is a food, and so on.
Now, let's define a new pattern. The foods that define a homestyle breaky.
Apply another layer of active buffer (size 2 buffer, drop below threshold of 0).
Now the code "understands" that "eggs, bacon and sausage" is a homestyle breaky.
3/7/2014 update: Now it is time to give an example where we specify the pattern. In this case "spell".
-- load up the data I prepared:
sa: load spell-active-buffer.sw
-- let's take a look:
sa: display
  context: sw console

  l1
  supported-ops: op:
               : a, b, f, r, o, g, u, v, w

  l2
  supported-ops: op:
               : 2.00 s, g, r, t, a, c, xy, z

  animal: frog
  supported-ops: op: spell
          spell: f, r, o, g

  weather: fog
  supported-ops: op: spell
          spell: f, o, g

  animal: rat
  supported-ops: op: spell
          spell: r, a, t

-- now play with active buffer:
sa: active-buffer[7,0,spell] "" |l1>
6.655|animal: rat> + 14.269|weather: fog> + 17.164|animal: frog>

sa: active-buffer[7,0,spell] "" |l2>
2.977|weather: fog> + 6.155|animal: frog> + 12.899|animal: rat>
Of course, this would work better in terms of using sequences, but we don't have the code for that.
So we are getting matches that are independent of the order of the kets.
eg:
rat is matching "a, b, f, r"  (note the r and a)
fog is matching "f, r, o, g"  (ignoring the r)
and:
frog is matching "g, r"       (blind to order of the kets)
rat is matching "r, t, a"     (blind to order of the kets)
Still works as a fine proof of concept though.
Besides, a sequence version of simm (which I still don't know for sure how to do) would also match out-of-order kets, just with a lower coeff.
So "frgo" would still match with "frog", just with a lower coeff than "frog" with "frog".
Also, use your imagination. There are heaps of uses for the active-buffer beasty!
25/7/2014: A quick note on my thoughts for a sequence version of simm.
I have a couple of ideas. One is to map sequences to superpositions (using some scheme), then using standard superposition simm.
eg, perhaps this scheme (for some constants {c1,c2,c3,c4,c5,c6}):
|a> . |b> . |c> . |d>
maps to:
c1 (|a> + |b> + |c> + |d>) + c2 (|ab> + |bc> + |cd>) + c3 (|ac> + |bd>) + c4 (|abc> + |bcd>) + c5 (|abd> + |acd>) + c6 |abcd>
So a superposition (the first term) acts as a first approximation to a sequence.
Another idea is to commute kets to normalize the two patterns into the same sequence, and then compare, using a tweak on superposition simm.
eg:
|f> . |r> . |g> . |o>
maps to:
|f> . |r> . c |o> . c |g>

Then compare:
|f> . |r> . c |o> . c |g>
with
|f> . |r> . |o> . |g>
How to actually do this in the general case in code, I currently have no idea!
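Though for the first idea, at least a rough sketch is doable. Here every ordered subsequence of a given length shares one constant, which is a simplification: the scheme above gives adjacent pairs and gapped pairs different constants, so a fuller version would also key on the span:
from itertools import combinations

def sequence_to_sp(seq, weights=None):
  # map a sequence to a superposition of its ordered subsequences,
  # weighted by a constant that depends on subsequence length
  weights = weights or {1: 1.0, 2: 0.5, 3: 0.25, 4: 0.125}
  sp = {}
  for n, c in weights.items():
    for combo in combinations(seq, n):    # combinations preserve order
      label = "".join(combo)
      sp[label] = sp.get(label, 0) + c
  return sp

print(sequence_to_sp(["a", "b", "c", "d"]))
Then the two sequences can be compared by feeding the resulting superpositions into the ordinary superposition simm.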
11/7/2014 update: Now an example at the word level (the last one was at the letter level).
-- first load up some knowledge:
sa: |person: Fred Smith> => |word: fred> + |word: freddie> + |word: smith> + |word: smithie>  -- various names and nick-names
sa: |person: Mary> => |word: mazza>                                                           -- just a nick-name
sa: |greeting: Hey!> => |word: hey>
sa: |question: what is> => |word: what's>
sa: |direction: up> => |word: up>
sa: |phrase: having a baby> => read |text: having a baby>
sa: |phrase: in the family way> => read |text: in the family way>
sa: |phrase: up the duff> => read |text: up the duff>
sa: |phrase: with child> => read |text: with child>
sa: |concept: pregnancy> => |phrase: having a baby> + |phrase: in the family way> + |phrase: up the duff> + |phrase: with child>

-- save a copy:
sa: save active-buffer-play.sw

-- now start playing with it:
sa: active-buffer[7,0] read |text: Hey Freddie what's up?>  
2.083|greeting: Hey!> + 1.500|person: Fred Smith> + 2.917|question: what is> + 2.083|direction: up> + 1.250|phrase: up the duff>
-- up the duff is in there because of the word "up"

-- indeed, this shows up if we apply another layer of active-buffer:
sa: active-buffer[7,0] active-buffer[7,0] read |text: Hey Freddie what's up?>
0.988|concept: pregnancy>
 
-- now test phrase matching a concept, in this case phrases that mean pregnant.
sa: active-buffer[7,0] read |text: Hey Mazza, you with child, up the duff, in the family way, having a baby?>
2.593|greeting: Hey!> + 4.186|person: Mary> + 11.586|phrase: with child> + 6.857|direction: up> + 23.414|phrase: up the duff> + 25.000|phrase: in the family way> + 9.224|phrase: having a baby>

-- one more layer of active-buffer:
sa: active-buffer[7,0] active-buffer[7,0] read |text: Hey Mazza, you with child, up the duff, in the family way, having a baby?>
11.069|concept: pregnancy>

27/6/2014: Heh. Judging by the volume of the output in the console when I run this, the big-O is probably big :)
Here is the code in the functions file:
# active-buffer[N,t] some-superposition             -- uses "" as the default pattern.
# active-buffer[N,t,pattern] some-superposition     -- uses your chosen pattern (we can't use "" as the pattern, due to broken parser!)
# eg: active-buffer[3,0] read |text: I want french waffles>
# where: 
# N is an int                                       -- the size of the active buffer  
# t is a float                                      -- the drop below threshold
# pattern is a string                               -- the pattern we are using
def console_active_buffer(one,context,parameters):   # one is the passed-in superposition.
  try:
    N,t,pattern = parameters.split(',')
    N = int(N)
    t = float(t)
  except:
    try:
      N,t = parameters.split(',')
      N = int(N)
      t = float(t)
      pattern = ""
    except:
      return ket("",0)
      
  result = superposition()
  data = one.data
  for k in range(len(data)):
    for n in range(N):
      if k < len(data) - n:
        y = superposition()
        y.data = data[k:k+n+1]                      # I guess this is the bit you could call the buffer.
        result += context.pattern_recognition(y,pattern).drop_below(t)
  return result                                     # .coeff_sort() here?

30/7/2014 update: BTW, in the brain, the buffer is usually not of a fixed size (unlike above where we specify N).
eg, when reading letters trying to determine the word, we keep the buffer open until we meet an end of word char (usually space, but other punctuation chars too).
Or when reading words trying to understand a phrase, or a sentence.
Then another buffer at the paragraph level. Then the page/chapter level and so on.
Another example is having a conversation with a friend. You keep open some internal buffer until you have grasped the meaning of what they are currently saying.
Anyway, probably call this thing an auto-active-buffer. Though I don't know of a neat way to implement it.
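One possible shape for it, as a sketch only: keep the buffer open until a delimiter closes it, then pattern match whatever accumulated:
def auto_active_buffer(stream, is_delimiter, match):
  # stream: iterable of low-level symbols (eg, letters)
  # is_delimiter: True for symbols that close the buffer (eg, space)
  # match: pattern-match a completed buffer against what we know
  results, buffer = [], []
  for symbol in stream:
    if is_delimiter(symbol):
      if buffer:
        results.append(match(buffer))
        buffer = []
    else:
      buffer.append(symbol)
  if buffer:
    results.append(match(buffer))
  return results

# letter level: the buffer closes on spaces, giving words
print(auto_active_buffer("the frog jumped", str.isspace, lambda b: "".join(b)))
# ['the', 'frog', 'jumped']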
6/8/2014 update: OK. Another active-buffer example:
sa: load internet-acronyms.sw
sa: active-buffer[7,0] read |text: WTF is going on OMg thx RTFM!>
2.593|phrase: What The Fuck> + 4.826|phrase: Oh My God> + 4.043|phrase: Thanks> + 2.593|phrase: Read the Fine Manual> + 2.593|phrase: Read the Fucking Manual>
I think that is pretty cool. And the results will only get better with more knowledge in BKO form.
Heh. Occurred to me we can go back the other way by using inverses.
sa: load internet-acronyms.sw
sa: create inverse
sa: read |text: WTF is going on OMg thx RTFM!>
|word: wtf> + |word: is> + |word: going> + |word: on> + |word: omg> + |word: thx> + |word: rtfm>

sa: active-buffer[7,0] read |text: WTF is going on OMg thx RTFM!>
2.593|phrase: What The Fuck> + 4.826|phrase: Oh My God> + 4.043|phrase: Thanks> + 2.593|phrase: Read the Fine Manual> + 2.593|phrase: Read the Fucking Manual>

sa: active-buffer[7,0,inverse-] active-buffer[7,0] read |text: WTF is going on OMg thx RTFM!>
1.302|word: wtf> + 3.221|word: omg> + 3.274|word: thx> + 4.044|word: rtfm>

sa: active-buffer[7,0] active-buffer[7,0,inverse-] active-buffer[7,0] read |text: WTF is going on OMg thx RTFM!>
1.333|phrase: What The Fuck> + 2.509|phrase: Oh My God> + 2.264|phrase: Thanks> + 1.525|phrase: Read the Fine Manual> + 1.525|phrase: Read the Fucking Manual>
So back and forth. Words to phrase. Phrase to words. Words to phrase. Kinda cool!
BTW, currently the acronyms are of form:
|phrase: Be Right Back> => |word: brb>
|phrase: By The Way> => |word: btw> 
Maybe they should use acronym as the operator-label instead of the empty operator-label "":
acronym |phrase: Be Right Back> => |word: brb>
acronym |phrase: By The Way> => |word: btw> 

sa: create inverse
sa: active-buffer[7,0,acronym] read |text: BTW, brb!!>
1.500|phrase: By The Way> + 1.500|phrase: Be Right Back>

sa: active-buffer[7,0,inverse-acronym] active-buffer[7,0,acronym] read |text: BTW, brb!!>
1.167|word: btw> + 1.167|word: brb>

28/3/2014: OK. I decided to finally give a proof of concept pattern recognition example.
In this case mapping pixels to letters. Here is the data we need: H-I-pat-rec.sw
|context> => |context: H I pat rec>

#   #
#   #   
#   #
#####
#   #
#   #
#   #

pixels |letter: H> => |pixel: 1: 1> + |pixel: 1: 5>
pixels |letter: H> +=> |pixel: 2: 1> + |pixel: 2: 5>
pixels |letter: H> +=> |pixel: 3: 1> + |pixel: 3: 5>
pixels |letter: H> +=> |pixel: 4: 1> + |pixel: 4: 2> + |pixel: 4: 3> + |pixel: 4: 4> + |pixel: 4: 5>
pixels |letter: H> +=> |pixel: 5: 1> + |pixel: 5: 5>
pixels |letter: H> +=> |pixel: 6: 1> + |pixel: 6: 5>
pixels |letter: H> +=> |pixel: 7: 1> + |pixel: 7: 5>

    #
#   #   
#   #
### #
#    
#   #
#   #

pixels |noisy: H> => |pixel: 1: 5>
pixels |noisy: H> +=> |pixel: 2: 1> + |pixel: 2: 5>
pixels |noisy: H> +=> |pixel: 3: 1> + |pixel: 3: 5>
pixels |noisy: H> +=> |pixel: 4: 1> + |pixel: 4: 2> + |pixel: 4: 3> + |pixel: 4: 5>
pixels |noisy: H> +=> |pixel: 5: 1>
pixels |noisy: H> +=> |pixel: 6: 1> + |pixel: 6: 5>
pixels |noisy: H> +=> |pixel: 7: 1> + |pixel: 7: 5>

#   #
#      
# ###
#####
##  #
#   #
### #

pixels |noisy: H2> => |pixel: 1: 1> + |pixel: 1: 5>
pixels |noisy: H2> +=> |pixel: 2: 1>
pixels |noisy: H2> +=> |pixel: 3: 1> + |pixel: 3: 3> + |pixel: 3: 4> + |pixel: 3: 5>
pixels |noisy: H2> +=> |pixel: 4: 1> + |pixel: 4: 2> + |pixel: 4: 3> + |pixel: 4: 4> + |pixel: 4: 5>
pixels |noisy: H2> +=> |pixel: 5: 1> + |pixel: 5: 2> + |pixel: 5: 5>
pixels |noisy: H2> +=> |pixel: 6: 1> + |pixel: 6: 5>
pixels |noisy: H2> +=> |pixel: 7: 1> + |pixel: 7: 2> + |pixel: 7: 3> + |pixel: 7: 5>



#####
  #
  #
  #
  #
  #
#####

pixels |letter: I> => |pixel: 1: 1> + |pixel: 1: 2> + |pixel: 1: 3> + |pixel: 1: 4> + |pixel: 1: 5>
pixels |letter: I> +=> |pixel: 2: 3>
pixels |letter: I> +=> |pixel: 3: 3>
pixels |letter: I> +=> |pixel: 4: 3>
pixels |letter: I> +=> |pixel: 5: 3>
pixels |letter: I> +=> |pixel: 6: 3>
pixels |letter: I> +=> |pixel: 7: 1> + |pixel: 7: 2> + |pixel: 7: 3> + |pixel: 7: 4> + |pixel: 7: 5>



####
  #
  
  
  #
  #
# ###

pixels |noisy: I> => |pixel: 1: 1> + |pixel: 1: 2> + |pixel: 1: 3> + |pixel: 1: 4>
pixels |noisy: I> +=> |pixel: 2: 3>
pixels |noisy: I> +=> |>
pixels |noisy: I> +=> |>
pixels |noisy: I> +=> |pixel: 5: 3>
pixels |noisy: I> +=> |pixel: 6: 3>
pixels |noisy: I> +=> |pixel: 7: 1> + |pixel: 7: 3> + |pixel: 7: 4> + |pixel: 7: 5>


##  #
 ###
  #
  #
  ###
####
#####

pixels |noisy: I2> => |pixel: 1: 1> + |pixel: 1: 2> + |pixel: 1: 5>
pixels |noisy: I2> +=> |pixel: 2: 2> + |pixel: 2: 3> + |pixel: 2: 4>
pixels |noisy: I2> +=> |pixel: 3: 3>
pixels |noisy: I2> +=> |pixel: 4: 3>
pixels |noisy: I2> +=> |pixel: 5: 3> + |pixel: 5: 4> + |pixel: 5: 5>
pixels |noisy: I2> +=> |pixel: 6: 1> + |pixel: 6: 2> + |pixel: 6: 3> + |pixel: 6: 4> 
pixels |noisy: I2> +=> |pixel: 7: 1> + |pixel: 7: 2> + |pixel: 7: 3> + |pixel: 7: 4> + |pixel: 7: 5>
OK. Then we drop into the console:
$ ./the_semantic_db_console.py
Welcome!

sa: load H-I-pat-rec.sw
loading sw file: H-I-pat-rec.sw

sa: simm |*> #=> 100 similar[pixels] |_self>    -- use this to save typing.

sa: simm |noisy: H>
82.353|letter: H> + 61.905|noisy: H2> + 26.667|letter: I> + 25.000|noisy: I2> + 14.286|noisy: I>

sa: simm |noisy: H2>
76.190|letter: H> + 61.905|noisy: H> + 47.619|noisy: I2> + 38.095|letter: I> + 19.048|noisy: I>

sa: simm |letter: H>
82.353|noisy: H> + 76.190|noisy: H2> + 35.000|noisy: I2> + 29.412|letter: I> + 17.647|noisy: I>


sa: simm |noisy: I>
73.333|letter: I> + 45.000|noisy: I2> + 19.048|noisy: H2> + 17.647|letter: H> + 14.286|noisy: H>

sa: simm |noisy: I2>
65.000|letter: I> + 47.619|noisy: H2> + 45.000|noisy: I> + 35.000|letter: H> + 25.000|noisy: H>

sa: simm |letter: I>
73.333|noisy: I> + 65.000|noisy: I2> + 38.095|noisy: H2> + 29.412|letter: H> + 26.667|noisy: H>
Here is the code to print out the letters based on the defined pixels.
I now have code that can convert the rules to a string image, and the inverse: code to convert a string image back to rules.
def pixels_to_string(context,one):
  data = one.apply_op(context,"pixels")
  I = 5
  J = 7

  string = ""
  for j in range(1,J+1):
    for i in range(1,I+1):
      elt = ket("pixel: " + str(j) + ": " + str(i))
      coeff = data.find_value(elt)
      c = '#'
      if coeff == 0:
        c = ' '
      string += c
    string += "\n"
  return string.rstrip('\n')

def pixel_ket(i,j):
  return ket("pixel: " + str(j) + ": " + str(i))

def create_pixel_rules(label,image):
  pre = "pixels |" + label + "> +=>"

  i = 0
  j = 0

  for line in image.split('\n'):
    result = superposition()
    j += 1
    for c in line:
      i += 1
      if c != ' ':
        result += pixel_ket(i,j)
    print(pre,result)
    i = 0

Here is an example going back and forth:
C = context_list("pattern recognition play")                # define a context_list.
load_sw(C,"H-I-pat-rec.sw")                                 # load up the data from the relevant .sw file.

image = pixels_to_string(C,ket("letter: H"))                # convert the rules for |letter: H> to a string representation.
print(image)
#   #
#   #
#   #
#####
#   #
#   #
#   #

create_pixel_rules("letter: H",image)                       # convert the string "image" back to a list of rules.
pixels |letter: H> +=> |pixel: 1: 1> + |pixel: 1: 5>
pixels |letter: H> +=> |pixel: 2: 1> + |pixel: 2: 5>
pixels |letter: H> +=> |pixel: 3: 1> + |pixel: 3: 5>
pixels |letter: H> +=> |pixel: 4: 1> + |pixel: 4: 2> + |pixel: 4: 3> + |pixel: 4: 4> + |pixel: 4: 5>
pixels |letter: H> +=> |pixel: 5: 1> + |pixel: 5: 5>
pixels |letter: H> +=> |pixel: 6: 1> + |pixel: 6: 5>
pixels |letter: H> +=> |pixel: 7: 1> + |pixel: 7: 5>
This means I can now create pixel rule sets for images just by doing ASCII art, without the hard work of working out the rules manually.
It also lays the groundwork for processing real images.
Part of my thinking is that there is a three-stage process behind maths ideas:
Step 1: collect all the examples of the thing you are interested in.
Step 2: find the intersection of those examples, ie, find what they have in common and extract its mathematical essence.
Step 3: with the mathematical idea solidly worked out, apply it to all those examples in step 1, and more.

30/7/2014: Another driving idea behind this project is notational efficiency. I'm of the opinion that the right notation can make a problem much easier to handle. There are a whole bunch of examples from physics that back this up. Too lazy to list them all, but one example is Maxwell's equations: I've been told that in his original work Maxwell wrote them out all longhand, without the simplification of vector calculus notation. Ouch! The other idea is that if our new notation makes something hard easier, then it also, as a consequence, makes some essentially impossible or tedious things possible.
4/8/2014: Yet another driving idea is generality. Don't code for specific solutions, code for as general as you can manage.
30/3/2014: Made some progress on defining the grammar for this thing. Heh, more work than I expected!
Tweaked create_pixel_rules() to find and print out the dimensions of the object.
And print_pixels() to extract them, instead of the hard-wired 5*7.
#######
#  #  #
#  #  #
### ###
#  #  #
#  #  #
#######
pixels |squares> +=> |pixel: 1: 1> + |pixel: 1: 2> + |pixel: 1: 3> + |pixel: 1: 4> + |pixel: 1: 5> + |pixel: 1: 6> + |pixel: 1: 7>
pixels |squares> +=> |pixel: 2: 1> + |pixel: 2: 4> + |pixel: 2: 7>
pixels |squares> +=> |pixel: 3: 1> + |pixel: 3: 4> + |pixel: 3: 7>
pixels |squares> +=> |pixel: 4: 1> + |pixel: 4: 2> + |pixel: 4: 3> + |pixel: 4: 5> + |pixel: 4: 6> + |pixel: 4: 7>
pixels |squares> +=> |pixel: 5: 1> + |pixel: 5: 4> + |pixel: 5: 7>
pixels |squares> +=> |pixel: 6: 1> + |pixel: 6: 4> + |pixel: 6: 7>
pixels |squares> +=> |pixel: 7: 1> + |pixel: 7: 2> + |pixel: 7: 3> + |pixel: 7: 4> + |pixel: 7: 5> + |pixel: 7: 6> + |pixel: 7: 7>
dim-1 |squares> => |dimension: 7>
dim-2 |squares> => |dimension: 7>

Now, if it is not obvious, BKO is actually a representation for sparse matrices.
(if an element has a coeff of 0 we normally don't include it in the superposition)
(this also means we could call our "superpositions" vectors)
eg:
y = M x
[ y1 ]   [ 0 1 1 0 ] [ x1 ]
[ y2 ] = [ 4 0 2 3 ] [ x2 ]
[ y3 ]   [ 2 1 4 4 ] [ x3 ]
                     [ x4 ]                   
In BKO this is:
M |x1> => 0.0|y1> + 4.0|y2> + 2.0|y3>
M |x2> => |y1> + 0.0|y2> + |y3>
M |x3> => |y1> + 2.0|y2> + 4.0|y3>
M |x4> => 0.0|y1> + 3.0|y2> + 4.0|y3>
A couple of examples.
First, say x = (1,1,1,1), then we have:
sa:  M (|x1> + |x2> + |x3> + |x4>)
2.000|y1> + 9.000|y2> + 11.000|y3> 
ie, y = (2,9,11)
Next, say x = (9,3,0,4), then we have:
sa: M (9|x1> + 3|x2> + 0|x3> + 4|x4>)
3.000|y1> + 48.000|y2> + 37.000|y3>
ie, y = (3,48,37)
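As an aside, a dict-of-dicts sketch in Python shows how little machinery this representation needs (this is illustrative only, not the project's superposition/context classes):
M = {"x1": {"y2": 4, "y3": 2},
     "x2": {"y1": 1, "y3": 1},
     "x3": {"y1": 1, "y2": 2, "y3": 4},
     "x4": {"y2": 3, "y3": 4}}

def apply_op(op,sp):
  # sp is a superposition {ket-label: coeff}; zero-coeff kets are simply absent.
  result = {}
  for label,coeff in sp.items():
    for y,value in op.get(label,{}).items():
      result[y] = result.get(y,0) + coeff*value
  return result

print(apply_op(M, {"x1": 1, "x2": 1, "x3": 1, "x4": 1}))   # {'y2': 9, 'y3': 11, 'y1': 2}
print(apply_op(M, {"x1": 9, "x2": 3, "x4": 4}))            # {'y2': 48, 'y3': 37, 'y1': 3}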
OK. Another matrix <=> BKO example:
y = M1 x
[ y1 ] = [ 0 7 1 1 6 4 1 ] [ x1 ]
[ y2 ]   [ 3 6 4 0 4 8 2 ] [ x2 ]
                           [ x3 ]
                           [ x4 ]
                           [ x5 ]
                           [ x6 ]
                           [ x7 ]

z = M2 y
[ z1 ]   [ 6 0 ] [ y1 ]
[ z2 ]   [ 2 3 ] [ y2 ]
[ z3 ] = [ 7 4 ]
[ z4 ]   [ 9 0 ]
[ z5 ]   [ 5 1 ]
In BKO this is:
M1 |x1> => 3|y2>                        -- NB: we drop/ignore terms that have coeff == 0.
M1 |x2> => 7|y1> + 6|y2>
M1 |x3> => |y1> + 4|y2>
M1 |x4> => |y1> 
M1 |x5> => 6|y1> + 4|y2>
M1 |x6> => 4|y1> + 8|y2>
M1 |x7> => |y1> + 2|y2>

M2 |y1> => 6|z1> + 2|z2> + 7|z3> + 9|z4> + 5|z5>
M2 |y2> => 3|z2> + 4|z3> + |z5>
Now, let's play with the BKO.
First, say x = (1,1,1,1,1,1,1), then we have:
sa: M1 (|x1> + |x2> + |x3> + |x4> + |x5> + |x6> + |x7>)
27.000|y2> + 20.000|y1>                 -- NB: the order of |y1> and |y2> is reversed. This is irrelevant.
ie, y = (20,27)
Next, say x = (8,0,9,3,1,6,1), then we have:
sa: M1 (8|x1> + 9|x3> + 3|x4> + |x5> + 6|x6> + |x7>)
43.000|y1> + 114.000|y2> 
ie, y = (43,114)
Next, say y = (1,1), then we have:
sa: M2 (|y1> + |y2>)
6.000|z1> + 5.000|z2> + 11.000|z3> + 9.000|z4> + 6.000|z5>
ie, z = (6,5,11,9,6)
Next, say y = (43,114), which we obtained from y = M1 x, and x = (8,0,9,3,1,6,1), then we have:
sa: M2 (43|y1> + 114|y2>)
258.000|z1> + 428.000|z2> + 757.000|z3> + 387.000|z4> + 329.000|z5>
ie, z = (258,428,757,387,329)
Now if we consider: z = M2 M1 x, we get the same answer if we feed in x = (8,0,9,3,1,6,1).
sa: M2 M1 (8|x1> + 9|x3> + 3|x4> + |x5> + 6|x6> + |x7>)
258.000|z1> + 428.000|z2> + 757.000|z3> + 387.000|z4> + 329.000|z5>
Finally, if we want to find M = M2 M1, so that z = M x, then we can do that easily enough too:
sa: M2 M1 |x1>
9.000|z2> + 12.000|z3> + 3.000|z5>

sa: M2 M1 |x2>
42.000|z1> + 32.000|z2> + 73.000|z3> + 63.000|z4> + 41.000|z5>

sa: M2 M1 |x3>
6.000|z1> + 14.000|z2> + 23.000|z3> + 9.000|z4> + 9.000|z5>

sa: M2 M1 |x4>
6.000|z1> + 2.000|z2> + 7.000|z3> + 9.000|z4> + 5.000|z5>

sa: M2 M1 |x5>
36.000|z1> + 24.000|z2> + 58.000|z3> + 54.000|z4> + 34.000|z5>

sa: M2 M1 |x6>
24.000|z1> + 32.000|z2> + 60.000|z3> + 36.000|z4> + 28.000|z5>

sa: M2 M1 |x7>
6.000|z1> + 8.000|z2> + 15.000|z3> + 9.000|z4> + 7.000|z5>
In standard matrix representation, this is:
z = M x
[ z1 ]   [ 0  42 6  6 36 24 6  ] [ x1 ]
[ z2 ]   [ 9  32 14 2 24 32 8  ] [ x2 ]
[ z3 ] = [ 12 73 23 7 58 60 15 ] [ x3 ]
[ z4 ]   [ 0  63 9  9 54 36 9  ] [ x4 ]
[ z5 ]   [ 3  41 9  5 34 28 7  ] [ x5 ]
                                 [ x6 ]
                                 [ x7 ]
Interestingly enough, we can use this same construct, ie "op4 op3 op2 op1 |x_i>", to find the "effective" matrix representation of operators more interesting than just literal operators.
In practice these are going to be very large, very sparse, and lose the dynamic nature of the operators.
Still, I'm guessing there will be use cases for this.
And cf QM, we can convert a single dynamic operator into a matrix using:
M_mn = <m|some-op|n>
Of course, this breaks if some-op is non-linear.
Now, some more on the claim that BKO can act as a representation for sparse matrices:
(where this matrix has label "O")
[ a1  ]   [ 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 ] [ a1  ]
[ a2  ]   [ 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ] [ a2  ]
[ a3  ]   [ 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ] [ a3  ]
[ a4  ]   [ 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ] [ a4  ]
[ a5  ]   [ 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 ] [ a5  ]
[ a6  ]   [ 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 ] [ a6  ]
[ a7  ]   [ 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 ] [ a7  ]
[ a8  ]   [ 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 ] [ a8  ]
[ a9  ] = [ 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 ] [ a9  ]
[ a10 ]   [ 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 ] [ a10 ]
[ b1  ]   [ 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 ] [ b1  ]
[ b2  ]   [ 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 ] [ b2  ]
[ b3  ]   [ 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 ] [ b3  ]
[ b4  ]   [ 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 ] [ b4  ]
[ b5  ]   [ 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 ] [ b5  ]
[ b6  ]   [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 ] [ b6  ]
[ b7  ]   [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 ] [ b7  ]
In BKO this is more compactly written as (this BTW is an example from higher up):
O |a1> => |a2>
O |a2> => |a3>
O |a3> => |a4>
O |a4> => |a5>
O |a5> => |a6>
O |a6> => |a7>
O |a7> => |a8>
O |a8> => |a9>
O |a9> => |a10>
O |a10> => |a1> + |b1>

O |b1> => |b2>
O |b2> => |b3>                            
O |b3> => |b4>
O |b4> => |b5>
O |b5> => |b6>
O |b6> => |b7>
O |b7> => |b1>
BTW, "currency conservation" corresponds to sum of columns == 1 (which we don't have here due to the O |a10> => |a1> + |b1> rule).
If we do have currency conservation for operator "some-op" then we have this property:
count-sum superposition == count-sum some-op superposition
eg:
sa: count-sum (5|a1> + 9|a2> + 4|a3> + 2|a4> + |a5> + 9|a6> + 2|a7> + 7|a8> + 5|a9> + 2|a10> + 4|b1> + 5|b2> + 7|b3> + 8|b6> + 4|b7>)
|number: 74.0>

sa: -- without currency conservation, apply O matrix once:
sa: count-sum O (5|a1> + 9|a2> + 4|a3> + 2|a4> + |a5> + 9|a6> + 2|a7> + 7|a8> + 5|a9> + 2|a10> + 4|b1> + 5|b2> + 7|b3> + 8|b6> + 4|b7>)
|number: 76.0>

sa: -- apply O matrix 20 times:
sa: count-sum O^20 (5|a1> + 9|a2> + 4|a3> + 2|a4> + |a5> + 9|a6> + 2|a7> + 7|a8> + 5|a9> + 2|a10> + 4|b1> + 5|b2> + 7|b3> + 8|b6> + 4|b7>)
|number: 166.0>

sa: O |a10> => 0.3|a1> + 0.7|b1>       -- restore currency conservation, and apply O once:
sa: count-sum O (5|a1> + 9|a2> + 4|a3> + 2|a4> + |a5> + 9|a6> + 2|a7> + 7|a8> + 5|a9> + 2|a10> + 4|b1> + 5|b2> + 7|b3> + 8|b6> + 4|b7>)
|number: 74.0>

sa: -- apply O matrix 20 times:
sa: count-sum O^20 (5|a1> + 9|a2> + 4|a3> + 2|a4> + |a5> + 9|a6> + 2|a7> + 7|a8> + 5|a9> + 2|a10> + 4|b1> + 5|b2> + 7|b3> + 8|b6> + 4|b7>)
|number: 74.00000000000001>            -- and we still have the original amount of currency after 20 rounds of the O matrix. 
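For what it's worth, here is a tiny dict-based sketch of currency conservation in Python (a toy 5-node operator of my own, not the O above; every column sums to 1):
O = {"a1": {"a2": 1}, "a2": {"a3": 1}, "a3": {"a1": 0.4, "b1": 0.6},
     "b1": {"b2": 1}, "b2": {"b1": 1}}

def apply_op(op,sp):
  result = {}
  for label,coeff in sp.items():
    for y,value in op.get(label,{}).items():
      result[y] = result.get(y,0) + coeff*value
  return result

sp = {"a1": 5, "a2": 9, "b1": 4}     # count-sum is 18
for _ in range(20):                  # apply O twenty times
  sp = apply_op(O,sp)
print(sum(sp.values()))              # still 18.0 (up to float noise), since every column sums to 1.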

A quick and fun play with stored general rules and algebra:
sa: dump
----------------------------------------
|context> => |context: algebra play>

supported-ops |*> => |op: foo>
foo |*> #=> algebra(""|_self>,|^>,|3>)

supported-ops |x> => |op: >
 |x> => |a>

supported-ops |y> => |op: >
 |y> => |a> + |b>

supported-ops |z> => |op: >
 |z> => |a> + |b> + |c>
----------------------------------------

sa: foo |x>
|a*a*a>

sa: foo |y>
|a*a*a> + 3.000|a*a*b> + 3.000|a*b*b> + |b*b*b>

sa: foo |z>
|a*a*a> + 3.000|a*a*b> + 3.000|a*a*c> + 3.000|a*b*b> + 6.000|a*b*c> + 3.000|a*c*c> + |b*b*b> + 3.000|b*b*c> + 3.000|b*c*c> + |c*c*c>
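(As a sanity check, sympy gives the same expansion; sympy is a standard Python library, and this is just a cross-check, not how algebra() is implemented:
from sympy import symbols, expand
a, b = symbols("a b")
print(expand((a + b)**3))            # a**3 + 3*a**2*b + 3*a*b**2 + b**3
which matches the coefficients of foo |y> above.)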
26/4/2014 update: Now, what if you want functions of more than one variable?
How does that fit in with the foo |*> #=> thing, where you can only change |x>?
Well, you have to do that indirectly.
eg:
----------------------------------------
 |x> => |a>
power |x> => |4>

 |y> => |a> + |b>
power |y> => |5>

foo |*> #=> algebra(""|_self>,|^>,|3>)
bah |*> #=> algebra(""|_self>,|^>,power|_self>)
----------------------------------------

sa: foo |x>
|a*a*a>
sa: bah |x>
|a*a*a*a>

sa: foo |y>
|a*a*a> + 3.000|a*a*b> + 3.000|a*b*b> + |b*b*b>
sa: bah |y>
|a*a*a*a*a> + 5.000|a*a*a*a*b> + 10.000|a*a*a*b*b> + 10.000|a*a*b*b*b> + 5.000|a*b*b*b*b> + |b*b*b*b*b>
These being equivalent to:
foo(t) = t^3
bah(t,power) = t^power

BTW, a side effect of the sparse-matrix idea is that rows of all zeros can be arbitrarily dropped or included.
(this is also because individual elements have their own labels)
Say we have this matrix, and call it M. So y = M x:
[ y1 ]   [ 7 3 2 9 ] [ x1 ]
[ y2 ] = [ 0 0 0 0 ] [ x2 ]
[ y3 ]   [ 5 2 0 3 ] [ x3 ]
[ y4 ]   [ 0 0 0 0 ] [ x4 ]
In BKO terms this is identical to:
[ y1 ] = [ 7 3 2 9 ] [ x1 ]
[ y3 ]   [ 5 2 0 3 ] [ x2 ]
                     [ x3 ]
                     [ x4 ]
and:
[ y1 ]   [ 7 3 2 9 ] [ x1 ]
[ y2 ]   [ 0 0 0 0 ] [ x2 ]
[ y3 ]   [ 5 2 0 3 ] [ x3 ]
[ y4 ] = [ 0 0 0 0 ] [ x4 ]
[ y5 ]   [ 0 0 0 0 ]
[ y6 ]   [ 0 0 0 0 ]
[ y7 ]   [ 0 0 0 0 ]
with this set of rules:
M |x1> => 7|y1> + 5|y3>
M |x2> => 3|y1> + 2|y3>
M |x3> => 2|y1>
M |x4> => 9|y1> + 3|y3>

10/4/2014: We now have enough pieces to implement the ideas of "she is out of my league" and "she is in my league".
sa: features |my perfect woman> => |beautiful> + |smart> + |skinny> + |educated> + |loving> + |sexy>
sa: features |Mary> => |loving> + |skinny>
sa: features |Liz> => |smart> + |educated> + |loving>
sa: features |Jane> => |skinny> + |sexy>
sa: features |Mia> => |smart> + |skinny> + |educated> + |loving>
sa: features |Emma> => |athletic> + |skinny> + |sexy> + |beautiful> + |religious>
sa: features |Donna> => |beautiful> + |smart> + |skinny> + |educated> + |sexy>
sa: features |the goddess> => |beautiful> + |smart> + |skinny> + |educated> + |loving> + |sexy>
sa: fsimm |*> #=> 100 similar[features] |_self>         -- use this to save typing

sa: fsimm |my perfect woman>
100.000|the goddess> + 83.333|Donna> + 66.667|Mia> + 50.000|Liz> + 50.000|Emma> + 33.333|Mary> + 33.333|Jane>

sa: -- she is out of my league:
sa: drop in-range[80,100] fsimm |my perfect woman>
100.000|the goddess> + 83.333|Donna>

sa: -- she is in my league:
sa: drop in-range[50,80] fsimm |my perfect woman>
66.667|Mia> + 50.000|Liz> + 50.000|Emma>

sa: -- I'm not all that interested in her:
sa: drop-above[49] fsimm |my perfect woman>
33.333|Mary> + 33.333|Jane>
There are a bunch of things working underneath to make this all work!
General rules, ie, where the ket-label has a * in it. In this case fsimm |*> ...
Stored rules, ie, #=>
The |_self> ket.
The similarity metric, which powers similar[op] |ket>.
Sigmoids. In this case: in-range[a,b].
The built-in functions drop and drop-above.
Heh. Seems the general rules can handle recursion just fine, which I was not exactly expecting.
Here is an example with the Fibonacci sequence.
First we need some small Fib numbers to compare against:
F0	0
F1	1
F2	1
F3	2
F4	3
F5	5
F6	8
F7	13
F8	21
F9	34
F10	55
F11	89
F12	144
F13	233
F14	377
F15	610
F16	987
F17	1597
F18	2584
F19	4181
F20	6765
F21	10946
F22	17711
F23	28657
F24	46368
F25	75025
F26	121393
F27	196418
F28	317811
F29	514229
F30	832040
F31	1346269
F32	2178309
F33	3524578
F34	5702887
F35	9227465
F36	14930352
F37	24157817
F38	39088169
F39	63245986
F40	102334155
F41	165580141
F42	267914296
F43	433494437
F44	701408733
F45	1134903170
Here is Fibonacci in BKO:
|context> => |context: Fibonacci method 1>
fib |0> => |0>
fib |1> => |1>

n-1 |*> #=> arithmetic(|_self>,|->,|1>)
n-2 |*> #=> arithmetic(|_self>,|->,|2>)
fib |*> #=> fib n-1 |_self> + fib n-2 |_self>

|context> => |context: Fibonacci method 2>
fib |0> => |0>
fib |1> => |1>

n-1 |*> #=> arithmetic(|_self>,|->,|1>)
n-2 |*> #=> arithmetic(|_self>,|->,|2>)
fib |*> #=> to-number ( fib n-1 |_self> + fib n-2 |_self> )

|context> => |context: Fibonacci method 3>
fib |0> => |0>
fib |1> => |1>

n-1 |*> #=> arithmetic(|_self>,|->,|1>)
n-2 |*> #=> arithmetic(|_self>,|->,|2>)
fib |*> #=> arithmetic( fib n-1 |_self>, |+>, fib n-2 |_self>)
Now, initially this is really quite slow. Pages and pages of debugging info fill my screen even for fib |20>.
However, if we store 2 entries in a row (ie, fib(k) and fib(k+1) for some k) using specific rules, we get a massive speed-up.
(I believe this is called "memoization")
eg:
sa: fib |34> => |5702887>
sa: fib |35> => |9227465>
sa: fib |45>
|1134903170> 
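The same trick in plain Python, as a sketch (the cache plays the role of the specific/literal rules, and the recursive branch plays the role of the fib |*> general rule):
cache = {0: 0, 1: 1}

def fib(n):
  if n not in cache:
    cache[n] = fib(n-1) + fib(n-2)   # general rule; the result is then stored like a specific rule.
  return cache[n]

cache[34] = 5702887                  # seed two adjacent entries, exactly as above.
cache[35] = 9227465
print(fib(45))                       # 1134903170, found quickly.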
And of course, we can now easily find the Golden Ratio:
sa: fib-ratio |*> #=> arithmetic( fib |_self> , |/>, fib n-1 |_self> )
sa: fib-ratio |45>
|1.618033988749895>
OK. Now here is a fun thing. We can store literal rules that override the general rule. Again, words fail me, so here is an example:
sa: fib |13> => fib |13>                            -- learn the literal/specific rule using the general/function rule (on the right hand side).
sa: fib |14> => fib |14>
sa: dump
----------------------------------------
|context> => |context: Fibonacci method 3>

supported-ops |0> => |op: fib>
fib |0> => |0>

supported-ops |1> => |op: fib>
fib |1> => |1>

supported-ops |*> => |op: n-1> + |op: n-2> + |op: fib> + |op: fib-ratio>
n-1 |*> #=> arithmetic(|_self>,|->,|1>)
n-2 |*> #=> arithmetic(|_self>,|->,|2>)
fib |*> #=> arithmetic( fib n-1 |_self>, |+>, fib n-2 |_self>)
fib-ratio |*> #=> arithmetic( fib |_self> , |/>, fib n-1 |_self> )

supported-ops |13> => |op: fib>
fib |13> => |233>                                   -- this is the interesting bit. On the right, "fib |13>" has been replaced with its value, |233>.

supported-ops |14> => |op: fib>
fib |14> => |377>                                   -- here too.
----------------------------------------
Now whenever the code wants to know "fib |13>" or "fib |14>" it uses the specific rule, rather than the fib |*> general rule.
This also appears to happen in the brain. Say you ask a child "what is 3 + 5?".
Well, they do a mental calculation equivalent to arithmetic(|3>,|+>,|5>).
But eventually they learn the specific rule: 3 + 5 == 8. They no longer have to do the arithmetic (say, using their fingers); they can just mentally look up the answer.
Next on the recursion list is factorial:
fact |0> => |1>
n-1 |*> #=> arithmetic(|_self>,|->,|1>)
fact |*> #=> arithmetic( |_self>, |*>, fact n-1 |_self>)

Heh. Just realised we can make use of the linearity of operators.
eg:
sa: fib (|33> + |37> + |40>)
|3524578> + |24157817> + |102334155>

sa: fact (|3> + |4> + |5> + |6>)
|6> + |24> + |120> + |720>

And if you want your numbers to include categories, we can do that easily enough too.
BTW, the code to handle this is built into the arithmetic() function.
NB: if you feed in two numbers with different categories then the arithmetic function returns the empty ket. This is to prevent nonsensical calculations where you accidentally mix types.
sa: fact |number: 0> => |number: 1>
sa: n-1 |number: *> #=> arithmetic(|_self>,|->,|number: 1>)      -- NB: the |number: 1> instead of just |1>.
sa: fact |number: *> #=> arithmetic( |_self>, |*>, fact n-1 |_self>)

sa: fact |number: 6>
|number: 720>

30/4/2014 update: Let's give an example or two of average and weighted average.
----------------------------------------
|context> => |context: average>

ave |*> #=> arithmetic(count-sum "" |_self>,|/>,count "" |_self>)
apply-weights |*> #=> mult(""|_self>, weights|_self>)
weighted-ave |*> #=> arithmetic(count-sum apply-weights |_self>,|/>,count-sum weights |_self>)
harmonic-mean |*> #=> arithmetic(count "" |_self>,|/>,count-sum invert "" |_self>)

 |x> => |a> + 2.000|b> + 3.000|c> + 4.000|d>
weights |x> => 0.100|a> + 0.100|b> + 0.700|c> + 0.100|d>

 |y> => |a> + 2.000|b> + 5.000|c> + 7.000|d>
weights |y> => 2.000|a> + 14.000|b> + 8.000|c> + 32.000|d>

 |z> => 60.000|a> + 40.000|b>
----------------------------------------
sa: ave |x>
|number: 2.5>

sa: ave |y>
|number: 3.75>

sa: weighted-ave |x>
|number: 2.8>

sa: weighted-ave |y>
|number: 5.25>

-- then making use of linearity we can do more than one at a time:
sa: ave (|x> + |y> + |z>)
|number: 2.5> + |number: 3.75> + |number: 50.0>

sa: weighted-ave (|x> + |y>)
|number: 2.8> + |number: 5.25>

sa: harmonic-mean (|x> + |y> + |z>)
|number: 1.9200000000000004> + |number: 2.170542635658915> + |number: 47.99999999999999> 
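For reference, here is a dict-based Python sketch of what these operators compute (over {label: coeff} superpositions; this mirrors, but is not, the project's actual code):
def ave(sp):
  return sum(sp.values())/len(sp)                    # count-sum / count.

def weighted_ave(sp,weights):
  return sum(sp[k]*weights[k] for k in sp)/sum(weights.values())

def harmonic_mean(sp):
  return len(sp)/sum(1/v for v in sp.values())       # count / count-sum of inverted coeffs.

x = {"a": 1, "b": 2, "c": 3, "d": 4}
wx = {"a": 0.1, "b": 0.1, "c": 0.7, "d": 0.1}
print(ave(x), weighted_ave(x,wx), harmonic_mean(x))  # 2.5 2.8 1.92 (up to float noise)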

13/4/2014: Motivated by recursion working, I figured maybe we could use an if/else statement.
A couple of lines of code and it is all working.
# bko_if(|True>,|a>,|b>)  -- returns |a>
# bko_if(|False>,|c>,|d>) -- returns |d>
def bko_if(condition,one,two):
  if condition.the_label() == "True":
    return one
  else:
    return two

sa: if (|True>,|x>,|y>)
|x>

sa: if(|False>,|x>,|y>)
|y>

sa: foo |fish> => |True>
sa: bah |fish> => |False>
sa: if (foo |fish> , |true fish> , |false fish>)               -- foo|fish> is an indirect condition.
|true fish>

sa: if (bah |fish> , |true fish> , |false fish>)               -- so is bah|fish>
|false fish>
Just discovered there is kind of a bug here!
The compound superpositions are calculated before being handed to if. Probably have to build if in at a deeper level to fix this.
eg (with partly edited debugging info):
sa: if (foo|fish>, shout |true fish>, shout |false fish>)
TRUE FISH                                                      -- note the code shouts both cases,
FALSE FISH                                                     -- independent of which branch the if statement chooses.
len:  3
sp:  |True>
sp:  |TRUE FISH>
sp:  |FALSE FISH>
op in whitelist 3
py: bko_if(pieces[0],pieces[1],pieces[2])
|TRUE FISH>
Heh. Found a neat solution:
-- method 1:
sa: shout if (foo |fish>, |true fish>, |false fish>)
TRUE FISH
|TRUE FISH>

-- method 2:
sa: |true fish> #=> shout |true fish>
sa: |false fish> #=> shout |false fish>
sa: "" if(bah|fish>, |true fish>, |false fish>)
FALSE FISH
|FALSE FISH>

-- method 2, another variant:
activate |true fish> #=> shout |true fish branch>
activate |false fish> #=> shout |false fish branch>
sa: activate if(bah|fish>, |true fish>, |false fish>)
FALSE FISH BRANCH
|FALSE FISH BRANCH>
So, let's try with words...
Consider if(condition,|a>,|b>)
So the idea is you apply the op-sequence after the if statement has been evaluated, not before.
Wrong:
if(condition,op3 op2 op1|a>,op3 op2 op1|b>)
Correct:
op3 op2 op1 if(condition,|a>,|b>)
That is method 1.

Method 2 is if you want different op-sequences applied to |a> and |b>. So you define an indirect operator first.
Wrong:
if(condition,op3 op2 op1|a>,op7 op6|b>)
Correct:
temp-op |a> => op3 op2 op1 |a>
temp-op |b> => op7 op6 |b>
temp-op if(condition,|a>,|b>)
But of course, sometimes you do want the superposition calculated before applying the if statement.
eg, a mixed case such as:
temp-op |x> => op2 op1 |x>
temp-op |y> => op6 op5 op4 |y>
temp-op if(condition,bah foo|x>, bah|y>)
Nope! This is broken if <x|bah foo|x> and <y|bah foo|x> == 0, and <x|bah|y> and <y|bah|y> == 0.
ie, if the result of the if statement lies outside {|x>,|y>}, then temp-op has nothing it can operate on.
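FWIW, the usual host-language fix for the eager-evaluation issue is to hand if its branches unevaluated, eg as thunks; a Python sketch (my own, not how bko_if is currently wired in):
def lazy_if(condition,then_branch,else_branch):
  branch = then_branch if condition else else_branch
  return branch()                    # only the chosen branch is ever evaluated.

def shout(s):
  print(s.upper())                   # the side effect we only want to fire once.
  return s.upper()

lazy_if(True, lambda: shout("true fish"), lambda: shout("false fish"))
This shouts TRUE FISH only; FALSE FISH never fires.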
14/4/2014: tweaked valid_op() so that now ! is a valid op-label char.
We had this:
def valid_op(op):
  if not op[0].isalpha():
    return False
  return all(c in ascii_letters + '0123456789-' for c in op)
and now we have this:
from string import ascii_letters     # NB: both versions assume this import.

def valid_op(op):
  if not op[0].isalpha() and not op[0] == '!':
    return False
  return all(c in ascii_letters + '0123456789-!' for c in op)
So now we can do:
sa: ! |False> => |True>
sa: !|True> => |False>
sa: shout! |*> #=> shout |_self>

sa: ! |False>
|True>

sa: ! ! |False>    -- NB: this is ! !, ie separated by a space (this is an op-sequence after-all) and not !! unless you define that yourself.
|False>

sa: shout! |fish>
FISH
|FISH>
BTW, we had to use ! and not "NOT" since NOT is already used as a sigmoid.
Re: "... and not !! unless you define that yourself"
That is actually trivial to do:
sa: !!|*> #=> ! !|_self>
sa: !!|True>
|True>

sa: !!|False>
|False>

11/5/2014 update: And there is an interesting twist here. |True> and |False> represent Booleans. But there are other alternatives.
eg:
!|one> => |two>
!|two> => |three>
!|three> => |four>
!|four> => |five>
!|five> => |six>
!|six> => |one>
And so we have:
sa: ! |one>
|two>

sa: ! ! |one>
|three>

sa: ! ! ! |one>
|four>

sa: ! ! ! ! |one>
|five>

sa: ! ! ! ! ! |one>
|six>

sa: ! ! ! ! ! ! |one>
|one>

sa: ! ! ! ! ! ! ! |one>
|two>

Added the invert(x) sigmoid:
def invert(x):
  if x == 0:                         # the idea of 1/0 = 0 has some merit.
    return 0                         # eg, in the model a 0 coeff means ignore. Seems appropriate that 1/ignore = ignore.
  else:
    return 1/x
eg:
sa: x = 3|x> + 0.5|y> + 0|z>
sa: id
3.000|x> + 0.500|y> + 0.000|z>

sa: invert
0.333|x> + 2.000|y> + 0.000|z>

sa: invert invert
3.000|x> + 0.500|y> + 0.000|z>

20/4/2014: Implemented "merge-labels". It is just a couple of lines of code, but should be useful in a few places.
# merge-labels (|fish> + |soup>) returns |fishsoup>
eg (a little on mapping BKO rules to English):
sa: friends-list |*> #=> extract-value list-to-words extract-value friends |_self>
sa: hello! |*> #=> merge-labels(|Hello > + |_self> + |!>)
sa: friends |person: Eric> => |person: Fred> + |person: Sam> + |person: Harry> + |person: Mary> + |person: liz>
sa: friends |Fred> => |Jack> + |Harry> + |Ed> + |Mary> + |Rob> + |Patrick> + |Emma> + |Charlie>
sa: friends |Sam> => |Charlie> + |George> + |Emma> + |Jack> + |Rober> + |Frank> + |Julie>
sa: friends |Charlie> => |Jack> + |Emma>
sa: dump
----------------------------------------
|context> => |context: hello friends>

supported-ops |*> => |op: friends-list> + |op: hello!>
friends-list |*> #=> extract-value list-to-words extract-value friends |_self>
hello! |*> #=> merge-labels(|Hello > + |_self> + |!>)

supported-ops |person: Eric> => |op: friends>
friends |person: Eric> => |person: Fred> + |person: Sam> + |person: Harry> + |person: Mary> + |person: liz>

supported-ops |Fred> => |op: friends>
friends |Fred> => |Jack> + |Harry> + |Ed> + |Mary> + |Rob> + |Patrick> + |Emma> + |Charlie>

supported-ops |Sam> => |op: friends>
friends |Sam> => |Charlie> + |George> + |Emma> + |Jack> + |Rober> + |Frank> + |Julie>

supported-ops |Charlie> => |op: friends>
friends |Charlie> => |Jack> + |Emma>
----------------------------------------
sa: hello! |Harry>
|Hello Harry!>

sa: hello! friends-list |Charlie>
|Hello Jack and Emma!>

sa: hello! friends-list |Sam>
|Hello Charlie, George, Emma, Jack, Rober, Frank and Julie!>

sa: hello! friends-list |Fred>
|Hello Jack, Harry, Ed, Mary, Rob, Patrick, Emma and Charlie!>

sa: hello! friends-list |person: Eric>
|Hello Fred, Sam, Harry, Mary and liz!>

Now a comment on what I currently call a non-linear resonance. The idea is there is a hidden resonance, and only if you get the input exactly right (ie, the non-linear part) do you get any hint of its existence. The whole point is that the brain has a vast collection of these non-linear resonances. Only a very particular input, from the outside world say, triggers the firing of the neuron that represents a specific concept.
eg, in the example below you need a 99% match, else you get nothing.
|context> => |context: non-linear resonance>

resonance |*> #=> 1000 drop-below[0.99] simm(""|_self>, ""|g>) |g>
 |g> => |a> + |b> + |c> + |d>
 |f1> => |a>
 |f2> => |a> + |b>
 |f3> => |a> + |b> + |c>
 |f4> => |a> + |b> + |c> + 0.900|d>
 |f5> => 0.950|a> + |b> + |c> + |d>
 |f6> => |a> + |b> + |c> + |d>
 |f7> => |a> + |b> + |c> + |d> + |e>
----------------------------------------
sa: resonance |f1>                     -- no sign of a resonance.
|>

sa: resonance |f2>                     -- no sign of a resonance.
|>

sa: resonance |f3>                     -- still no sign of a resonance.
|>

sa: ket-simm(""|f4>,""|g>)             -- test how close |f4> and |g> are.
0.981|simm>                            -- 98% match.

sa: resonance |f4>                     -- 98% match, and yet still no resonance.
|>

sa: ket-simm(""|f5>,""|g>)             -- test how close |f5> and |g> are.
0.991|simm>                            -- 99% match.

sa: resonance |f5>                     
990.506|g>                             -- finally, we have resonance of |g>

sa: resonance |f6>                     -- |f6> is a perfect match with |g>
1000.000|g>

sa: resonance |f7>
|>                                     -- |f7> doesn't resonate with |g>.

2/5/2014 update: There is also another related meaning of non-linear resonance.
The idea is you have some complex curve, and then if you get it exactly right (eg, usually using simm) then you get a resonance.
If you are even a little off (depending on how strict you are with your threshold filter), you get nothing.
In the above example the curve is simply |a> + |b> + |c> + |d> (for a generous definition of "curve").
But we can have examples like:
|curve> => 15|x: 1> + 4|x: 2> + 8|x: 3> + 5|x: 4> + 4|x: 5> + 16|x: 6> + 0|x: 7> + 17|x: 8> + 4|x: 9> + 17|x: 10> + 5|x: 11> + 15|x: 12> + 10|x: 13> + 19|x: 14> + 11|x: 15> + 1|x: 16> + 10|x: 17> + 13|x: 18> + 3|x: 19> + 5|x: 20> + 4|x: 21> + 5|x: 22> + 13|x: 23> + 7|x: 24> + 6|x: 25> + 12|x: 26> + 9|x: 27> + 3|x: 28> + 3|x: 29> + 8|x: 30>
In the brain this could be for example the set of sound-wave patterns that excite the "frog" neuron.
A related thought is, given invariance requirements, how many layers of processing do we need?
eg, in the "frog" case, invariant under sound volume, pitch of voice, speed of speech and accent.
27/6/2014 update: I wrote some quick code to map lists to superpositions, and superpositions back to lists.
def list_to_sp(s,values):
  result = superposition()
  result.data = [ket(s + str(k),v) for k,v in enumerate(values,1)]  # NB: enumerate from 1, so the first element maps to |x: 1>, matching the example below.
  return result

def sp_to_list(sp):
  return [x.value for x in sp.ket_sort().data]                     # NB: the ket_sort(). Even if we shuffle the sp, we get the same list back.

For example:
15.000|x: 1> + 4.000|x: 2> + 8.000|x: 3> + 5.000|x: 4> + 4.000|x: 5> + 16.000|x: 6> + 0.000|x: 7> + 17.000|x: 8> + 4.000|x: 9>
maps to:
[15.0, 4.0, 8.0, 5.0, 4.0, 16.0, 0.0, 17.0, 4.0]

Of course, there is a brother to the non-linear resonance: the fuzzy resonance. ie, you can be some way off in terms of your pattern, but you still get resonance.
It is really a question of how tight you set the threshold filter (in this case implemented using drop-below and 51%; above I used 99%).
eg, here are a couple of versions of fuzzy resonance. The first gives some measure of how close to the resonance you are; the second maxes out (at 200) even if you are a bit away from the target.
fuzzy-resonance-1 |*> #=> 200 drop-below[0.51] simm(""|_self>, ""|g>) |g>
fuzzy-resonance-2 |*> #=> 200 clean drop-below[0.51] simm(""|_self>, ""|g>) |g>

sa: fuzzy-resonance-1 |f1>
|>

sa: fuzzy-resonance-1 |f2>
|>

sa: fuzzy-resonance-1 |f3>
150.000|g>

sa: fuzzy-resonance-1 |f4>
196.154|g>

sa: fuzzy-resonance-1 |f5>
198.101|g>

sa: fuzzy-resonance-1 |f6>
200.000|g>

sa: fuzzy-resonance-1 |f7>
160.000|g>



sa: fuzzy-resonance-2 |f1>
|>

sa: fuzzy-resonance-2 |f2>
|>

sa: fuzzy-resonance-2 |f3>
200.000|g>

sa: fuzzy-resonance-2 |f4>
200.000|g>

sa: fuzzy-resonance-2 |f5>
200.000|g>

sa: fuzzy-resonance-2 |f6>
200.000|g>

sa: fuzzy-resonance-2 |f7>
200.000|g>

Added a directory of current .sw example files.
Note there are some interesting files in there, but also rubbishy ones I used for testing.
At some stage I should tidy it up!
21/4/2014: OK. I haven't given the original MatSumSig model much thought in a long time. But I think it goes something like this:
[ y1 ]   [ s1[x1,t1] ] [ sum[x1,t] ] [ a1 a2 a3 a4 a5 ] [ x1 ]
[ y2 ]   [ s2[x1,t2] ]                                  [ x2 ]
[ y3 ]   [ s3[x1,t3] ]                                  [ x3 ]
[ y4 ] = [ s4[x1,t4] ]                                  [ x4 ]
[ y5 ]   [ s5[x1,t5] ]                                  [ x5 ]
[ y6 ]   [ s6[x1,t6] ]                                  
[ y7 ]   [ s7[x1,t7] ]                                  
[ y8 ]   [ s8[x1,t8] ]                                  
This is a simplified model of a single neuron, and it uses my idea of a function matrix (which I describe further down).
Where:
{a1,a2,a3,a4,a5} are reals/floats and can be positive or negative.
sum[x,t] sums the input x for a time-slice of length t, then spits out the result at the end of that time slice. 
If we don't include the sum[] term, assume t = 0.
Indeed, we only need t > 0 if we want time-dependence. 
s_k[x,t_k] are sigmoids, with passed in parameter t_k.
Note that there are a lot of free parameters here, and I have no idea how the brain tweaks them!
We have {a1,a2,a3,a4,a5}, {t,t1,t2,...,t8}, and then we have the sigmoids {s1,s2,...,s8}.
Indeed, until we have some idea how to fill in all these parameters, we can't actually make use of this model/representation.

BTW, this model is in the physics tradition of: if something is too complex, simplify it until you have something you can work with (at least as a starting point).
Let's give some simple examples, in this case some simple logic:
d = a OR b OR c
[ d ] = [ BF[x1] ] [ 1 1 1 ] [ BF[x1] ] [ a ]
                             [ BF[x2] ] [ b ] 
                             [ BF[x3] ] [ c ]

d = a AND b AND c
[ d ] = [ BF[x1] ] [ 1/3 1/3 1/3 ] [ BF[x1] ] [ a ]
                                   [ BF[x2] ] [ b ] 
                                   [ BF[x3] ] [ c ]

d = a XOR b XOR c
[ d ] = [ XF[x1] ] [ 1 1 1 ] [ BF[x1] ] [ a ]
                             [ BF[x2] ] [ b ] 
                             [ BF[x3] ] [ c ]
Where BF[x] and XF[x] are sigmoids:
def binary_filter(x):
  if x <= 0.96:
    return 0
  else:
    return 1

def xor_filter(x):
  if 0.96 <= x <= 1.04:              # fires only when the summed input is ~1,
    return 1                         # so for 3 inputs this is "exactly one of", not parity XOR.
  else:
    return 0
In BKO, and the console, this is:
sa: -- d = a OR b OR c
sa: binary-filter to-number count-sum binary-filter (|a> + 0|b> + |c>)
| >                                          -- NB: this is equivalent to 1, and not the empty ket (look closely!)

sa: -- d = a AND b AND c
sa: binary-filter to-number 0.3333 count-sum binary-filter (|a> + |b> + |c>)
| >
sa: binary-filter to-number 0.3333 count-sum binary-filter (|a> + 0|b> + |c>)
0.000| >                                     -- NB: this is equivalent to 0.

sa: -- d = a XOR b XOR c
sa: xor-filter to-number count-sum binary-filter (0|a> + |b> + 0|c>)
| >
sa: xor-filter to-number count-sum binary-filter (|a> + |b> + 0|c>)
0.000| >
sa: xor-filter to-number count-sum binary-filter (|a> + |b>  + |c>)
0.000| >
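Here are those three gates as a plain-Python sketch of the matrix -> sum -> sigmoid pipeline (the gate() helper is my own name for it):
def binary_filter(x):
  return 0 if x <= 0.96 else 1

def xor_filter(x):
  return 1 if 0.96 <= x <= 1.04 else 0

def gate(weights,sigmoid,inputs):
  s = sum(w*binary_filter(x) for w,x in zip(weights,inputs))   # matrix row, then sum.
  return sigmoid(s)                                            # then the output sigmoid.

print(gate([1,1,1], binary_filter, [1,0,1]))          # OR  -> 1
print(gate([1/3,1/3,1/3], binary_filter, [1,1,1]))    # AND -> 1
print(gate([1,1,1], xor_filter, [0,1,0]))             # XOR -> 1 (exactly one input on).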
OK. Let's try for a slightly more complex example:
f = (a AND b AND c) OR (d AND e)
[ f ] = [ BF[x1] ] [ 1 1 ] [ BF[x1] ] [ 1/3 1/3 1/3 0   0   ] [ BF[x1] ] [ a ]
                           [ BF[x2] ] [ 0   0   0   1/2 1/2 ] [ BF[x2] ] [ b ]
                                                              [ BF[x3] ] [ c ]
                                                              [ BF[x4] ] [ d ]
                                                              [ BF[x5] ] [ e ]                                                             
And we can do a version of set union and intersection too, but first we need some pieces:
def pos(x):               # the simplest of the sigmoids.
  if x <= 0:
    return 0
  else:
    return x

abs(x) = pos(x) + pos(-x)
abs(a - b) = pos(a - b) + pos(-a + b)
a + b + abs(a - b) = 2*max(a,b)
a + b - abs(a - b) = 2*min(a,b)
eg:
[ r1 ]   [ 1  1  1 ] [ pos[x1] ] [  1  1 ] [ a ]
[ r2 ] = [ 1 -1 -1 ] [ pos[x2] ] [  1 -1 ] [ b ]
                     [ pos[x3] ] [ -1  1 ]
                     
ie (assuming a,b >= 0, so that pos(a + b) = a + b):
r1 = a + b + pos(a - b) + pos(-a + b) = 2*max(a,b)
r2 = a + b - pos(a - b) - pos(-a + b) = 2*min(a,b)
And then we need the observation that max corresponds to a version of set union (it even works for coeffs other than 0 and 1), and min corresponds to intersection:
def set_union(f,g):
  return [max(f[k],g[k]) for k in range(len(f))]

def set_intersection(f,g):
  return [min(f[k],g[k]) for k in range(len(f))]
So finally, in MatSumSig, we have union and intersection (ignoring a factor of 2):
[ U1 ]   [ 1  1  1  0  0  0  0  0  0  0  0  0 ] [ pos[x1]  ] [  1  1  0  0  0  0  0  0 ] [ f1 ]
[ I1 ] = [ 1 -1 -1  0  0  0  0  0  0  0  0  0 ] [ pos[x2]  ] [  1 -1  0  0  0  0  0  0 ] [ g1 ]
[ U2 ]   [ 0  0  0  1  1  1  0  0  0  0  0  0 ] [ pos[x3]  ] [ -1  1  0  0  0  0  0  0 ] [ f2 ]
[ I2 ]   [ 0  0  0  1 -1 -1  0  0  0  0  0  0 ] [ pos[x4]  ] [  0  0  1  1  0  0  0  0 ] [ g2 ]
[ U3 ]   [ 0  0  0  0  0  0  1  1  1  0  0  0 ] [ pos[x5]  ] [  0  0  1 -1  0  0  0  0 ] [ f3 ]
[ I3 ]   [ 0  0  0  0  0  0  1 -1 -1  0  0  0 ] [ pos[x6]  ] [  0  0 -1  1  0  0  0  0 ] [ g3 ]
[ U4 ]   [ 0  0  0  0  0  0  0  0  0  1  1  1 ] [ pos[x7]  ] [  0  0  0  0  1  1  0  0 ] [ f4 ]
[ I4 ]   [ 0  0  0  0  0  0  0  0  0  1 -1 -1 ] [ pos[x8]  ] [  0  0  0  0  1 -1  0  0 ] [ g4 ]
                                                [ pos[x9]  ] [  0  0  0  0 -1  1  0  0 ]
                                                [ pos[x10] ] [  0  0  0  0  0  0  1  1 ]
                                                [ pos[x11] ] [  0  0  0  0  0  0  1 -1 ]
                                                [ pos[x12] ] [  0  0  0  0  0  0 -1  1 ]
ie:
[U1,U2,U3,U4] = 2* [max(f1,g1), max(f2,g2), max(f3,g3), max(f4,g4)]
[I1,I2,I3,I4] = 2* [min(f1,g1), min(f2,g2), min(f3,g3), min(f4,g4)]
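And a quick numeric check of the max/min-via-pos trick in Python (names mine):
def pos(x):
  return max(0,x)

def union_intersection(f,g):
  U = [(a + b + pos(a-b) + pos(-a+b))/2 for a,b in zip(f,g)]   # element-wise max.
  I = [(a + b - pos(a-b) - pos(-a+b))/2 for a,b in zip(f,g)]   # element-wise min.
  return U, I

print(union_intersection([1,0,0.5,2], [0,1,0.7,2]))
gives roughly ([1.0, 1.0, 0.7, 2.0], [0.0, 0.0, 0.5, 2.0]), and note it works for coeffs other than 0 and 1 too.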
Then finally, a simple version of my favourite toy, simm:
def simm(w,f,g):
  return sum(w[k]*min(f[k],g[k]) for k in range(len(f)))

[ r ] = [ sigmoid[x1] ] [ w1 w2 w3 w4 ] [ I1 ]
                                        [ I2 ]
                                        [ I3 ]
                                        [ I4 ]
                                        
where I think in this case it is assumed w_k >= 0
And all at once:
[ r ] = [ sigmoid[x1] ] [ w1 w2 w3 w4 ] [ pos[x1] ] [ 1 -1 -1  0  0  0  0  0  0  0  0  0 ] [ pos[x1]  ] [  1  1  0  0  0  0  0  0 ] [ f1 ]
                                        [ pos[x2] ] [ 0  0  0  1 -1 -1  0  0  0  0  0  0 ] [ pos[x2]  ] [  1 -1  0  0  0  0  0  0 ] [ g1 ]
                                        [ pos[x3] ] [ 0  0  0  0  0  0  1 -1 -1  0  0  0 ] [ pos[x3]  ] [ -1  1  0  0  0  0  0  0 ] [ f2 ]
                                        [ pos[x4] ] [ 0  0  0  0  0  0  0  0  0  1 -1 -1 ] [ pos[x4]  ] [  0  0  1  1  0  0  0  0 ] [ g2 ]
                                                                                           [ pos[x5]  ] [  0  0  1 -1  0  0  0  0 ] [ f3 ]
                                                                                           [ pos[x6]  ] [  0  0 -1  1  0  0  0  0 ] [ g3 ]
                                                                                           [ pos[x7]  ] [  0  0  0  0  1  1  0  0 ] [ f4 ]
                                                                                           [ pos[x8]  ] [  0  0  0  0  1 -1  0  0 ] [ g4 ]
                                                                                           [ pos[x9]  ] [  0  0  0  0 -1  1  0  0 ]
                                                                                           [ pos[x10] ] [  0  0  0  0  0  0  1  1 ]
                                                                                           [ pos[x11] ] [  0  0  0  0  0  0  1 -1 ]
                                                                                           [ pos[x12] ] [  0  0  0  0  0  0 -1  1 ]
Now, this can be called a space simm, but there is also a time based simm. I think it goes like this:
def simm(w,f,g):
  return sum(w[t]*min(f[t],g[t]) for t in range(len(f)))   # k is space based, t is time based, but otherwise an identical equation.

[ r ] = [ sum[x1,t2] ] [ sigmoid[x1,t1] ] [ 1 -1 -1 ] [ pos[x1] ] [  1  1 ] [ f ]
                                                      [ pos[x2] ] [  1 -1 ] [ g ]
                                                      [ pos[x3] ] [ -1  1 ]
ie, in words: a simm of f,g with respect to time.
[ sum[x1,t2] ] is the time based equivalent of [ w1 w2 w3 w4 ]
And the sigmoid[x1,t1] is meant to implement the idea that different times have different importance, but I'm not sure it is 100% correct as is.
And for the future: using sum[] with different time windows you can convert time based patterns into space based patterns. Details later.
Hrmm... the visual system in the brain most probably uses space simm, and the audio system time based simm. I mean, it seems logical that that would be the case.
Hrmm... on further thought, they most probably combine both. eg, image pixels are space simm, but changes over time are time simm. Something similar for audio.
2/5/2014 update: I guess the next example should be smooth and unsmooth.
First, smooth:
f[k] => f[k-1]/4 + f[k]/2 + f[k+1]/4

[ f0  ]       [ 3 1 0 0 0 0 0 0 0 0 0 ] [ f0  ]
[ f1  ]       [ 1 2 1 0 0 0 0 0 0 0 0 ] [ f1  ]
[ f2  ]       [ 0 1 2 1 0 0 0 0 0 0 0 ] [ f2  ]
[ f3  ]       [ 0 0 1 2 1 0 0 0 0 0 0 ] [ f3  ]
[ f4  ]       [ 0 0 0 1 2 1 0 0 0 0 0 ] [ f4  ]
[ f5  ] = 1/4 [ 0 0 0 0 1 2 1 0 0 0 0 ] [ f5  ]
[ f6  ]       [ 0 0 0 0 0 1 2 1 0 0 0 ] [ f6  ]
[ f7  ]       [ 0 0 0 0 0 0 1 2 1 0 0 ] [ f7  ]
[ f8  ]       [ 0 0 0 0 0 0 0 1 2 1 0 ] [ f8  ]
[ f9  ]       [ 0 0 0 0 0 0 0 0 1 2 1 ] [ f9  ]
[ f10 ]       [ 0 0 0 0 0 0 0 0 0 1 3 ] [ f10 ]
Now in BKO:
smooth |f0> => 0.75|f0> + 0.25|f1>
smooth |f1> => 0.25|f0> + 0.5|f1> + 0.25|f2>
smooth |f2> => 0.25|f1> + 0.5|f2> + 0.25|f3>
smooth |f3> => 0.25|f2> + 0.5|f3> + 0.25|f4>
smooth |f4> => 0.25|f3> + 0.5|f4> + 0.25|f5>
smooth |f5> => 0.25|f4> + 0.5|f5> + 0.25|f6>
smooth |f6> => 0.25|f5> + 0.5|f6> + 0.25|f7>
smooth |f7> => 0.25|f6> + 0.5|f7> + 0.25|f8>
smooth |f8> => 0.25|f7> + 0.5|f8> + 0.25|f9>
smooth |f9> => 0.25|f8> + 0.5|f9> + 0.25|f10>
smooth |f10> => 0.25|f9> + 0.75|f10>
Note that we have currency conservation, ie, the sum of columns = 1 (taking into account the 1/4), and count-sum smooth |fk> = 1.
Also, as you iterate, the smooth rapidly becomes a Gaussian smooth, ie, spikes rapidly turn into smooth bell curves.
Note that the boundary effect from smooth|f0> and smooth|f10> has a second effect of slowly flattening the curve, approaching a flat line with a slight bump in the middle.
Anyway, in practice when I mapped posting times into minute buckets (ie, 1440 buckets in a day), the best effect was with 300 to 500 smooths.
BTW, posting times are not the only case with spikes that need smoothing before the data is usable.
Another example is counting cars as they pass by your house. Again, 1440 buckets that need smoothing.
Heh. Tempted to buy a Raspberry Pi and actually try this ...

OK. Decided to check on "slowly flattening the curve, approaching a flat line".
With smooth restricted to |fk>, k in {0..10} we have:
sa: smooth^300 |f5>
0.091|f0> + 0.091|f1> + 0.091|f2> + 0.091|f3> + 0.091|f4> + 0.091|f5> + 0.091|f6> + 0.091|f7> + 0.091|f8> + 0.091|f9> + 0.091|f10>

sa: smooth^300 |f0>
0.091|f0> + 0.091|f1> + 0.091|f2> + 0.091|f3> + 0.091|f4> + 0.091|f5> + 0.091|f6> + 0.091|f7> + 0.091|f8> + 0.091|f9> + 0.091|f10>

sa: count-sum smooth^300 |f5>
|number: 1.0000000000000002>

sa: invert smooth^300 |f5>
11.000|f0> + 11.000|f1> + 11.000|f2> + 11.000|f3> + 11.000|f4> + 11.000|f5> + 11.000|f6> + 11.000|f7> + 11.000|f8> + 11.000|f9> + 11.000|f10>

sa: invert smooth^300 |f0>
10.954|f0> + 10.957|f1> + 10.965|f2> + 10.975|f3> + 10.987|f4> + 11.000|f5> + 11.013|f6> + 11.025|f7> + 11.036|f8> + 11.043|f9> + 11.047|f10>

sa: invert smooth^500 |f0>
10.999|f0> + 10.999|f1> + 10.999|f2> + 11.000|f3> + 11.000|f4> + 11.000|f5> + 11.000|f6> + 11.000|f7> + 11.001|f8> + 11.001|f9> + 11.001|f10>
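Here is the same experiment as a plain Python sketch (list-based rather than BKO rules; my own helper, not project code):
def smooth(f):
  n = len(f)
  g = [0.0]*n
  for k in range(n):
    left = f[k-1]/4 if k > 0 else f[k]/4       # boundary rule: 3/4 stays put.
    right = f[k+1]/4 if k < n-1 else f[k]/4
    g[k] = left + f[k]/2 + right
  return g

f = [0.0]*11
f[5] = 1.0                                     # a single spike at f5.
for _ in range(300):
  f = smooth(f)
print([round(x,3) for x in f])                 # ~[0.091, 0.091, ..., 0.091]: flat, and the sum is still 1.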
Now, the brother (which I haven't actually played with much), unsmooth (yeah, needs a better name, though I guess we could call it a balanced discrete derivative):
f[k] => - f[k-1]/2 + f[k] - f[k+1]/2

[ f0  ]       [  1 -1  0  0  0  0  0  0  0  0  0 ] [ f0  ]
[ f1  ]       [ -1  2 -1  0  0  0  0  0  0  0  0 ] [ f1  ]
[ f2  ]       [  0 -1  2 -1  0  0  0  0  0  0  0 ] [ f2  ]
[ f3  ]       [  0  0 -1  2 -1  0  0  0  0  0  0 ] [ f3  ]
[ f4  ]       [  0  0  0 -1  2 -1  0  0  0  0  0 ] [ f4  ]
[ f5  ] = 1/2 [  0  0  0  0 -1  2 -1  0  0  0  0 ] [ f5  ]
[ f6  ]       [  0  0  0  0  0 -1  2 -1  0  0  0 ] [ f6  ]
[ f7  ]       [  0  0  0  0  0  0 -1  2 -1  0  0 ] [ f7  ]
[ f8  ]       [  0  0  0  0  0  0  0 -1  2 -1  0 ] [ f8  ]
[ f9  ]       [  0  0  0  0  0  0  0  0 -1  2 -1 ] [ f9  ]
[ f10 ]       [  0  0  0  0  0  0  0  0  0 -1  1 ] [ f10 ]

Note that we don't have currency conservation. The sum of columns = 0.
I wonder what the time based equivalents of smooth and unsmooth are?
11/5/2014 update: we can easily represent inhibitory signals in MatSumSig.
eg, simply enough:
[ filtered-signal ] = [ pos[x1] ] [ 1 -1 ] [ signal      ]
                                           [ off-current ]
where:
signal is a time varying signal.
off-current is a time varying off-current (ie, an inhibitory signal of roughly the same strength as the signal)
filtered-signal is the result

[ filtered-signal ] = [ pos[x1] ] [ 1 -10 ] [ signal      ]
                                            [ off-current ]
where off-current is a strongly inhibitory signal

[ filtered-signal ] = [ pos[x1] ] [ 1 -0.2 ] [ signal      ]
                                             [ off-current ]
where off-current is a weakly inhibitory signal
Of course, this also means it takes "currency" to switch off a signal. eg, intrusive thoughts you can't quite mentally switch off.
OK. Now, let's try and explain function matrices. They are really quite simple, but useful for my MSS (MatSumSig) model.
eg let's start with a simple one:
[ d ]   [ fn1[x1] ] [ a ]
[ e ] = [ fn2[x2] ] [ b ]
[ f ]   [ fn3[x3] ] [ c ]
which expands to:
d = fn1[a]
e = fn2[b]
f = fn3[c]
OK. So we work from the right, inserting the values from the applied vector into the respective x_i (here x1 = a, x2 = b, x3 = c).
And the x_i don't have to be in order (as they are in this simple case), and a function can take more than one parameter.
eg:
[ d ]   [ bah1[x3]       ] [ a ]
[ e ] = [ bah2[x2,x1]    ] [ b ]
[ f ]   [ bah3[x1,x2,x3] ] [ c ]
which expands to:
d = bah1[c]
e = bah2[b,a]
f = bah3[a,b,c]
And more interestingly, the functions in the function matrices can have "stored data" (in this case L_i).
eg:
[ d ]   [ foo[L1,x] ] [ a ]
[ e ] = [ foo[L2,x] ] [ b ]
[ f ]   [ foo[L3,x] ] [ c ]
which expands to:
d = foo[L1,(a,b,c)]                 -- NB: x_i are elements, x is the vector (x1,x2,...,xn)
e = foo[L2,(a,b,c)]
f = foo[L3,(a,b,c)]
For example, if we set:
L1 = (m1,m2,m3)
L2 = (m4,m5,m6)
L3 = (m7,m8,m9)
and
foo[u,v] = dot-product(u,v)
then:
[ d ]   [ foo[L1,x] ] [ a ]
[ e ] = [ foo[L2,x] ] [ b ]
[ f ]   [ foo[L3,x] ] [ c ]
expands to a standard matrix:
[ d ]   [ m1 m2 m3 ] [ a ]
[ e ] = [ m4 m5 m6 ] [ b ]
[ f ]   [ m7 m8 m9 ] [ c ]
ie:
d = m1*a + m2*b + m3*c
e = m4*a + m5*b + m6*c
f = m7*a + m8*b + m9*c
And that's about it.
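A Python sketch of a fn-matrix (the representation is my own, with some toy function bodies filled in: each row is a function plus the indices of the x_i it takes):
def apply_fn_matrix(rows,x):
  return [fn(*[x[i] for i in indices]) for fn,indices in rows]

bah1 = lambda c: c*c
bah2 = lambda b,a: b - a
bah3 = lambda a,b,c: a + b + c

rows = [(bah1, [2]),        # d = bah1[x3]
        (bah2, [1,0]),      # e = bah2[x2,x1]
        (bah3, [0,1,2])]    # f = bah3[x1,x2,x3]

print(apply_fn_matrix(rows, [1,2,3]))   # [9, 1, 6]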
I guess I should mention what is perhaps obvious: you can mix and match fn-matrices with standard matrices (cf the MSS examples above).
eg:
[ d ]   [ foo1[x1] ] [ 5 6 1 0 2 ] [ fn1[x1]  ] [ m1 m2 m3 ] [ a ]
[ e ] = [ foo1[x2] ] [ 8 8 7 2 1 ] [ fn2[x3]  ] [ m4 m5 m6 ] [ b ]
[ f ]   [ foo2[x1] ]               [ fn3[x2]  ] [ m7 m8 m9 ] [ c ]
[ g ]   [ foo2[x2] ]               [ bah1[x1] ]
                                   [ bah2[x3] ]
It is going to be ugly, but let's show how we expand this down:
[ d ]   [ foo1[x1] ] [ 5 6 1 0 2 ] [ fn1[x1]  ] [ m1*a + m2*b + m3*c ]
[ e ] = [ foo1[x2] ] [ 8 8 7 2 1 ] [ fn2[x3]  ] [ m4*a + m5*b + m6*c ] 
[ f ]   [ foo2[x1] ]               [ fn3[x2]  ] [ m7*a + m8*b + m9*c ]
[ g ]   [ foo2[x2] ]               [ bah1[x1] ]
                                   [ bah2[x3] ]

[ d ]   [ foo1[x1] ] [ 5 6 1 0 2 ] [ fn1[m1*a + m2*b + m3*c]  ]
[ e ] = [ foo1[x2] ] [ 8 8 7 2 1 ] [ fn2[m7*a + m8*b + m9*c]  ] 
[ f ]   [ foo2[x1] ]               [ fn3[m4*a + m5*b + m6*c]  ]
[ g ]   [ foo2[x2] ]               [ bah1[m1*a + m2*b + m3*c] ]
                                   [ bah2[m7*a + m8*b + m9*c] ]

[ d ]   [ foo1[x1] ] [ 5*fn1[m1*a + m2*b + m3*c] + 6*fn2[m7*a + m8*b + m9*c] + 1*fn3[m4*a + m5*b + m6*c] + 0*bah1[m1*a + m2*b + m3*c] + 2*bah2[m7*a + m8*b + m9*c] ]
[ e ] = [ foo1[x2] ] [ 8*fn1[m1*a + m2*b + m3*c] + 8*fn2[m7*a + m8*b + m9*c] + 7*fn3[m4*a + m5*b + m6*c] + 2*bah1[m1*a + m2*b + m3*c] + 1*bah2[m7*a + m8*b + m9*c] ] 
[ f ]   [ foo2[x1] ]               
[ g ]   [ foo2[x2] ]               
                                   
d = foo1[5*fn1[m1*a + m2*b + m3*c] + 6*fn2[m7*a + m8*b + m9*c] + 1*fn3[m4*a + m5*b + m6*c] + 0*bah1[m1*a + m2*b + m3*c] + 2*bah2[m7*a + m8*b + m9*c]]
e = foo1[8*fn1[m1*a + m2*b + m3*c] + 8*fn2[m7*a + m8*b + m9*c] + 7*fn3[m4*a + m5*b + m6*c] + 2*bah1[m1*a + m2*b + m3*c] + 1*bah2[m7*a + m8*b + m9*c]]
f = foo2[5*fn1[m1*a + m2*b + m3*c] + 6*fn2[m7*a + m8*b + m9*c] + 1*fn3[m4*a + m5*b + m6*c] + 0*bah1[m1*a + m2*b + m3*c] + 2*bah2[m7*a + m8*b + m9*c]]
g = foo2[8*fn1[m1*a + m2*b + m3*c] + 8*fn2[m7*a + m8*b + m9*c] + 7*fn3[m4*a + m5*b + m6*c] + 2*bah1[m1*a + m2*b + m3*c] + 1*bah2[m7*a + m8*b + m9*c]]                                   
Heh. The idea of layers and layers of matrices and fn-matrices is actually a tidy little model of how computation in the brain works (kind of the point of the MatSumSig model).
Though where a fn-matrix layer of sigmoids corresponds to a single layer of processing by synapses, more general fn-matrices can be used as a compact representation of many layers of computation by the brain/neural circuits.

18/5/2014 update: Note that almost always fn matrices are not invertible (unlike regular matrices).
ie, given the output, it is impossible to reconstruct the input (except by brute force).
eg, as a hand-wavey proof of this, just consider the function layer to be a secure hash, and then:
[ y1 ] = [ secure-hash[x1] ] [ a b ] [ x1 ]
[ y2 ]   [ secure-hash[x2] ] [ c d ] [ x2 ]

y1 = secure-hash[a*x1 + b*x2]
y2 = secure-hash[c*x1 + d*x2]

23/4/2014: a little more on mapping BKO rules to English. In this case implementing the idea of random greetings.
----------------------------------------
|context> => |context: greetings play>

hello |*> #=> merge-labels(|Hello, > + |_self> + |!>)
hey |*> #=> merge-labels(|Hey Ho! > + |_self> + |.>)
wat-up |*> #=> merge-labels (|Wat up my homie! > + |_self> + | right?>)
greetings |*> #=> merge-labels(|Greetings fine Sir. I believe they call you > + |_self> + |.>)
howdy |*> #=> merge-labels(|Howdy partner!>)
good-morning |*> #=> merge-labels(|Good morning > + |_self> + |.>)
gday |*> #=> merge-labels(|G'day > + |_self> + |.>)
random-greet |*> #=> pick-elt ( hello |_self> + hey |_self> + wat-up |_self> + greetings |_self> + howdy |_self> + good-morning |_self> + gday |_self>)
friends-list |*> #=> extract-value list-to-words extract-value friends |_self>

friends |Charlie> => |Jack> + |Emma>

friends |Sam> => |Charlie> + |George> + |Emma> + |Jack> + |Rober> + |Frank> + |Julie>
----------------------------------------

sa: random-greet |Matt>
fn:   pick-elt                         -- these three lines are some debugging info.
len:  1                                -- and shows the possible greetings it can choose from:
sp:  |Hello, Matt!> + |Hey Ho! Matt.> + |Wat up my homie! Matt right?> + |Greetings fine Sir. I believe they call you Matt.> + |Howdy partner!> + |Good morning Matt.> + |G'day Matt.>
|Good morning Matt.>

sa: random-greet |George>
|Wat up my homie! George right?>

sa: random-greet friends-list |Charlie>
fn:   pick-elt
len:  1
sp:  |Hello, Jack and Emma!> + |Hey Ho! Jack and Emma.> + |Wat up my homie! Jack and Emma right?> + |Greetings fine Sir. I believe they call you Jack and Emma.> + |Howdy partner!> + |Good morning Jack and Emma.> + |G'day Jack and Emma.>
|Hey Ho! Jack and Emma.>

sa: random-greet friends-list |Sam>
fn:   pick-elt
len:  1
sp:  |Hello, Charlie, George, Emma, Jack, Rober, Frank and Julie!> + |Hey Ho! Charlie, George, Emma, Jack, Rober, Frank and Julie.> + |Wat up my homie! Charlie, George, Emma, Jack, Rober, Frank and Julie right?> + |Greetings fine Sir. I believe they call you Charlie, George, Emma, Jack, Rober, Frank and Julie.> + |Howdy partner!> + |Good morning Charlie, George, Emma, Jack, Rober, Frank and Julie.> + |G'day Charlie, George, Emma, Jack, Rober, Frank and Julie.>
|G'day Charlie, George, Emma, Jack, Rober, Frank and Julie.>

24/4/2014: You know, if we drop the requirement for everything to be kets (at the loss of some power), there is actually a tidy little language underneath here.
Let's show the mapping between BKO and it:
fib |0> => |0>
fib |1> => |1>
n-1 |*> #=> arithmetic(|_self>,|->,|1>)
n-2 |*> #=> arithmetic(|_self>,|->,|2>)
fib |*> #=> arithmetic( fib n-1 |_self>, |+>, fib n-2 |_self>)
fib-ratio |*> #=> arithmetic( fib |_self> , |/>, fib n-1 |_self> )
becomes:
fib 0 = 0
fib 1 = 1
n-1 * = _self - 1
n-2 * = _self - 2
fib * = fib n-1 _self + fib n-2 _self
fib-ratio * = fib _self / fib n-1 _self

fact |0> => |1>
n-1 |*> #=> arithmetic(|_self>,|->,|1>)
fact |*> #=> arithmetic( |_self>, |*>, fact n-1 |_self>)
becomes:
fact 0 = 1
n-1 * = _self - 1
fact * = _self * fact n-1 _self

ave |*> #=> arithmetic(count-sum "" |_self>,|/>,count "" |_self>)
becomes:
ave * = count-sum "" _self / count "" _self       -- Not sure this works, as it needs to mix labels with numbers to work.

hello |*> #=> merge-labels(|Hello, > + |_self> + |!>)
hey |*> #=> merge-labels(|Hey Ho! > + |_self> + |.>)
wat-up |*> #=> merge-labels (|Wat up my homie! > + |_self> + | right?>)
greetings |*> #=> merge-labels(|Greetings fine Sir. I believe they call you > + |_self> + |.>)
howdy |*> #=> merge-labels(|Howdy partner!>)
good-morning |*> #=> merge-labels(|Good morning > + |_self> + |.>)
gday |*> #=> merge-labels(|G'day > + |_self> + |.>)
random-greet |*> #=> pick-elt ( hello |_self> + hey |_self> + wat-up |_self> + greetings |_self> + howdy |_self> + good-morning |_self> + gday |_self>)
friends-list |*> #=> extract-value list-to-words extract-value friends |_self>
becomes:
hello * = "Hello, ${_self}!"
hey * = "Hey Ho! ${_self}."
wat-up * = "Wat up my homie! ${_self} right?"
greetings * = "Greetings fine Sir. I believe they call you ${_self}."
howdy * = "Howdy partner!"
good-morning * = "Good morning ${_self}."
gday * = "G'day ${_self}."
random-greet * = pick-elt [ hello _self, hey _self, wat-up _self, greetings _self, howdy _self, good-morning _self, gday _self]
friends-list * = extract-value list-to-words extract-value friends _self
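For comparison, that underlying language is pretty close to plain Python. A rough sketch (hypothetical function names, with random.choice standing in for pick-elt):
import random

def hello(x):        return "Hello, " + x + "!"
def hey(x):          return "Hey Ho! " + x + "."
def wat_up(x):       return "Wat up my homie! " + x + " right?"
def greetings(x):    return "Greetings fine Sir. I believe they call you " + x + "."
def howdy(x):        return "Howdy partner!"
def good_morning(x): return "Good morning " + x + "."
def gday(x):         return "G'day " + x + "."

def random_greet(x):
  # pick-elt: pick one element of the collection at random
  return random.choice([f(x) for f in (hello, hey, wat_up, greetings, howdy, good_morning, gday)])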

3/5/2014 update: OK. Maybe introduce notation for merge-labels to try and tidy it up, without having to give up kets?
hello |*> #=> merge-labels(|Hello, > + |_self> + |!>)
hey |*> #=> merge-labels(|Hey Ho! > + |_self> + |.>)
wat-up |*> #=> merge-labels (|Wat up my homie! > + |_self> + | right?>)
greetings |*> #=> merge-labels(|Greetings fine Sir. I believe they call you > + |_self> + |.>)
howdy |*> #=> merge-labels(|Howdy partner!>)
good-morning |*> #=> merge-labels(|Good morning > + |_self> + |.>)
gday |*> #=> merge-labels(|G'day > + |_self> + |.>)
random-greet |*> #=> pick-elt ( hello |_self> + hey |_self> + wat-up |_self> + greetings |_self> + howdy |_self> + good-morning |_self> + gday |_self>)
friends-list |*> #=> extract-value list-to-words extract-value friends |_self>
becomes:
hello |*> #=> |Hello, > _ |_self> _ |!>
hey |*> #=> |Hey Ho! > _ |_self> _ |.>
wat-up |*> #=> |Wat up my homie! > _ |_self> _ | right?>
greetings |*> #=> |Greetings fine Sir. I believe they call you > _ |_self> _ |.>
howdy |*> #=> |Howdy partner!>
good-morning |*> #=> |Good morning > _ |_self> _ |.>
gday |*> #=> |G'day > _ |_self> _ |.>
random-greet |*> #=> pick-elt ( hello |_self> + hey |_self> + wat-up |_self> + greetings |_self> + howdy |_self> + good-morning |_self> + gday |_self>)
friends-list |*> #=> extract-value list-to-words extract-value friends |_self>

28/4/2014: So today I was digging through my backups looking for the code for "categorize values", which I will mention here soon.
Anyway, in the process I discovered a very early shell script attempt at a very simple bot.
Code here.
OK. So I think it is time to explain the derivation of my first version of simm. Again, mathematically it is quite simple, but useful.
First, define a multiplication operator:
a*b = \Sum_k abs(a_k . b_k)     -- discrete version, where . is the standard multiplication operator.
a*b = \Int dt abs(a(t) . b(t))  -- continuous version.
Both of which have the property:
0 <= w*[f - g] <= w*f + w*g
where it is 0 if f == g, and w*f + w*g if f and g are perfectly disjoint.
Which is just a standard property of metrics (cf. the metric axioms on WP, in particular):
2. d(x,y) = 0 if and only if x = y
4. d(x,z) <= d(x,y) + d(y,z)
OK. So we can normalize the range of this simm to [0,1] by simply dividing by w*f + w*g:
0 <= w*[f - g]/(w*f + w*g) <= 1
OK. So this is a good start, but we find for some examples it doesn't work as well as expected (details later!).
But we can fix this by adding a term: R abs(w*f - w*g), R >= 0, and it seems to work best with R = 1.
And hence simm:
0 <= (w*[f - g] + R abs(w*f - w*g))/(w*f + w*g + R abs(w*f - w*g)) <= 1
Next, set R = 1, and note that: a + b + abs(a - b) = 2*max(a,b), so now we have:
0 <= (w*[f - g] + abs(w*f - w*g))/2.max(w*f,w*g) <= 1
And, in the physics tradition, it has some symmetries:
given the scalar k (not equal to 0), we have:
1) symmetry under w => k.w                       
2) symmetry under f => k.f and g => k.g
3) symmetry under w => w.exp(i*t)/R(t), f => R(t).exp(-i*t).f, g => R(t).exp(-i*t).g   
Now, I don't know a use for this last one, but I put it in because it reminds me of gauge transformations in physics, and it is a good test of which terms are valid to use in our metric.
ie, w*f, w*g, w*[f - g], abs(w*f - w*g)
The final piece is that sometimes we want to swap from 0 for a perfect match and 1 for a perfect mismatch, to 1 for a perfect match and 0 for a perfect mismatch.
Anyway, that is a simple matter of (presuming m "attains" its upper and lower bounds):
0 <= m <= 1
becomes:
1 >= 1 - m >= 0
ie:
(w*[f - g] + R abs(w*f - w*g))/(w*f + w*g + R abs(w*f - w*g))
becomes:
(w*f + w*g + R abs(w*f - w*g) - w*[f - g] - R abs(w*f - w*g))/(w*f + w*g + R abs(w*f - w*g))
  = (w*f + w*g - w*[f - g])/(w*f + w*g + R abs(w*f - w*g)) 
  = (w*f + w*g - w*[f - g])/2.max(w*f,w*g)                -- assuming R = 1
So there we have it:
simm(w,f,g) = (w*f + w*g - w*[f - g])/2.max(w*f,w*g)

NB: if w is not given, ie, simm(f,g), then assume w_k = 1 for all k, or w(t) = 1 for all t.   
Now a couple of observations.
1) simm(w,f,g) gives the best results when f and g are "rescaled" so that w*f == w*g (excluding the case where they are length 1 vectors).
2) if f_k and g_k are >= 0 for all terms, then the equation can be compressed, using a + b - abs(a - b) = 2*min(a,b):
simm(w,f,g) = \Sum_k w_k min(f_k , g_k) / max(w*f,w*g)
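As a quick sanity check, here is a minimal plain-Python sketch of that compressed form (not the project classes, just the bare formula), with w, f, g as {label: value} dicts and all coeffs assumed >= 0:
def simm(w, f, g):
  # simm(w,f,g) = \Sum_k w_k min(f_k, g_k) / max(w*f, w*g)
  keys = set(f) | set(g)
  wf = sum(w.get(k, 1) * f.get(k, 0) for k in keys)
  wg = sum(w.get(k, 1) * g.get(k, 0) for k in keys)
  if wf == 0 and wg == 0:
    return 0
  return sum(w.get(k, 1) * min(f.get(k, 0), g.get(k, 0)) for k in keys) / max(wf, wg)
The w.get(k, 1) is my assumption that missing weights default to 1, matching the NB above for the case where w is not given at all.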
And then these motivate the superposition versions of simm:
def weighted_simm(w,A,B):
  A = multiply(w,A)
  B = multiply(w,B)
  return intersection(A.normalize(),B.normalize()).count_sum()
  
def simm(A,B):
  return intersection(A.normalize(),B.normalize()).count_sum()
where:
A.normalize(), B.normalize() implement the idea of rescaling so that w*f == w*g, and also max(w*f,w*g) = 1
intersection(...) is equivalent to the min(f,g) term.

17/6/2014 update: Here is the unscaled version of simm (ie, w*f not necessarily equal to w*g):
def unscaled_simm(A,B):
  wf = A.count_sum()
  wg = B.count_sum()
  if wf == 0 and wg == 0:
    return 0
  return intersection(A,B).count_sum()/max(wf,wg)

Now, how do we handle the case of length 1 vectors not being rescaled (otherwise simm(0.3|a>,500|a>) == 1)?
Well, we have to do a little work:
def silent_simm(A,B):
# handle single kets, where we don't want rescaling to s1*wf == s2*wg
  if A.count() <= 1 and B.count() <= 1:
    a = A.ket()
    b = B.ket()
    if a.label != b.label:
      return 0
    a = max(a.value,0)                        # just making sure they are >= 0.
    b = max(b.value,0)
    if a == 0 and b == 0:                     # prevent div by zero.
      return 0
    return min(a,b)/max(a,b)
# default case:
  return intersection(A.normalize(),B.normalize()).count_sum()
Next, since we have a simm, we can now implement the Landscape function:
L(f,x) = simm(f,g(x))
f is an input pattern
there are different patterns g(x) at each point x, and in general x is not continuous.
So the Landscape function neatly maps an incoming pattern into a mathematical surface.
eg: DNA micro-arrays (so yeah, the strength with which different DNA strands stick to each other has some similarity to simm).
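In code, the Landscape function is then just a map over the stored patterns. A tiny sketch reusing the dict-based simm above, where g is assumed to be a dict from points x to stored patterns:
def landscape(f, g):
  # L(f,x) = simm(f, g(x)) at each stored point x, with all weights 1
  return dict((x, simm({}, f, pattern)) for x, pattern in g.items())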
12/5/2014 update: Now for a diversion. Here is a variant of simm that I have never actually implemented. Not sure if it is even useful.
First, a couple of definitions:
Define Jp as the set of p'th roots of unity. 
ie, exp(i*2*Pi*k/p)
And give individual roots the names j_pk. eg, J3 = {j_31,j_32,j_33}
The useful fact is that they add up to 0.
j_21 + j_22 = 0 (ie 1 + -1 = 0)
j_31 + j_32 + j_33 = 0
j_41 + j_42 + j_43 + j_44 = 0
and so on.

And just like above, we have:
0 <= w*[j_21 f1 + j_22 f2] <= w*f1 + w*f2
0 <= w*[j_31 f1 + j_32 f2 + j_33 f3] <= w*f1 + w*f2 + w*f3
0 <= w*[j_41 f1 + j_42 f2 + j_43 f3 + j_44 f4] <= w*f1 + w*f2 + w*f3 + w*f4
and so on.

And also like above, we have:
a + b + abs(a - b) = 2*max(a,b)
a1 + a2 + a3 + abs(j_31 a1 + j_32 a2 + j_33 a3) approx-equal 3*max(a1,a2,a3)
a1 + a2 + a3 + a4 + abs(j_41 a1 + j_42 a2 + j_43 a3 + j_44 a4) approx-equal 4*max(a1,a2,a3,a4)
and so on.

The point is, we now have a family of simm's:
simm(w,f,g) = (w*f + w*g - w*[f - g])/2.max(w*f,w*g)

simm(w,f1,f2) = (w*f1 + w*f2 - w*[j_21 f1 + j_22 f2])/2.max(w*f1,w*f2)
simm(w,f1,f2,f3) = (w*f1 + w*f2 + w*f3 - w*[j_31 f1 + j_32 f2 + j_33 f3])/3.max(w*f1,w*f2,w*f3)
simm(w,f1,f2,f3,f4) = (w*f1 + w*f2 + w*f3 + w*f4 - w*[j_41 f1 + j_42 f2 + j_43 f3 + j_44 f4])/4.max(w*f1,w*f2,w*f3,w*f4)
simm(w,f1,f2,f3,f4,f5) = (w*f1 + w*f2 + w*f3 + w*f4 + w*f5 - w*[j_51 f1 + j_52 f2 + j_53 f3 + j_54 f4 + j_55 f5])/5.max(w*f1,w*f2,w*f3,w*f4,w*f5)
and so on.

Which presumably can be shrunk down to this:
simm(w,f,g) = \Sum_k w_k min(f_k , g_k) / max(w*f,w*g)

simm(w,f1,f2) = \Sum_k w_k min(f1_k,f2_k)/max(w*f1,w*f2)
simm(w,f1,f2,f3) = \Sum_k w_k min(f1_k,f2_k,f3_k)/max(w*f1,w*f2,w*f3)
simm(w,f1,f2,f3,f4) = \Sum_k w_k min(f1_k,f2_k,f3_k,f4_k)/max(w*f1,w*f2,w*f3,w*f4)
simm(w,f1,f2,f3,f4,f5) = \Sum_k w_k min(f1_k,f2_k,f3_k,f4_k,f5_k)/max(w*f1,w*f2,w*f3,w*f4,w*f5)
and so on.
OK. That doesn't look too hard to implement in python at least. More work in MatSumSig though. We presumably need quite a few layers of processing.
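Indeed, here is a hedged dict-based sketch of that compressed k-argument family, under the same assumptions as the two-argument sketch above (all coeffs >= 0, missing weights default to 1):
def multi_simm(w, *fs):
  # simm(w,f1,...,fn) = \Sum_k w_k min(f1_k,...,fn_k) / max(w*f1,...,w*fn)
  if not fs:
    return 0
  keys = set().union(*fs)
  wfs = [sum(w.get(k, 1) * f.get(k, 0) for k in keys) for f in fs]
  if max(wfs) == 0:
    return 0
  return sum(w.get(k, 1) * min(f.get(k, 0) for f in fs) for k in keys) / max(wfs)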
29/4/2014: Since we are typing up the foundations, next on the list is the normed frequency equation.
The starting point is the frequency class equation as found on WP.
N = floor(1/2 - log_2(frequency-of-this-item/frequency-of-most-common-item))
In python this is:
N = math.floor(0.5 - math.log(current/largest,2))
Then I decided to normalize it so that, like simm, it is in [0,1]: 1 for exact match, 0 for not in set.
import math

# e is a ket, X is a superposition
# for best effect X should be a frequency list
def normed_frequency_class(e,X):
  X = X.drop()                                       # drop elements with coeff <= 0
  smallest = X.find_min_coeff()                      # return the min coeff in X as float
  largest = X.find_max_coeff()                       # return the max coeff in X as float
  f = X.find_value(e)                                # return the value of ket e in superposition X as float

  if largest <= 0 or f <= 0:                         # otherwise the math.log() blows up!
    return 0

  fc_max = math.floor(0.5 - math.log(smallest/largest,2)) + 1  # NB: the + 1 is important, else the smallest element in X gets reported as not in set.
  return 1 - math.floor(0.5 - math.log(f/largest,2))/fc_max
nfc can be considered a type of fuzzy set membership function.
If all coeffs in X are equal, it gives 1 for membership, and 0 for non-membership.
If the coeffs are not all equal then it has fuzzier properties.
Also note that the "f/largest" is a ratio, so only the relative values matter, not the absolute value.
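A quick worked example may help. Below is a plain-dict re-implementation (not the project code), together with the values it gives for a small frequency list:
import math

def nfc(e, X):
  # X is a {label: frequency} dict, e is a label; mirrors normed_frequency_class() above
  X = dict((k, v) for k, v in X.items() if v > 0)    # the .drop() step
  f = X.get(e, 0)
  if not X or f <= 0:
    return 0
  smallest = min(X.values())
  largest = max(X.values())
  fc_max = math.floor(0.5 - math.log(smallest / largest, 2)) + 1
  return 1 - math.floor(0.5 - math.log(f / largest, 2)) / fc_max

X = {"a": 4, "b": 2, "c": 1}
# nfc("a", X) == 1.0, nfc("b", X) == 0.667 (approx), nfc("c", X) == 0.333 (approx), nfc("d", X) == 0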
13/5/2014: Now, let's show the code (in new_context class) where we put simm and nfc to use:
(BTW, it might be possible to write versions of these two that make use of parallelization, especially if self.known_kets is large)
  def pattern_recognition(self,pattern,op="pattern",t=0):
    op = op.label.split("op: ")[-1] if type(op) == ket else op

    result = superposition()
    for x in self.known_kets:
      if op in self.rules[x]:
        candidate_pattern = self.recall(op,x)                        # do we need active=True here?
        value = silent_simm(pattern,candidate_pattern)               # probably.
        if value > t:                                                # Also a few spots below.
          result.data.append(ket(x,value))                           # eg, what happens if candidate_pattern is
    return result.coeff_sort()                                       # not ket/sp? 
    
  def map_to_topic(self,e,op,t=0):
    # do we need the op = op.label.split("op: ... stuff here?
    result = superposition()
    for x in self.known_kets:
      if op in self.rules[x]:
        frequency_list = self.recall(op,x)                            # do we need active=True here?
        value = normed_frequency_class(e,frequency_list)
        if value > t:
          result.data.append(ket(x,value))
    return result.normalize(100).coeff_sort()                         # NB: .normalize(100) is a key component of this function.
                                                                      # Half the magic is in nfc(), the other half in normalize(100).
                                                                      # eg, say "foo" is a long way down the frequency list for some object.
                                                                      # so its nfc() is small. But if it is not in any other frequency list,
                                                                      # then we want it 100% match to that one frequency list.
Then in the ket class (not yet in superposition that I know of), we have:
# do we need a superposition version of this? Probably...
# implements: similar[op] |x>
  def similar(self,context,op):
    f = self.apply_op(context,op)            # use apply_op or context.recall() directly?
    print("f:",f.display())                  # in light of active=True thing, apply_op() seems the right answer.
#    return context.pattern_recognition(f,op) # yeah, but what about in pat_rec?
    return context.pattern_recognition(f,op).delete_ket(self)    # we delete self, ie |x>, from the result, since it is always a 100% match anyway.

# implements: find-topic[op] |x> 
  def find_topic(self,context,op):           
    return context.map_to_topic(self,op)
23/5/2014 update: Here are some nice results using find-topic. In this case guessing whether a name is a last name, a male first name, or a female first name, using this data set.
Though it is perfectly general, and will work similarly for any collection of frequency lists.
eg, mapping names to country of origin should work fine (eg, "that's an Italian name", or "that's an Irish name"), if I could get hold of the data.
1/5/2014: OK. A quick play with a prolog example on WP:
First the prolog:
mother_child(trude, sally).
 
father_child(tom, sally).
father_child(tom, erica).
father_child(mike, tom).
 
sibling(X, Y)      :- parent_child(Z, X), parent_child(Z, Y).
 
parent_child(X, Y) :- father_child(X, Y).
parent_child(X, Y) :- mother_child(X, Y).

This results in the following query being evaluated as true:

 ?- sibling(sally, erica).
 Yes

The query ?- sibling(sally, sally). also succeeds.
Now, in BKO:
|context> => |context: prolog example>

mother |sally> => |trude>
child |trude> => |sally>

father |sally> => |tom>
child |tom> => |sally>

father |erica> => |tom>
child |tom> +=> |erica>

father |tom> => |mike>
child |mike> => |tom>

parent |*> #=> mother |_self> + father |_self>
sibling |*> #=> child parent |_self>          -- this being the BKO equivalent of: sibling(X, Y) :- parent_child(Z, X), parent_child(Z, Y) 
sibling-clean |*> #=> drop (child parent |_self> + -|_self>)

-- now put it to use:
sa: sibling |sally>
|sally> + |erica>

sa: sibling |erica>
|sally> + |erica>

sa: sibling-clean |sally>
|erica>

sa: sibling-clean |erica>
|sally>

-- applying bra's is not yet implemented, but if it was then:
-- "is erica a sibling of sally?"
-- would map to:
sa: <erica|sibling|sally> == 1 

OK. Here is a weird observation that occurred to me today. We can use the category, sub-category notation in ket-labels to represent lists of symbols.
|a: b: c: d: e> would correspond to the list ["a","b","c","d","e"].

And use merge-labels to pre/append lists:
sa: merge-labels( |x: y: z> + |: > + |a: b: c: d: e>)
|x: y: z: a: b: c: d: e>

sa: merge-labels(|a: b: c: d: e> + |: > + |x: y: z>)
|a: b: c: d: e: x: y: z>

-- and then cf CAR and CDR in LISP:
sa: extract-category |a: b: c: d: e: f>
|a: b: c: d: e>

sa: extract-category extract-category |a: b: c: d: e: f>
|a: b: c: d>

sa: extract-category^4 |a: b: c: d: e: f>
|a: b>

sa: extract-value |a: b: c: d: e: f>
|f>

sa: extract-value extract-category |a: b: c: d: e: f>
|e>

sa: extract-value extract-category^4 |a: b: c: d: e: f>
|b>

Indeed, in python we have:
>>> "a: b: c: d: e".split(": ")
['a', 'b', 'c', 'd', 'e']
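So extract-category and extract-value have simple Python equivalents. A hedged sketch:
def extract_category(label):
  # everything but the last element, eg 'a: b: c' -> 'a: b'
  return ": ".join(label.split(": ")[:-1])

def extract_value(label):
  # just the last element, eg 'a: b: c' -> 'c'
  return label.split(": ")[-1]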

Today, a quick example of using linearity of operators to create a temperature conversion table.
Where F maps to Fahrenheit, and C to Celsius.
sa: range(|C: 0>,|C: 100>,|10>)
|C: 0> + |C: 10> + |C: 20> + |C: 30> + |C: 40> + |C: 50> + |C: 60> + |C: 70> + |C: 80> + |C: 90> + |C: 100>

sa: F range(|C: 0>,|C: 100>,|10>)
|F: 32.00> + |F: 50.00> + |F: 68.00> + |F: 86.00> + |F: 104.00> + |F: 122.00> + |F: 140.00> + |F: 158.00> + |F: 176.00> + |F: 194.00> + |F: 212.00>

sa: range (|F: 0>,|F: 100>,|10>)
|F: 0> + |F: 10> + |F: 20> + |F: 30> + |F: 40> + |F: 50> + |F: 60> + |F: 70> + |F: 80> + |F: 90> + |F: 100>

sa: C range (|F: 0>,|F: 100>,|10>)
|C: -17.78> + |C: -12.22> + |C: -6.67> + |C: -1.11> + |C: 4.44> + |C: 10.00> + |C: 15.56> + |C: 21.11> + |C: 26.67> + |C: 32.22> + |C: 37.78>
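In plain Python, the same table amounts to applying the linear map to each point in the range. A quick sketch:
def F(c):
  # Celsius to Fahrenheit
  return c * 9.0 / 5.0 + 32

def C(f):
  # Fahrenheit to Celsius
  return (f - 32) * 5.0 / 9.0

print([F(c) for c in range(0, 101, 10)])             # 32.0, 50.0, ..., 212.0
print([round(C(f), 2) for f in range(0, 101, 10)])   # -17.78, -12.22, ..., 37.78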

6/5/2014: A note on general pattern recognition.
First, let's give my definition:
We define pattern recognition as initially a pairwise operation.
pat-rec(x,x) = 
...

So I already have it working for text documents.
The general idea is to find a mapping from object => well-behaved, deterministic, distinctive superposition.
where:
well-behaved means similar objects return similar superpositions (this is the hard bit to achieve, but hopefully not impossible)
deterministic means if you feed in the same object, you get essentially the same superposition. There is some leeway in that it doesn't have to be 100% identical on each run, but close.
distinctive means different object types have easily distinguishable superpositions (again, this is on the hard side)
Interestingly enough, the superposition can be pretty much anything, provided it is "well-behaved", "deterministic" and "distinctive".
Indeed, in general given a superposition resulting from this, it is essentially impossible to find the generating object, probably apart from brute-force.
I was motivated by the idea of DNA electrophoresis.
Split a document into pieces, count them, then run simm on that.
A couple of slashdot examples:
sa: load fragment-documents-3.sw
loading sw file: sw-examples/fragment-documents-3.sw

sa: dump |slashdot-1>

supported-ops |slashdot-1> => |op: fragment-lengths> + |op: fragment-hash>
fragment-lengths |slashdot-1> => 1807.000|0> + 500.000|2> + 346.000|5> + 209.000|4> + 165.000|3> + 134.000|7> + 128.000|8> + 86.000|6> + 58.000|21> + 51.000|41> + 45.000|13> + 44.000|17> + 43.000|20> + 43.000|29> + 42.000|16> + 39.000|24> + 39.000|36> + 34.000|11> + 30.000|32> + 27.000|1> + 26.000|55> + 26.000|31> + 25.000|19> + 25.000|9> + 23.000|43> + 23.000|37> + 23.000|33> + 22.000|65> + 22.000|48> + 21.000|10> + 20.000|25> + 19.000|14> + 19.000|93> + 19.000|38> + 18.000|18> + 18.000|39> + 17.000|70> + 17.000|46> + 17.000|23> + 16.000|80> + 16.000|22> + 16.000|131> + 14.000|30> + 13.000|72> + 13.000|74> + 13.000|42> + 13.000|34> + 13.000|27> + 11.000|26> + 10.000|15> + 10.000|73> + 10.000|75> + 10.000|76> + 9.000|59> + 8.000|12> + 8.000|68> + 8.000|47> + 8.000|57> + 8.000|77> + 7.000|54> + 7.000|50> + 7.000|35> + 6.000|69> + 5.000|94> + 5.000|81> + 5.000|88> + 5.000|63> + 5.000|62> + 5.000|61> + 5.000|40> + 5.000|44> + 5.000|87> + 4.000|97> + 4.000|95> + 4.000|45> + 4.000|49> + 4.000|111> + 4.000|113> + 4.000|115> + 4.000|53> + 4.000|116> + 4.000|109> + 4.000|107> + 4.000|139> + 4.000|135> + 3.000|228> + 3.000|146> + 3.000|120> + 3.000|124> + 3.000|71> + 3.000|179> + 3.000|52> + 3.000|51> + 3.000|181> + 3.000|100> + 3.000|28> + 2.000|128> + 2.000|91> + 2.000|154> + 2.000|156> + 2.000|150> + 2.000|119> + 2.000|157> + 2.000|85> + 2.000|84> + 2.000|83> + 2.000|82> + 2.000|141> + 2.000|166> + 2.000|421> + 2.000|168> + 2.000|167> + 2.000|160> + 2.000|60> + 2.000|199> + 2.000|110> + 2.000|112> + 2.000|114> + 2.000|104> + 2.000|402> + 2.000|189> + 2.000|188> + 2.000|103> + 2.000|102> + 2.000|101> + 2.000|415> + 2.000|86> + 2.000|99> + |220> + |121> + |1851> + |126> + |222> + |231> + |232> + |159> + |90> + |155> + |117> + |153> + |118> + |477> + |242> + |149> + |3346> + |143> + |147> + |145> + |144> + |256> + |250> + |450> + |685> + |78> + |79> + |174> + |175> + |172> + |390> + |337> + |332> + |262> + |260> + |695> + |3148> + |165> + |169> + |66> + |64> + |327> + |324> + |3435> + |772> + |198> + |431> + |430> + |191> + |193> + |197> + |1321> + |58> + |56> + |180> + |208> + |1336> + |383> + |206> + |207> + |203> + |105> + |1520> + |379> + |417> + |419> + |148> + |2008> + |495> + |138> + |133> + |130> + |136>
fragment-hash |slashdot-1> => 1808.000|09> + 332.000|aa> + 254.000|79> + 149.000|4d> + 88.000|91> + 66.000|83> + 54.000|d9> + 35.000|bd> + 33.000|14> + 33.000|2d> + 29.000|d4> + 28.000|08> + 28.000|e8> + 27.000|56> + 26.000|a0> + 26.000|ed> + 25.000|97> + 24.000|54> + 23.000|93> + 23.000|a5> + 22.000|75> + 22.000|4f> + 22.000|66> + 22.000|42> + 21.000|7a> + 21.000|4e> + 21.000|b7> + 20.000|ca> + 20.000|ac> + 20.000|de> + 20.000|6f> + 19.000|c8> + 19.000|84> + 19.000|49> + 19.000|fa> + 18.000|13> + 18.000|81> + 18.000|ef> + 17.000|0f> + 17.000|ad> + 17.000|48> + 17.000|31> + 17.000|f2> + 17.000|b6> + 16.000|1b> + 16.000|4c> + 16.000|3c> + 15.000|be> + 14.000|95> + 14.000|53> + 13.000|12> + 13.000|10> + 13.000|62> + 13.000|9e> + 13.000|f3> + 12.000|9a> + 11.000|d0> + 11.000|60> + 11.000|46> + 11.000|59> + 11.000|34> + 11.000|39> + 11.000|3f> + 11.000|26> + 10.000|16> + 10.000|1f> + 10.000|02> + 10.000|da> + 10.000|8c> + 10.000|4a> + 10.000|63> + 10.000|2a> + 10.000|ec> + 9.000|c5> + 9.000|dc> + 9.000|55> + 9.000|50> + 9.000|30> + 9.000|33> + 9.000|ea> + 9.000|e9> + 9.000|e6> + 9.000|e2> + 8.000|0d> + 8.000|d1> + 8.000|d2> + 8.000|9b> + 8.000|92> + 8.000|bc> + 8.000|b1> + 8.000|7e> + 8.000|a9> + 8.000|70> + 8.000|71> + 8.000|73> + 8.000|8e> + 8.000|bb> + 8.000|5d> + 8.000|57> + 8.000|6e> + 8.000|3e> + 8.000|e4> + 8.000|e0> + 7.000|17> + 7.000|1d> + 7.000|1e> + 7.000|85> + 7.000|b4> + 7.000|b0> + 7.000|8b> + 7.000|69> + 7.000|41> + 7.000|77> + 7.000|21> + 7.000|fe> + 7.000|e1> + 6.000|d6> + 6.000|d8> + 6.000|cb> + 6.000|06> + 6.000|01> + 6.000|03> + 6.000|c4> + 6.000|af> + 6.000|99> + 6.000|db> + 6.000|dd> + 6.000|b2> + 6.000|7c> + 6.000|a8> + 6.000|5c> + 6.000|47> + 6.000|2b> + 6.000|a6> + 6.000|32> + 6.000|eb> + 6.000|f8> + 6.000|f4> + 6.000|3d> + 6.000|86> + 6.000|82> + 5.000|0e> + 5.000|0c> + 5.000|18> + 5.000|d3> + 5.000|cc> + 5.000|1c> + 5.000|04> + 5.000|c6> + 5.000|8f> + 5.000|88> + 5.000|7f> + 5.000|78> + 5.000|74> + 5.000|ce> + 5.000|61> + 5.000|64> + 5.000|40> + 5.000|45> + 5.000|44> + 5.000|52> + 5.000|6c> + 5.000|6a> + 5.000|6d> + 5.000|2e> + 5.000|25> + 5.000|29> + 5.000|fc> + 5.000|fd> + 5.000|e5> + 4.000|0a> + 4.000|0b> + 4.000|19> + 4.000|d5> + 4.000|d7> + 4.000|9f> + 4.000|00> + 4.000|c2> + 4.000|f0> + 4.000|7d> + 4.000|90> + 4.000|72> + 4.000|65> + 4.000|5f> + 4.000|5e> + 4.000|5a> + 4.000|9d> + 4.000|58> + 4.000|2f> + 4.000|a7> + 4.000|36> + 4.000|37> + 4.000|3a> + 4.000|23> + 4.000|ff> + 3.000|80> + 3.000|15> + 3.000|cf> + 3.000|1a> + 3.000|05> + 3.000|c9> + 3.000|87> + 3.000|89> + 3.000|b8> + 3.000|ab> + 3.000|7b> + 3.000|ae> + 3.000|c1> + 3.000|a4> + 3.000|8d> + 3.000|8a> + 3.000|c3> + 3.000|a1> + 3.000|cd> + 3.000|68> + 3.000|43> + 3.000|a2> + 3.000|6b> + 3.000|38> + 3.000|f1> + 3.000|f7> + 3.000|3b> + 3.000|b3> + 3.000|22> + 3.000|28> + 3.000|e7> + 2.000|9c> + 2.000|ba> + 2.000|07> + 2.000|c0> + 2.000|98> + 2.000|df> + 2.000|b9> + 2.000|67> + 2.000|76> + 2.000|35> + 2.000|bf> + 2.000|f9> + 2.000|f6> + 2.000|27> + 2.000|20> + 2.000|fb> + |11> + |c7> + |4b> + |b5> + |24> + |51> + |a3> + |ee> + |f5> + |e3>
Generated using this python.
In particular, these key pieces:
file_table = {
  "eztv-1"        : "web-pages/eztv-1.html",
  "eztv-2"        : "web-pages/eztv-2.html",
  "diary-1"       : "web-pages/k5-diary-1.html",
  "diary-2"       : "web-pages/k5-diary-2.html",
  "wc-comments-1" : "web-pages/wc-comments-1.html",
  "wc-comments-2" : "web-pages/wc-comments-2.html",
  "slashdot-1"    : "web-pages/slashdot-1.html",
  "slashdot-2"    : "web-pages/slashdot-2.html",
  "semantic-1"    : "web-pages/semantic-db-1.html",
}

def fragment_string(s,fragments):
  # repeatedly split the text on each of the given cutting sequences
  r = [s]
  for frag in fragments:
    parts = r
    r = []
    for x in parts:
      r += x.split(frag)
  return r

fragments = ["<","|",">"]

def dict_to_sp(d):
  # convert a {label: coeff} dict into a superposition
  result = superposition()
  for x in d:
    result.data.append(ket(x,d[x]))
  return result

def dict_load_fragment_lengths(filename,fragments):
  # histogram of fragment lengths for the given file
  d = {}
  with open(filename,'r') as f:
    text = f.read()
    for sequence in fragment_string(text,fragments):
      length = str(len(sequence.strip()))
      if length not in d:
        d[length] = 1
      else:
        d[length] += 1
  return dict_to_sp(d)


import hashlib

# in testing so far, this thing works great! Much more discriminating power (by 10 points roughly) than frag_lengths.
# where "discriminating power" is the difference in coeff between the largest coeff in the sp, and the second largest.
# discrimination() is currently both in the functions and the superposition class.
def dict_load_fragment_hash(filename,fragments):
  # histogram of fragment hashes: last 2 hex digits of sha1, so 256 buckets
  d = {}
  with open(filename,'r') as f:
    text = f.read()
    for sequence in fragment_string(text,fragments):
      h = hashlib.sha1(sequence.strip().encode('utf-8')).hexdigest()[-2:]
      if h not in d:
        d[h] = 1
      else:
        d[h] += 1
  return dict_to_sp(d)

for topic in file_table:
  file = file_table[topic]
  print("topic: " + topic)
  print("file:  " + file)
  x = topic
  C.learn("fragment-lengths",x,dict_load_fragment_lengths(file,fragments).coeff_sort())
  C.learn("fragment-hash",x,dict_load_fragment_hash(file,fragments).coeff_sort())

# insert these rules into context:
# simm |*> #=> 100 similar[fragment-lengths] |_self>
# hs |*> #=> 100 similar[fragment-hash] |_self>

C.learn("simm","*",stored_rule("100 similar[fragment-lengths] |_self>"))
C.learn("hs","*",stored_rule("100 similar[fragment-hash] |_self>"))

name = "web-pages/fragment-documents-4.sw"
save_sw(C,name)
Let's generate some results. Get the console to do it at load time, by adding this to the end of the fragment-documents-3.sw file.
|simm-result-1> => simm |eztv-1>
|simm-result-2> => simm |eztv-2>
|simm-result-3> => simm |slashdot-1>
|simm-result-4> => simm |slashdot-2>
|simm-result-5> => simm |diary-1>
|simm-result-6> => simm |diary-2>
|simm-result-7> => simm |wc-comments-1>
|simm-result-8> => simm |wc-comments-2>
|simm-result-9> => simm |semantic-1>

|hs-result-1> => hs |eztv-1>
|hs-result-2> => hs |eztv-2>
|hs-result-3> => hs |slashdot-1>
|hs-result-4> => hs |slashdot-2>
|hs-result-5> => hs |diary-1>
|hs-result-6> => hs |diary-2>
|hs-result-7> => hs |wc-comments-1>
|hs-result-8> => hs |wc-comments-2>
|hs-result-9> => hs |semantic-1>
Done. Now we have (after tidying up the dump in the console):
 |simm-result-1> => 97.025|eztv-2> + 65.993|slashdot-2> + 65.819|slashdot-1> + 63.885|diary-2> + 63.594|diary-1> + 58.907|wc-comments-1> + 58.907|wc-comments-2> + 41.707|semantic-1>
 |simm-result-2> => 97.025|eztv-1> + 67.337|slashdot-2> + 67.180|slashdot-1> + 65.823|diary-2> + 65.564|diary-1> + 59.827|wc-comments-2> + 59.827|wc-comments-1> + 43.032|semantic-1>
 |simm-result-3> => 98.915|slashdot-2> + 76.447|diary-2> + 76.194|diary-1> + 68.010|wc-comments-1> + 67.962|wc-comments-2> + 67.180|eztv-2> + 65.819|eztv-1> + 56.437|semantic-1>
 |simm-result-4> => 98.915|slashdot-1> + 76.371|diary-2> + 76.107|diary-1> + 67.939|wc-comments-1> + 67.892|wc-comments-2> + 67.337|eztv-2> + 65.993|eztv-1> + 56.353|semantic-1>
 |simm-result-5> => 98.525|diary-2> + 76.194|slashdot-1> + 76.107|slashdot-2> + 75.691|wc-comments-1> + 75.653|wc-comments-2> + 65.564|eztv-2> + 63.594|eztv-1> + 55.682|semantic-1>
 |simm-result-6> => 98.525|diary-1> + 76.447|slashdot-1> + 76.371|slashdot-2> + 76.215|wc-comments-1> + 76.201|wc-comments-2> + 65.823|eztv-2> + 63.885|eztv-1> + 56.463|semantic-1>
 |simm-result-7> => 99.811|wc-comments-2> + 76.215|diary-2> + 75.691|diary-1> + 68.010|slashdot-1> + 67.939|slashdot-2> + 65.518|semantic-1> + 59.827|eztv-2> + 58.907|eztv-1>
 |simm-result-8> => 99.811|wc-comments-1> + 76.201|diary-2> + 75.653|diary-1> + 67.962|slashdot-1> + 67.892|slashdot-2> + 65.492|semantic-1> + 59.827|eztv-2> + 58.907|eztv-1>
 |simm-result-9> => 65.518|wc-comments-1> + 65.492|wc-comments-2> + 56.463|diary-2> + 56.437|slashdot-1> + 56.353|slashdot-2> + 55.682|diary-1> + 43.032|eztv-2> + 41.707|eztv-1>

 |hs-result-1> => 96.824|eztv-2> + 69.881|slashdot-1> + 69.730|slashdot-2> + 65.903|diary-2> + 65.323|diary-1> + 60.620|wc-comments-1> + 60.603|wc-comments-2> + 46.397|semantic-1>
 |hs-result-2> => 96.824|eztv-1> + 69.748|slashdot-1> + 69.612|slashdot-2> + 66.569|diary-2> + 65.970|diary-1> + 61.013|wc-comments-1> + 60.982|wc-comments-2> + 45.875|semantic-1>
 |hs-result-3> => 98.205|slashdot-2> + 69.881|eztv-1> + 69.748|eztv-2> + 67.442|diary-2> + 66.801|diary-1> + 58.669|wc-comments-2> + 58.612|wc-comments-1> + 44.965|semantic-1>
 |hs-result-4> => 98.205|slashdot-1> + 69.730|eztv-1> + 69.612|eztv-2> + 67.063|diary-2> + 66.447|diary-1> + 58.519|wc-comments-2> + 58.462|wc-comments-1> + 44.890|semantic-1>
 |hs-result-5> => 97.902|diary-2> + 68.407|wc-comments-2> + 68.298|wc-comments-1> + 66.801|slashdot-1> + 66.447|slashdot-2> + 65.970|eztv-2> + 65.323|eztv-1> + 41.417|semantic-1>
 |hs-result-6> => 97.902|diary-1> + 68.762|wc-comments-2> + 68.625|wc-comments-1> + 67.442|slashdot-1> + 67.063|slashdot-2> + 66.569|eztv-2> + 65.903|eztv-1> + 42.049|semantic-1>
 |hs-result-7> => 99.669|wc-comments-2> + 68.625|diary-2> + 68.298|diary-1> + 61.013|eztv-2> + 60.620|eztv-1> + 58.612|slashdot-1> + 58.462|slashdot-2> + 43.165|semantic-1>
 |hs-result-8> => 99.669|wc-comments-1> + 68.762|diary-2> + 68.407|diary-1> + 60.982|eztv-2> + 60.603|eztv-1> + 58.669|slashdot-1> + 58.519|slashdot-2> + 43.180|semantic-1>
 |hs-result-9> => 46.397|eztv-1> + 45.875|eztv-2> + 44.965|slashdot-1> + 44.890|slashdot-2> + 43.180|wc-comments-2> + 43.165|wc-comments-1> + 42.049|diary-2> + 41.417|diary-1>

7/5/2014: OK. Decided to graph the fragment hashes and they back up the simm results very nicely.
(graphs here)

8/5/2014: OK. Decided to use 4096 buckets instead of 256.
The code:
def list_load_fragment_hash(filename,fragments):
  # dense version: a list of 4096 counts, indexed by the last 3 hex digits of the hash
  array = [0] * 4096                                                 # NB: 256 changes to 4096
  with open(filename,'r') as f:
    text = f.read()
    for sequence in fragment_string(text,fragments):
      h = hashlib.sha1(sequence.encode('utf-8')).hexdigest()[-3:]    # NB: [-2:] changed to [-3:]
      x = int(h,16)
      array[x] += 1
  return array
Here are the results:
(graphs here)

Wow! I am in love with these graphs. They strongly remind me of emission lines in QM.
Anyway, an observation:
increasing the threshold after the simm increases how similar the objects need to be for a match.
increasing the threshold before the simm decreases how similar the objects need to be for a match.
In code:
drop-below[0.8] simm(""|f>, ""|g>)                       -- more specific
vs:
simm(drop-below[2] "" |f>, drop-below[2] "" |g>)         -- less specific, more general, which will be useful.
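In terms of the dict-based sketches from earlier, the difference is just where the threshold sits (drop_below is a hypothetical helper):
def drop_below(sp, t):
  # keep only the elements with coeff >= t
  return dict((k, v) for k, v in sp.items() if v >= t)

def specific_match(f, g, t=0.8):
  # threshold after the simm: the final similarity itself must clear t
  value = simm({}, f, g)
  return value if value >= t else 0

def general_match(f, g, t=2):
  # threshold before the simm: rare fragments are discarded first, then compared
  return simm({}, drop_below(f, t), drop_below(g, t))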

OK. Decided to try 64k buckets:
(graphs here)

OK. Took quite a while to process, but I have some new simm results.
We have this at the bottom of fragment-documents-big-hash-post-processing.sw:
hs |*> #=> 100 similar[fragment-hash-big] |_self>
-- some post processing:
|hs-result-1> => hs |big-eztv-1>
|hs-result-2> => hs |big-eztv-2>
|hs-result-3> => hs |big-slashdot-1>
|hs-result-4> => hs |big-slashdot-2>
|hs-result-5> => hs |big-slashdot-3>
|hs-result-6> => hs |big-diary-1>
|hs-result-7> => hs |big-diary-2>
|hs-result-8> => hs |big-wc-comments-1>
|hs-result-9> => hs |big-wc-comments-2>
|hs-result-10> => hs |big-semantic-1>
|hs-result-11> => hs |big-semantic-2>
Giving these results:
 |hs-result-1> => 94.222|big-eztv-2> + 28.930|big-semantic-2> + 27.309|big-semantic-1> + 25.386|big-slashdot-3> + 25.268|big-slashdot-1> + 24.881|big-slashdot-2> + 20.443|big-diary-2> + 19.792|big-diary-1> + 18.030|big-wc-comments-1> + 18.025|big-wc-comments-2>
 |hs-result-2> => 94.222|big-eztv-1> + 28.527|big-semantic-2> + 26.552|big-semantic-1> + 25.029|big-slashdot-3> + 24.902|big-slashdot-1> + 24.620|big-slashdot-2> + 21.521|big-diary-2> + 20.880|big-diary-1> + 18.256|big-wc-comments-2> + 18.240|big-wc-comments-1>
 |hs-result-3> => 96.367|big-slashdot-2> + 79.561|big-slashdot-3> + 25.268|big-eztv-1> + 24.902|big-eztv-2> + 20.522|big-semantic-2> + 20.063|big-semantic-1> + 16.364|big-diary-2> + 16.054|big-diary-1> + 12.592|big-wc-comments-1> + 12.558|big-wc-comments-2>
 |hs-result-4> => 96.367|big-slashdot-1> + 79.506|big-slashdot-3> + 24.881|big-eztv-1> + 24.620|big-eztv-2> + 20.252|big-semantic-2> + 19.877|big-semantic-1> + 16.310|big-diary-2> + 16.000|big-diary-1> + 12.660|big-wc-comments-1> + 12.593|big-wc-comments-2>
 |hs-result-5> => 79.561|big-slashdot-1> + 79.506|big-slashdot-2> + 25.386|big-eztv-1> + 25.029|big-eztv-2> + 21.065|big-semantic-2> + 20.537|big-semantic-1> + 16.763|big-diary-2> + 16.606|big-diary-1> + 13.020|big-wc-comments-1> + 12.987|big-wc-comments-2>
 |hs-result-6> => 96.013|big-diary-2> + 40.747|big-wc-comments-1> + 40.747|big-wc-comments-2> + 20.880|big-eztv-2> + 19.792|big-eztv-1> + 18.007|big-semantic-1> + 17.932|big-semantic-2> + 16.606|big-slashdot-3> + 16.054|big-slashdot-1> + 16.000|big-slashdot-2>
 |hs-result-7> => 96.013|big-diary-1> + 40.610|big-wc-comments-1> + 40.610|big-wc-comments-2> + 21.521|big-eztv-2> + 20.443|big-eztv-1> + 18.154|big-semantic-1> + 18.120|big-semantic-2> + 16.763|big-slashdot-3> + 16.364|big-slashdot-1> + 16.310|big-slashdot-2>
 |hs-result-8> => 99.533|big-wc-comments-2> + 40.747|big-diary-1> + 40.610|big-diary-2> + 18.240|big-eztv-2> + 18.030|big-eztv-1> + 13.799|big-semantic-2> + 13.771|big-semantic-1> + 13.020|big-slashdot-3> + 12.660|big-slashdot-2> + 12.592|big-slashdot-1>
 |hs-result-9> => 99.533|big-wc-comments-1> + 40.747|big-diary-1> + 40.610|big-diary-2> + 18.256|big-eztv-2> + 18.025|big-eztv-1> + 13.849|big-semantic-2> + 13.791|big-semantic-1> + 12.987|big-slashdot-3> + 12.593|big-slashdot-2> + 12.558|big-slashdot-1>
 |hs-result-10> => 88.817|big-semantic-2> + 27.309|big-eztv-1> + 26.552|big-eztv-2> + 20.537|big-slashdot-3> + 20.063|big-slashdot-1> + 19.877|big-slashdot-2> + 18.154|big-diary-2> + 18.007|big-diary-1> + 13.791|big-wc-comments-2> + 13.771|big-wc-comments-1>
 |hs-result-11> => 88.817|big-semantic-1> + 28.930|big-eztv-1> + 28.527|big-eztv-2> + 21.065|big-slashdot-3> + 20.522|big-slashdot-1> + 20.252|big-slashdot-2> + 18.120|big-diary-2> + 17.932|big-diary-1> + 13.849|big-wc-comments-2> + 13.799|big-wc-comments-1>
Wow. So swapping from 256 to 4096 buckets has really increased the discrimination of this!
eg:
|hs-result-1> => 96.824|eztv-2> + 69.881|slashdot-1> + 69.730|slashdot-2> + ...
vs:
|hs-result-1> => 94.222|big-eztv-2> + 28.930|big-semantic-2> + 27.309|big-semantic-1> + 25.386|big-slashdot-3> + ...

OK. Next, let's look at fragment counts for a second.
Start with rules like this (which we added at the bottom of fragment-documents-big-hash-more-post-processing.sw, so it is calculated on sw load in the console):
drop-2-hash-op |*> #=> drop-below[2] fragment-hash-big |_self>
drop-3-hash-op |*> #=> drop-below[3] fragment-hash-big |_self>
drop-4-hash-op |*> #=> drop-below[4] fragment-hash-big |_self>
drop-5-hash-op |*> #=> drop-below[5] fragment-hash-big |_self>
drop-6-hash-op |*> #=> drop-below[6] fragment-hash-big |_self>

drop-2-hash |big-eztv-1> => drop-2-hash-op |_self>
drop-2-hash |big-eztv-2> => drop-2-hash-op |_self>
...
drop-3-hash |big-eztv-1> => drop-3-hash-op |_self>
...

count-1 |big-eztv-1> => count fragment-hash-big |_self>
...
count-2 |big-eztv-1> => count drop-2-hash |_self>
...
count-3 |big-eztv-1> => count drop-3-hash |_self>
...
And then we have this:
count-1 |big-eztv-1> => |number: 2113>                  -- number of distinct kets when using 4096 buckets
count-2 |big-eztv-1> => |number: 700>                   -- number of distinct kets with coeff 2 and above
count-3 |big-eztv-1> => |number: 202>                   -- number of distinct kets with coeff 3 and above
count-4 |big-eztv-1> => |number: 73>                    -- number of distinct kets with coeff 4 and above
count-5 |big-eztv-1> => |number: 34>                    -- number of distinct kets with coeff 5 and above
count-6 |big-eztv-1> => |number: 28>                    -- number of distinct kets with coeff 6 and above

count-1 |big-eztv-2> => |number: 2120>
count-2 |big-eztv-2> => |number: 740>
count-3 |big-eztv-2> => |number: 208>
count-4 |big-eztv-2> => |number: 81>
count-5 |big-eztv-2> => |number: 35>
count-6 |big-eztv-2> => |number: 31>

count-1 |big-slashdot-1> => |number: 1023>
count-2 |big-slashdot-1> => |number: 268>
count-3 |big-slashdot-1> => |number: 114>
count-4 |big-slashdot-1> => |number: 82>
count-5 |big-slashdot-1> => |number: 73>
count-6 |big-slashdot-1> => |number: 66>

count-1 |big-slashdot-2> => |number: 1020>
count-2 |big-slashdot-2> => |number: 267>
count-3 |big-slashdot-2> => |number: 113>
count-4 |big-slashdot-2> => |number: 83>
count-5 |big-slashdot-2> => |number: 74>
count-6 |big-slashdot-2> => |number: 68>

count-1 |big-slashdot-3> => |number: 1044>
count-2 |big-slashdot-3> => |number: 261>
count-3 |big-slashdot-3> => |number: 129>
count-4 |big-slashdot-3> => |number: 95>
count-5 |big-slashdot-3> => |number: 83>
count-6 |big-slashdot-3> => |number: 73>

count-1 |big-diary-1> => |number: 596>
count-2 |big-diary-1> => |number: 171>
count-3 |big-diary-1> => |number: 98>
count-4 |big-diary-1> => |number: 78>
count-5 |big-diary-1> => |number: 66>
count-6 |big-diary-1> => |number: 62>

count-1 |big-diary-2> => |number: 619>
count-2 |big-diary-2> => |number: 174>
count-3 |big-diary-2> => |number: 97>
count-4 |big-diary-2> => |number: 78>
count-5 |big-diary-2> => |number: 66>
count-6 |big-diary-2> => |number: 62>

count-1 |big-wc-comments-1> => |number: 493>
count-2 |big-wc-comments-1> => |number: 124>
count-3 |big-wc-comments-1> => |number: 58>
count-4 |big-wc-comments-1> => |number: 45>
count-5 |big-wc-comments-1> => |number: 39>
count-6 |big-wc-comments-1> => |number: 37>

count-1 |big-wc-comments-2> => |number: 494>
count-2 |big-wc-comments-2> => |number: 123>
count-3 |big-wc-comments-2> => |number: 58>
count-4 |big-wc-comments-2> => |number: 45>
count-5 |big-wc-comments-2> => |number: 39>
count-6 |big-wc-comments-2> => |number: 37>

count-1 |big-semantic-1> => |number: 1926>
count-2 |big-semantic-1> => |number: 730>
count-3 |big-semantic-1> => |number: 327>
count-4 |big-semantic-1> => |number: 193>
count-5 |big-semantic-1> => |number: 132>
count-6 |big-semantic-1> => |number: 91>

count-1 |big-semantic-2> => |number: 2266>
count-2 |big-semantic-2> => |number: 970>
count-3 |big-semantic-2> => |number: 412>
count-4 |big-semantic-2> => |number: 230>
count-5 |big-semantic-2> => |number: 150>
count-6 |big-semantic-2> => |number: 100>

Cool. After some processing of that .sw file, we have:
13/5/2014: BTW, this is a sample of the relevant code (in this case using 6):
drop-6-simm |*> #=> 100 similar[drop-6-hash] |_self>

drop-6-simm |big-eztv-1> => drop-6-simm |_self> 
drop-6-simm |big-eztv-2> => drop-6-simm |_self>
drop-6-simm |big-slashdot-1> => drop-6-simm |_self>
drop-6-simm |big-slashdot-2> => drop-6-simm |_self>
drop-6-simm |big-slashdot-3> => drop-6-simm |_self>
drop-6-simm |big-diary-1> => drop-6-simm |_self>
drop-6-simm |big-diary-2> => drop-6-simm |_self>
drop-6-simm |big-wc-comments-1> => drop-6-simm |_self>
drop-6-simm |big-wc-comments-2> => drop-6-simm |_self>
drop-6-simm |big-semantic-1> => drop-6-simm |_self>
drop-6-simm |big-semantic-2> => drop-6-simm |_self>

$ grep "^drop" sw-examples/fragment-documents-big-hash-more-post-processing--saved-2.sw  | grep "simm" | less

drop-2-simm |big-slashdot-2> => 96.888|big-slashdot-1> + 86.420|big-slashdot-3> + 19.525|big-eztv-1> + 18.822|big-eztv-2> + 13.456|big-diary-2> + 13.104|big-diary-1> + 12.465|big-semantic-1> + 11.684|big-semantic-2> + 10.829|big-wc-comments-2> + 10.815|big-wc-comments-1>
drop-3-simm |big-slashdot-2> => 99.005|big-slashdot-1> + 91.635|big-slashdot-3> + 21.095|big-eztv-1> + 21.020|big-eztv-2> + 13.811|big-diary-2> + 13.488|big-diary-1> + 12.379|big-semantic-1> + 11.458|big-semantic-2> + 10.735|big-wc-comments-1> + 10.735|big-wc-comments-2>
drop-4-simm |big-slashdot-2> => 99.177|big-slashdot-1> + 94.003|big-slashdot-3> + 21.831|big-eztv-1> + 21.619|big-eztv-2> + 14.148|big-diary-2> + 13.835|big-diary-1> + 13.529|big-semantic-1> + 12.731|big-semantic-2> + 11.193|big-wc-comments-1> + 11.193|big-wc-comments-2>
drop-5-simm |big-slashdot-2> => 99.608|big-slashdot-1> + 95.326|big-slashdot-3> + 21.891|big-eztv-1> + 21.720|big-eztv-2> + 14.257|big-semantic-1> + 14.047|big-diary-2> + 13.722|big-diary-1> + 13.650|big-semantic-2> + 11.466|big-wc-comments-1> + 11.466|big-wc-comments-2>
drop-6-simm |big-slashdot-2> => 99.315|big-slashdot-1> + 95.522|big-slashdot-3> + 22.063|big-eztv-1> + 21.905|big-eztv-2> + 15.225|big-semantic-1> + 14.513|big-semantic-2> + 14.234|big-diary-2> + 13.907|big-diary-1> + 11.607|big-wc-comments-1> + 11.607|big-wc-comments-2>

drop-2-simm |big-wc-comments-1> => 99.737|big-wc-comments-2> + 41.842|big-diary-1> + 41.707|big-diary-2> + 17.071|big-eztv-2> + 16.528|big-eztv-1> + 11.578|big-slashdot-3> + 10.815|big-slashdot-2> + 10.815|big-slashdot-1> + 10.449|big-semantic-1> + 9.549|big-semantic-2>
drop-3-simm |big-wc-comments-1> => 99.900|big-wc-comments-2> + 41.927|big-diary-1> + 41.879|big-diary-2> + 17.916|big-eztv-2> + 17.167|big-eztv-1> + 11.251|big-slashdot-3> + 10.735|big-slashdot-2> + 10.735|big-slashdot-1> + 10.247|big-semantic-2> + 10.217|big-semantic-1>
drop-4-simm |big-wc-comments-1> => 99.896|big-wc-comments-2> + 41.229|big-diary-1> + 41.155|big-diary-2> + 18.062|big-eztv-2> + 17.341|big-eztv-1> + 11.431|big-slashdot-3> + 11.201|big-slashdot-1> + 11.193|big-slashdot-2> + 10.387|big-semantic-1> + 10.327|big-semantic-2>
drop-5-simm |big-wc-comments-1> => 99.893|big-wc-comments-2> + 41.620|big-diary-1> + 41.548|big-diary-2> + 18.639|big-eztv-2> + 17.712|big-eztv-1> + 11.726|big-slashdot-3> + 11.475|big-slashdot-1> + 11.466|big-slashdot-2> + 10.445|big-semantic-2> + 10.281|big-semantic-1>
drop-6-simm |big-wc-comments-1> => 99.892|big-wc-comments-2> + 42.094|big-diary-1> + 42.017|big-diary-2> + 18.828|big-eztv-2> + 17.905|big-eztv-1> + 11.623|big-slashdot-1> + 11.622|big-slashdot-3> + 11.607|big-slashdot-2> + 10.709|big-semantic-2> + 10.509|big-semantic-1>

drop-2-simm |big-slashdot-3> => 86.420|big-slashdot-2> + 86.332|big-slashdot-1> + 19.757|big-eztv-1> + 19.140|big-eztv-2> + 14.599|big-diary-2> + 14.146|big-diary-1> + 13.144|big-semantic-1> + 12.428|big-semantic-2> + 11.592|big-wc-comments-2> + 11.578|big-wc-comments-1>
drop-3-simm |big-slashdot-3> => 91.635|big-slashdot-2> + 91.483|big-slashdot-1> + 21.401|big-eztv-1> + 21.387|big-eztv-2> + 14.822|big-diary-2> + 14.275|big-diary-1> + 12.588|big-semantic-1> + 11.946|big-semantic-2> + 11.251|big-wc-comments-1> + 11.251|big-wc-comments-2>
drop-4-simm |big-slashdot-3> => 94.027|big-slashdot-1> + 94.003|big-slashdot-2> + 22.434|big-eztv-1> + 22.263|big-eztv-2> + 15.225|big-diary-2> + 14.681|big-diary-1> + 13.727|big-semantic-1> + 13.053|big-semantic-2> + 11.431|big-wc-comments-1> + 11.431|big-wc-comments-2>
drop-5-simm |big-slashdot-3> => 95.350|big-slashdot-1> + 95.326|big-slashdot-2> + 22.655|big-eztv-1> + 22.500|big-eztv-2> + 14.936|big-diary-2> + 14.611|big-diary-1> + 14.315|big-semantic-1> + 13.708|big-semantic-2> + 11.726|big-wc-comments-1> + 11.726|big-wc-comments-2>
drop-6-simm |big-slashdot-3> => 95.628|big-slashdot-1> + 95.522|big-slashdot-2> + 23.057|big-eztv-1> + 22.928|big-eztv-2> + 15.306|big-semantic-1> + 15.187|big-diary-2> + 14.861|big-diary-1> + 14.594|big-semantic-2> + 11.622|big-wc-comments-1> + 11.622|big-wc-comments-2>

drop-2-simm |big-wc-comments-2> => 99.737|big-wc-comments-1> + 41.883|big-diary-1> + 41.748|big-diary-2> + 17.051|big-eztv-2> + 16.504|big-eztv-1> + 11.592|big-slashdot-3> + 10.829|big-slashdot-2> + 10.829|big-slashdot-1> + 10.463|big-semantic-1> + 9.551|big-semantic-2>
drop-3-simm |big-wc-comments-2> => 99.900|big-wc-comments-1> + 41.927|big-diary-1> + 41.879|big-diary-2> + 17.916|big-eztv-2> + 17.167|big-eztv-1> + 11.251|big-slashdot-3> + 10.735|big-slashdot-2> + 10.735|big-slashdot-1> + 10.247|big-semantic-2> + 10.217|big-semantic-1>
drop-4-simm |big-wc-comments-2> => 99.896|big-wc-comments-1> + 41.229|big-diary-1> + 41.155|big-diary-2> + 18.062|big-eztv-2> + 17.341|big-eztv-1> + 11.431|big-slashdot-3> + 11.201|big-slashdot-1> + 11.193|big-slashdot-2> + 10.387|big-semantic-1> + 10.327|big-semantic-2>
drop-5-simm |big-wc-comments-2> => 99.893|big-wc-comments-1> + 41.620|big-diary-1> + 41.548|big-diary-2> + 18.639|big-eztv-2> + 17.712|big-eztv-1> + 11.726|big-slashdot-3> + 11.475|big-slashdot-1> + 11.466|big-slashdot-2> + 10.445|big-semantic-2> + 10.281|big-semantic-1>
drop-6-simm |big-wc-comments-2> => 99.892|big-wc-comments-1> + 42.094|big-diary-1> + 42.017|big-diary-2> + 18.828|big-eztv-2> + 17.905|big-eztv-1> + 11.623|big-slashdot-1> + 11.622|big-slashdot-3> + 11.607|big-slashdot-2> + 10.709|big-semantic-2> + 10.509|big-semantic-1>

drop-2-simm |big-eztv-2> => 91.919|big-eztv-1> + 19.140|big-slashdot-3> + 18.954|big-slashdot-1> + 18.822|big-slashdot-2> + 18.502|big-diary-2> + 18.168|big-diary-1> + 17.071|big-wc-comments-1> + 17.051|big-wc-comments-2> + 14.576|big-semantic-2> + 14.373|big-semantic-1>
drop-3-simm |big-eztv-2> => 92.067|big-eztv-1> + 21.387|big-slashdot-3> + 21.020|big-slashdot-2> + 20.968|big-slashdot-1> + 18.885|big-diary-2> + 18.682|big-diary-1> + 17.916|big-wc-comments-1> + 17.916|big-wc-comments-2> + 10.938|big-semantic-1> + 10.033|big-semantic-2>
drop-4-simm |big-eztv-2> => 91.371|big-eztv-1> + 22.263|big-slashdot-3> + 21.630|big-slashdot-1> + 21.619|big-slashdot-2> + 19.604|big-diary-2> + 19.334|big-diary-1> + 18.062|big-wc-comments-1> + 18.062|big-wc-comments-2> + 11.566|big-semantic-1> + 10.769|big-semantic-2>
drop-5-simm |big-eztv-2> => 91.874|big-eztv-1> + 22.500|big-slashdot-3> + 21.732|big-slashdot-1> + 21.720|big-slashdot-2> + 20.235|big-diary-2> + 19.963|big-diary-1> + 18.639|big-wc-comments-1> + 18.639|big-wc-comments-2> + 12.323|big-semantic-1> + 11.531|big-semantic-2>
drop-6-simm |big-eztv-2> => 91.821|big-eztv-1> + 22.928|big-slashdot-3> + 21.977|big-slashdot-1> + 21.905|big-slashdot-2> + 20.446|big-diary-2> + 20.173|big-diary-1> + 18.828|big-wc-comments-1> + 18.828|big-wc-comments-2> + 13.466|big-semantic-1> + 12.758|big-semantic-2>

drop-2-simm |big-slashdot-1> => 96.888|big-slashdot-2> + 86.332|big-slashdot-3> + 19.664|big-eztv-1> + 18.954|big-eztv-2> + 13.366|big-diary-2> + 13.014|big-diary-1> + 12.598|big-semantic-1> + 11.857|big-semantic-2> + 10.829|big-wc-comments-2> + 10.815|big-wc-comments-1>
drop-3-simm |big-slashdot-1> => 99.005|big-slashdot-2> + 91.483|big-slashdot-3> + 21.043|big-eztv-1> + 20.968|big-eztv-2> + 13.811|big-diary-2> + 13.488|big-diary-1> + 12.611|big-semantic-1> + 11.661|big-semantic-2> + 10.735|big-wc-comments-1> + 10.735|big-wc-comments-2>
drop-4-simm |big-slashdot-1> => 99.177|big-slashdot-2> + 94.027|big-slashdot-3> + 21.841|big-eztv-1> + 21.630|big-eztv-2> + 14.160|big-diary-2> + 13.847|big-diary-1> + 13.535|big-semantic-1> + 12.738|big-semantic-2> + 11.201|big-wc-comments-1> + 11.201|big-wc-comments-2>
drop-5-simm |big-slashdot-1> => 99.608|big-slashdot-2> + 95.350|big-slashdot-3> + 21.903|big-eztv-1> + 21.732|big-eztv-2> + 14.264|big-semantic-1> + 14.059|big-diary-2> + 13.734|big-diary-1> + 13.656|big-semantic-2> + 11.475|big-wc-comments-1> + 11.475|big-wc-comments-2>
drop-6-simm |big-slashdot-1> => 99.315|big-slashdot-2> + 95.628|big-slashdot-3> + 22.136|big-eztv-1> + 21.977|big-eztv-2> + 15.238|big-semantic-1> + 14.525|big-semantic-2> + 14.257|big-diary-2> + 13.930|big-diary-1> + 11.623|big-wc-comments-1> + 11.623|big-wc-comments-2>

drop-2-simm |big-eztv-1> => 91.919|big-eztv-2> + 19.757|big-slashdot-3> + 19.664|big-slashdot-1> + 19.525|big-slashdot-2> + 16.839|big-diary-2> + 16.543|big-diary-1> + 16.528|big-wc-comments-1> + 16.504|big-wc-comments-2> + 14.555|big-semantic-1> + 14.422|big-semantic-2>
drop-3-simm |big-eztv-1> => 92.067|big-eztv-2> + 21.401|big-slashdot-3> + 21.095|big-slashdot-2> + 21.043|big-slashdot-1> + 17.167|big-wc-comments-1> + 17.167|big-wc-comments-2> + 16.626|big-diary-2> + 16.527|big-diary-1> + 10.899|big-semantic-1> + 9.977|big-semantic-2>
drop-4-simm |big-eztv-1> => 91.371|big-eztv-2> + 22.434|big-slashdot-3> + 21.841|big-slashdot-1> + 21.831|big-slashdot-2> + 17.341|big-wc-comments-1> + 17.341|big-wc-comments-2> + 17.077|big-diary-2> + 16.807|big-diary-1> + 11.472|big-semantic-1> + 10.542|big-semantic-2>
drop-5-simm |big-eztv-1> => 91.874|big-eztv-2> + 22.655|big-slashdot-3> + 21.903|big-slashdot-1> + 21.891|big-slashdot-2> + 17.712|big-wc-comments-1> + 17.712|big-wc-comments-2> + 17.367|big-diary-2> + 17.095|big-diary-1> + 12.336|big-semantic-1> + 11.531|big-semantic-2>
drop-6-simm |big-eztv-1> => 91.821|big-eztv-2> + 23.057|big-slashdot-3> + 22.136|big-slashdot-1> + 22.063|big-slashdot-2> + 17.905|big-wc-comments-1> + 17.905|big-wc-comments-2> + 17.574|big-diary-2> + 17.301|big-diary-1> + 13.499|big-semantic-1> + 12.775|big-semantic-2>

drop-2-simm |big-semantic-1> => 87.282|big-semantic-2> + 14.972|big-diary-1> + 14.903|big-diary-2> + 14.555|big-eztv-1> + 14.373|big-eztv-2> + 13.144|big-slashdot-3> + 12.598|big-slashdot-1> + 12.465|big-slashdot-2> + 10.463|big-wc-comments-2> + 10.449|big-wc-comments-1>
drop-3-simm |big-semantic-1> => 91.060|big-semantic-2> + 16.607|big-diary-1> + 16.553|big-diary-2> + 12.611|big-slashdot-1> + 12.588|big-slashdot-3> + 12.379|big-slashdot-2> + 10.938|big-eztv-2> + 10.899|big-eztv-1> + 10.217|big-wc-comments-1> + 10.217|big-wc-comments-2>
drop-4-simm |big-semantic-1> => 93.378|big-semantic-2> + 17.871|big-diary-2> + 17.617|big-diary-1> + 13.727|big-slashdot-3> + 13.535|big-slashdot-1> + 13.529|big-slashdot-2> + 11.566|big-eztv-2> + 11.472|big-eztv-1> + 10.387|big-wc-comments-1> + 10.387|big-wc-comments-2>
drop-5-simm |big-semantic-1> => 95.103|big-semantic-2> + 17.879|big-diary-2> + 17.620|big-diary-1> + 14.315|big-slashdot-3> + 14.264|big-slashdot-1> + 14.257|big-slashdot-2> + 12.336|big-eztv-1> + 12.323|big-eztv-2> + 10.281|big-wc-comments-1> + 10.281|big-wc-comments-2>
drop-6-simm |big-semantic-1> => 96.188|big-semantic-2> + 18.303|big-diary-2> + 18.053|big-diary-1> + 15.306|big-slashdot-3> + 15.238|big-slashdot-1> + 15.225|big-slashdot-2> + 13.499|big-eztv-1> + 13.466|big-eztv-2> + 10.509|big-wc-comments-1> + 10.509|big-wc-comments-2>

drop-2-simm |big-diary-2> => 97.510|big-diary-1> + 41.748|big-wc-comments-2> + 41.707|big-wc-comments-1> + 18.502|big-eztv-2> + 16.839|big-eztv-1> + 14.903|big-semantic-1> + 14.599|big-slashdot-3> + 14.071|big-semantic-2> + 13.456|big-slashdot-2> + 13.366|big-slashdot-1>
drop-3-simm |big-diary-2> => 98.236|big-diary-1> + 41.879|big-wc-comments-1> + 41.879|big-wc-comments-2> + 18.885|big-eztv-2> + 16.626|big-eztv-1> + 16.553|big-semantic-1> + 15.550|big-semantic-2> + 14.822|big-slashdot-3> + 13.811|big-slashdot-2> + 13.811|big-slashdot-1>
drop-4-simm |big-diary-2> => 97.872|big-diary-1> + 41.155|big-wc-comments-1> + 41.155|big-wc-comments-2> + 19.604|big-eztv-2> + 17.871|big-semantic-1> + 17.380|big-semantic-2> + 17.077|big-eztv-1> + 15.225|big-slashdot-3> + 14.160|big-slashdot-1> + 14.148|big-slashdot-2>
drop-5-simm |big-diary-2> => 98.533|big-diary-1> + 41.548|big-wc-comments-1> + 41.548|big-wc-comments-2> + 20.235|big-eztv-2> + 17.911|big-semantic-2> + 17.879|big-semantic-1> + 17.367|big-eztv-1> + 14.936|big-slashdot-3> + 14.059|big-slashdot-1> + 14.047|big-slashdot-2>
drop-6-simm |big-diary-2> => 98.820|big-diary-1> + 42.017|big-wc-comments-1> + 42.017|big-wc-comments-2> + 20.446|big-eztv-2> + 18.423|big-semantic-2> + 18.303|big-semantic-1> + 17.574|big-eztv-1> + 15.187|big-slashdot-3> + 14.257|big-slashdot-1> + 14.234|big-slashdot-2>

drop-2-simm |big-diary-1> => 97.510|big-diary-2> + 41.883|big-wc-comments-2> + 41.842|big-wc-comments-1> + 18.168|big-eztv-2> + 16.543|big-eztv-1> + 14.972|big-semantic-1> + 14.146|big-slashdot-3> + 14.118|big-semantic-2> + 13.104|big-slashdot-2> + 13.014|big-slashdot-1>
drop-3-simm |big-diary-1> => 98.236|big-diary-2> + 41.927|big-wc-comments-1> + 41.927|big-wc-comments-2> + 18.682|big-eztv-2> + 16.607|big-semantic-1> + 16.527|big-eztv-1> + 15.604|big-semantic-2> + 14.275|big-slashdot-3> + 13.488|big-slashdot-2> + 13.488|big-slashdot-1>
drop-4-simm |big-diary-1> => 97.872|big-diary-2> + 41.229|big-wc-comments-1> + 41.229|big-wc-comments-2> + 19.334|big-eztv-2> + 17.617|big-semantic-1> + 17.409|big-semantic-2> + 16.807|big-eztv-1> + 14.681|big-slashdot-3> + 13.847|big-slashdot-1> + 13.835|big-slashdot-2>
drop-5-simm |big-diary-1> => 98.533|big-diary-2> + 41.620|big-wc-comments-1> + 41.620|big-wc-comments-2> + 19.963|big-eztv-2> + 17.652|big-semantic-2> + 17.620|big-semantic-1> + 17.095|big-eztv-1> + 14.611|big-slashdot-3> + 13.734|big-slashdot-1> + 13.722|big-slashdot-2>
drop-6-simm |big-diary-1> => 98.820|big-diary-2> + 42.094|big-wc-comments-1> + 42.094|big-wc-comments-2> + 20.173|big-eztv-2> + 18.166|big-semantic-2> + 18.053|big-semantic-1> + 17.301|big-eztv-1> + 14.861|big-slashdot-3> + 13.930|big-slashdot-1> + 13.907|big-slashdot-2>

drop-2-simm |big-semantic-2> => 87.282|big-semantic-1> + 14.576|big-eztv-2> + 14.422|big-eztv-1> + 14.118|big-diary-1> + 14.071|big-diary-2> + 12.428|big-slashdot-3> + 11.857|big-slashdot-1> + 11.684|big-slashdot-2> + 9.551|big-wc-comments-2> + 9.549|big-wc-comments-1>
drop-3-simm |big-semantic-2> => 91.060|big-semantic-1> + 15.604|big-diary-1> + 15.550|big-diary-2> + 11.946|big-slashdot-3> + 11.661|big-slashdot-1> + 11.458|big-slashdot-2> + 10.247|big-wc-comments-1> + 10.247|big-wc-comments-2> + 10.033|big-eztv-2> + 9.977|big-eztv-1>
drop-4-simm |big-semantic-2> => 93.378|big-semantic-1> + 17.409|big-diary-1> + 17.380|big-diary-2> + 13.053|big-slashdot-3> + 12.738|big-slashdot-1> + 12.731|big-slashdot-2> + 10.769|big-eztv-2> + 10.542|big-eztv-1> + 10.327|big-wc-comments-1> + 10.327|big-wc-comments-2>
drop-5-simm |big-semantic-2> => 95.103|big-semantic-1> + 17.911|big-diary-2> + 17.652|big-diary-1> + 13.708|big-slashdot-3> + 13.656|big-slashdot-1> + 13.650|big-slashdot-2> + 11.531|big-eztv-2> + 11.531|big-eztv-1> + 10.445|big-wc-comments-1> + 10.445|big-wc-comments-2>
drop-6-simm |big-semantic-2> => 96.188|big-semantic-1> + 18.423|big-diary-2> + 18.166|big-diary-1> + 14.594|big-slashdot-3> + 14.525|big-slashdot-1> + 14.513|big-slashdot-2> + 12.775|big-eztv-1> + 12.758|big-eztv-2> + 10.709|big-wc-comments-1> + 10.709|big-wc-comments-2>
Cool result. Discrimination now of 70 odd points! Compared to what, 20 was it last time?
9/5/2014: Just implemented this:
# common[op] (|x> + |y> + |z>)
# eg: common[friends] (|Fred> + |Sam>)
# eg: common[actors] (|movie-1> + |movie-2>)
# or indirectly
# |list> => |Fred> + |Sam> + |Charles> 
# common[friends] "" |list>                    -- this has the advantage that we can consider arbitrarily long lists, without much hassle.
def common(one,context,op):
  if one.count() <= 1:                         # this should also neatly filter out kets, I presume.
    return one.apply_op(context,op)
  
  r = one.data[0].apply_op(context,op)
  for k in range(1,one.count()):
    sp = one.data[k].apply_op(context,op)
    r = intersection(r,sp)
  return r
The old way to do this was:
common(friends |Fred>, friends |Sam>)                           -- where "common" is an alias for "intersection"
common(actors |movie-1>, actors |movie-2>)
common(friends |Fred>, friends |Sam>, friends |Charles>)
So, small but useful improvement.
BTW, had to make some minor changes to the ket/sp .apply_sp_fn() so that it could take parameters.
And this line in the processor:
"common"             : ".apply_sp_fn(common,context,\"{0}\")",

16/5/2014 update: OK. Here are some uses. Way up above we have:
 If we have data on George, Ed and Travis's friends we can do:
"Which friends do George, Ed and Travis have in common?"
|answer> => intersection(friends|person: George>, friends|person: Ed>, friends|person: Travis>)

-- which BTW is a common pattern:
|answer> => intersection(op|U>, op|V>, op|X>, op|Y>)

"Which actors do movie name-a and movie name-b have in common?"
|answer> => intersection(actors|movie: name-a>,actors|movie: name-b>)
Now we can implement these using:
|answer> => common[friends] (|person: George> + |person: Ed> + |person: Travis>)
|answer> => common[op] (|U> + |V> + |X> + |Y>)
|answer> => common[actors] (|movie-a> + |movie-b>)
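And for what it's worth, a plain-Python rendering of what common[op] computes, with dicts standing in for superpositions, coeff-wise min for intersection, and hypothetical friends data:
from functools import reduce

def intersect(a, b):
  # coeff-wise min, the superposition version of set intersection
  return dict((k, min(v, b[k])) for k, v in a.items() if k in b)

friends = {
  "George": {"Fred": 1, "Sam": 1, "Emma": 1},
  "Ed":     {"Fred": 1, "Emma": 1, "Jack": 1},
  "Travis": {"Emma": 1, "Jack": 1, "Fred": 1},
}
print(reduce(intersect, (friends[x] for x in ("George", "Ed", "Travis"))))   # {'Fred': 1, 'Emma': 1}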

OK. Made some graphs of ebooks, this time using the last 5 digits of the hash, so a max of 1048576 buckets.
Thankfully each ebook tends to only have a few tens of thousands of unique hashes (most probably corresponding to unique words).
Anyway, graphs here.
ket counts here:
BTW, using: fragments = [" ","\t",".","\n",","] (heh, fragments is really a terrible name for these cutting sequences, but too late now!)
count-1 |Tom-Sawyer-1M> => |number: 10772>              -- number of distinct kets when using 1048576 buckets
count-2 |Tom-Sawyer-1M> => |number: 4431>               -- number of distinct kets with coeff 2 and above
count-3 |Tom-Sawyer-1M> => |number: 2893>               -- number of distinct kets with coeff 3 and above
count-4 |Tom-Sawyer-1M> => |number: 2149>               -- number of distinct kets with coeff 4 and above
count-5 |Tom-Sawyer-1M> => |number: 1692>               -- number of distinct kets with coeff 5 and above
count-6 |Tom-Sawyer-1M> => |number: 1417>               -- number of distinct kets with coeff 6 and above
count-7 |Tom-Sawyer-1M> => |number: 1223>               -- number of distinct kets with coeff 7 and above
count-8 |Tom-Sawyer-1M> => |number: 1044>               -- number of distinct kets with coeff 8 and above
count-9 |Tom-Sawyer-1M> => |number: 908>                -- number of distinct kets with coeff 9 and above
count-10 |Tom-Sawyer-1M> => |number: 818>               -- number of distinct kets with coeff 10 and above

count-1 |Gone-with-Wind-1M> => |number: 24270>          -- note again, the sparse representation pays off. Otherwise, these would all have 1 million terms!
count-2 |Gone-with-Wind-1M> => |number: 12666>
count-3 |Gone-with-Wind-1M> => |number: 9152>
count-4 |Gone-with-Wind-1M> => |number: 7294>
count-5 |Gone-with-Wind-1M> => |number: 6182>
count-6 |Gone-with-Wind-1M> => |number: 5307>
count-7 |Gone-with-Wind-1M> => |number: 4674>
count-8 |Gone-with-Wind-1M> => |number: 4195>
count-9 |Gone-with-Wind-1M> => |number: 3819>
count-10 |Gone-with-Wind-1M> => |number: 3525>

count-1 |Frankenstein-1M> => |number: 9160>
count-2 |Frankenstein-1M> => |number: 4582>
count-3 |Frankenstein-1M> => |number: 3184>
count-4 |Frankenstein-1M> => |number: 2432>
count-5 |Frankenstein-1M> => |number: 1945>
count-6 |Frankenstein-1M> => |number: 1612>
count-7 |Frankenstein-1M> => |number: 1394>
count-8 |Frankenstein-1M> => |number: 1215>
count-9 |Frankenstein-1M> => |number: 1077>
count-10 |Frankenstein-1M> => |number: 968>

count-1 |Alice-in-Wonderland-1M> => |number: 4744>
count-2 |Alice-in-Wonderland-1M> => |number: 2035>
count-3 |Alice-in-Wonderland-1M> => |number: 1360>
count-4 |Alice-in-Wonderland-1M> => |number: 1024>
count-5 |Alice-in-Wonderland-1M> => |number: 827>
count-6 |Alice-in-Wonderland-1M> => |number: 696>
count-7 |Alice-in-Wonderland-1M> => |number: 611>
count-8 |Alice-in-Wonderland-1M> => |number: 525>
count-9 |Alice-in-Wonderland-1M> => |number: 458>
count-10 |Alice-in-Wonderland-1M> => |number: 414>

count-1 |Shakespeare-1M> => |number: 72218>
count-2 |Shakespeare-1M> => |number: 30731>
count-3 |Shakespeare-1M> => |number: 20170>
count-4 |Shakespeare-1M> => |number: 15475>
count-5 |Shakespeare-1M> => |number: 12679>
count-6 |Shakespeare-1M> => |number: 10781>
count-7 |Shakespeare-1M> => |number: 9392>
count-8 |Shakespeare-1M> => |number: 8412>
count-9 |Shakespeare-1M> => |number: 7571>
count-10 |Shakespeare-1M> => |number: 6909>

count-1 |Moby-Dick-1M> => |number: 26238>
count-2 |Moby-Dick-1M> => |number: 11258>
count-3 |Moby-Dick-1M> => |number: 7372>
count-4 |Moby-Dick-1M> => |number: 5520>
count-5 |Moby-Dick-1M> => |number: 4424>
count-6 |Moby-Dick-1M> => |number: 3671>
count-7 |Moby-Dick-1M> => |number: 3137>
count-8 |Moby-Dick-1M> => |number: 2746>
count-9 |Moby-Dick-1M> => |number: 2415>
count-10 |Moby-Dick-1M> => |number: 2157>

count-1 |I-Robot-1M> => |number: 9218>
count-2 |I-Robot-1M> => |number: 4168>
count-3 |I-Robot-1M> => |number: 2803>
count-4 |I-Robot-1M> => |number: 2098>
count-5 |I-Robot-1M> => |number: 1711>
count-6 |I-Robot-1M> => |number: 1424>
count-7 |I-Robot-1M> => |number: 1234>
count-8 |I-Robot-1M> => |number: 1100>
count-9 |I-Robot-1M> => |number: 985>
count-10 |I-Robot-1M> => |number: 893>

count-1 |Sherlock-Holmes-1M> => |number: 10684>
count-2 |Sherlock-Holmes-1M> => |number: 5126>
count-3 |Sherlock-Holmes-1M> => |number: 3549>
count-4 |Sherlock-Holmes-1M> => |number: 2729>
count-5 |Sherlock-Holmes-1M> => |number: 2254>
count-6 |Sherlock-Holmes-1M> => |number: 1941>
count-7 |Sherlock-Holmes-1M> => |number: 1672>
count-8 |Sherlock-Holmes-1M> => |number: 1467>
count-9 |Sherlock-Holmes-1M> => |number: 1330>
count-10 |Sherlock-Holmes-1M> => |number: 1183>

count-1 |nineteen-eighty-four-1M> => |number: 11454>
count-2 |nineteen-eighty-four-1M> => |number: 5444>
count-3 |nineteen-eighty-four-1M> => |number: 3700>
count-4 |nineteen-eighty-four-1M> => |number: 2831>
count-5 |nineteen-eighty-four-1M> => |number: 2289>
count-6 |nineteen-eighty-four-1M> => |number: 1914>
count-7 |nineteen-eighty-four-1M> => |number: 1633>
count-8 |nineteen-eighty-four-1M> => |number: 1445>
count-9 |nineteen-eighty-four-1M> => |number: 1272>
count-10 |nineteen-eighty-four-1M> => |number: 1161>
And now the simm results:
$ grep "^drop" sw-examples/frag_ebooks_1M_post_processing--saved.sw | grep "simm"
drop-10-simm |nineteen-eighty-four-1M> => 69.955|Sherlock-Holmes-1M> + 69.483|Tom-Sawyer-1M> + 69.442|I-Robot-1M> + 68.444|Moby-Dick-1M> + 64.511|Gone-with-Wind-1M> + 64.251|Frankenstein-1M> + 62.139|Alice-in-Wonderland-1M> + 49.982|Shakespeare-1M>
drop-10-simm |Moby-Dick-1M> => 68.730|Sherlock-Holmes-1M> + 68.444|nineteen-eighty-four-1M> + 66.421|Tom-Sawyer-1M> + 64.975|Frankenstein-1M> + 64.652|I-Robot-1M> + 62.815|Gone-with-Wind-1M> + 60.736|Alice-in-Wonderland-1M> + 56.115|Shakespeare-1M>
drop-10-simm |Shakespeare-1M> => 57.121|Sherlock-Holmes-1M> + 56.115|Moby-Dick-1M> + 54.958|Gone-with-Wind-1M> + 54.597|Frankenstein-1M> + 53.924|I-Robot-1M> + 50.770|Tom-Sawyer-1M> + 49.982|nineteen-eighty-four-1M> + 45.706|Alice-in-Wonderland-1M>
drop-10-simm |Tom-Sawyer-1M> => 70.489|Sherlock-Holmes-1M> + 69.483|nineteen-eighty-four-1M> + 68.727|I-Robot-1M> + 67.818|Gone-with-Wind-1M> + 66.947|Alice-in-Wonderland-1M> + 66.421|Moby-Dick-1M> + 63.893|Frankenstein-1M> + 50.770|Shakespeare-1M>
drop-10-simm |Sherlock-Holmes-1M> => 70.581|I-Robot-1M> + 70.489|Tom-Sawyer-1M> + 69.955|nineteen-eighty-four-1M> + 68.730|Moby-Dick-1M> + 68.671|Frankenstein-1M> + 65.807|Gone-with-Wind-1M> + 62.923|Alice-in-Wonderland-1M> + 57.121|Shakespeare-1M>
drop-10-simm |Frankenstein-1M> => 68.671|Sherlock-Holmes-1M> + 64.975|Moby-Dick-1M> + 64.251|nineteen-eighty-four-1M> + 63.893|Tom-Sawyer-1M> + 60.598|I-Robot-1M> + 58.861|Gone-with-Wind-1M> + 57.375|Alice-in-Wonderland-1M> + 54.597|Shakespeare-1M>
drop-10-simm |Gone-with-Wind-1M> => 67.818|Tom-Sawyer-1M> + 65.807|Sherlock-Holmes-1M> + 64.511|nineteen-eighty-four-1M> + 63.202|I-Robot-1M> + 62.815|Moby-Dick-1M> + 58.861|Frankenstein-1M> + 58.766|Alice-in-Wonderland-1M> + 54.958|Shakespeare-1M>
drop-10-simm |Alice-in-Wonderland-1M> => 66.947|Tom-Sawyer-1M> + 62.923|Sherlock-Holmes-1M> + 62.561|I-Robot-1M> + 62.139|nineteen-eighty-four-1M> + 60.736|Moby-Dick-1M> + 58.766|Gone-with-Wind-1M> + 57.375|Frankenstein-1M> + 45.706|Shakespeare-1M>
drop-10-simm |I-Robot-1M> => 70.581|Sherlock-Holmes-1M> + 69.442|nineteen-eighty-four-1M> + 68.727|Tom-Sawyer-1M> + 64.652|Moby-Dick-1M> + 63.202|Gone-with-Wind-1M> + 62.561|Alice-in-Wonderland-1M> + 60.598|Frankenstein-1M> + 53.924|Shakespeare-1M>
So these ebooks are, using this method, roughly 60-70% similar, except for Shakespeare, which is around 55% similar to the rest.
BTW, by "using this method" I mean this particular fragment-and-hash scheme; another one might be to take 2-grams, or n-grams even, hash those, and then compare.
12/5/2014: OK. Way up above we defined a grid using BKO. Well, just want to show we can define a tree (eg binary) using BKO.
16/5/2014 update: OK. Here is the define-a-grid code (cf, I hope, grid cell neurons in rat brains):
def ket_elt(j,i):
  return ket("grid: " + str(j) + " " + str(i))

def ket_elt_bd(j,i,I,J):
# finite universe model:
#  if i <= 0 or j <= 0 or i > I or j > J:
#    return ket("",0)                                       # NB: this makes use of the fact that if the learn rule is |>
# torus model:                                              # then it is ignored. ie, it is not learnt.
  i = (i - 1)%I + 1                                         # eg: foo |x> => |>
  j = (j - 1)%J + 1                                         # leaves |x> unchanged.
  return ket("grid: " + str(j) + " " + str(i))

def create_grid(c,I,J):
  c.learn("dim-1","grid",str(I))
  c.learn("dim-2","grid",str(J))

  for j in range(1,J+1):
    for i in range(1,I+1):
      elt = ket_elt(j,i)
      c.add_learn("elements","grid",elt)
      c.learn("N",elt,ket_elt_bd(j-1,i,I,J))
      c.learn("NE",elt,ket_elt_bd(j-1,i+1,I,J))
      c.learn("E",elt,ket_elt_bd(j,i+1,I,J))
      c.learn("SE",elt,ket_elt_bd(j+1,i+1,I,J))
      c.learn("S",elt,ket_elt_bd(j+1,i,I,J))
      c.learn("SW",elt,ket_elt_bd(j+1,i-1,I,J))
      c.learn("W",elt,ket_elt_bd(j,i-1,I,J))
      c.learn("NW",elt,ket_elt_bd(j-1,i-1,I,J))

Anyway, this shouldn't be surprising given my claim that BKO can represent almost anything....
Top of the tree is |x>, then we have (left for the left child, right for the right child):
left |x> => |a>
right |x> => |b>

left |a> => |c>
right |a> => |d>

left |b> => |e>
right |b> => |f>

left |c> => |g>
right |c> => |h>

left |d> => |i>
right |d> => |j>

left |e> => |k>
right |e> => |l>

left |f> => |m>
right |f> => |n>
And then we can add other info to nodes. eg:
text |x> => |start node>
text |a> => |first child node>
text |b> => |second child node>
OK. Now, let's look at this after we load it into the console:
----------------------------------------
|context> => |context: simple binary tree>

left |x> => |a>
right |x> => |b>
text |x> => |start node>

left |a> => |c>
right |a> => |d>
text |a> => |first child node>

left |b> => |e>
right |b> => |f>
text |b> => |second child node>

left |c> => |g>
right |c> => |h>

left |d> => |i>
right |d> => |j>

left |e> => |k>
right |e> => |l>

left |f> => |m>
right |f> => |n>

child |*> #=> left |_self> + right |_self>
----------------------------------------
And now we have this, we can descend the tree, eg:
sa: right left |x>
|d>

sa: right right left |x>
|j>

sa: right left right |x>
|l>

sa: child^2 |a>
|g> + |h> + |i> + |j>

sa: child^2 |b>
|k> + |l> + |m> + |n>

sa: child^3 |x>
|g> + |h> + |i> + |j> + |k> + |l> + |m> + |n>
And if nodes don't exist, the code handles that gracefully by returning the empty ket (which is also the identity ket):
sa: left child^3 |x>
|>

sa: child^4 |x>
|>
Next, what if we want multiple levels of the tree at once? The notation for that is trivial enough too, though the current parser does not handle it yet, so for now we have to do it a little more verbosely:
sa: 1 |x>
|x>

sa: -- (1 + child) |x>
sa: |x> + child |x>
|x> + |a> + |b>

sa: -- (1 + child + child^2) |x>                                                         -- this is the notation we would use if the parser could handle it.
sa: |x> + child |x> + child^2 |x>                                                        -- instead, for now, we have to do this.
|x> + |a> + |b> + |c> + |d> + |e> + |f>

sa: -- (1 + child + child^2 + child^3) |x>
sa: |x> + child |x> + child^2 |x> + child^3 |x>
|x> + |a> + |b> + |c> + |d> + |e> + |f> + |g> + |h> + |i> + |j> + |k> + |l> + |m> + |n>

sa: -- (1 + child + child^2 + child^3 + child^4) |x>
sa: |x> + child |x> + child^2 |x> + child^3 |x> + child^4 |x>
|x> + |a> + |b> + |c> + |d> + |e> + |f> + |g> + |h> + |i> + |j> + |k> + |l> + |m> + |n>  -- NB: no new terms. We have reached the bottom of the tree.
Which weirdly enough, reminds me of this from Quantum Mechanics:
exp(A) |Psi>
which expands to:
(1 + A + A^2/2! + A^3/3! + A^4/4! + A^5/5! + ... + A^n/n! + ...) |Psi>
where A is just some QM operator  
But we don't want the 1/n! coeffs, so just apply the "clean" sigmoid:
clean exp(child) |x>                      -- heh. I might write a exp[child,n] function now. This looks useful!
which expands to:                         -- eg, another application is the six degrees of separation idea: exp[friends,6] |Fred>
(1 + child + child^2 + child^3 + child^4 + ... + child^n + ... ) |x>
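For reference, a minimal sketch of clean over the toy {label: coeff} dict representation from the earlier sketch, assuming clean just sends positive coefficients to 1:

def sketch_clean(sp):
  # positive coefficients go to 1; anything else drops out.
  return {label: 1 for label, coeff in sp.items() if coeff > 0}

print(sketch_clean({"a": 2.5, "b": 1, "c": 0.0}))     # {'a': 1, 'b': 1}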
Now, finally, if we create inverses in the console, we can ascend the tree too.
sa: create inverse
sa: dump
----------------------------------------
|context> => |context: simple binary tree>

left |x> => |a>
right |x> => |b>
text |x> => |start node>

left |a> => |c>
right |a> => |d>
text |a> => |first child node>
inverse-left |a> => |x>

left |b> => |e>
right |b> => |f>
text |b> => |second child node>
inverse-right |b> => |x>

left |c> => |g>
right |c> => |h>
inverse-left |c> => |a>

left |d> => |i>
right |d> => |j>
inverse-right |d> => |a>

left |e> => |k>
right |e> => |l>
inverse-left |e> => |b>

left |f> => |m>
right |f> => |n>
inverse-right |f> => |b>

child |*> #=> left |_self> + right |_self>

inverse-supported-ops |op: left> => |x> + |a> + |b> + |c> + |d> + |e> + |f>
inverse-supported-ops |op: right> => |x> + |a> + |b> + |c> + |d> + |e> + |f>
inverse-supported-ops |op: text> => |x> + |a> + |b>
inverse-text |start node> => |x>
inverse-supported-ops |op: inverse-left> => |a> + |c> + |e> + |g> + |i> + |k> + |m>
inverse-text |first child node> => |a>
inverse-supported-ops |op: inverse-right> => |b> + |d> + |f> + |h> + |j> + |l> + |n>
inverse-text |second child node> => |b>
inverse-left |g> => |c>
inverse-right |h> => |c>
inverse-left |i> => |d>
inverse-right |j> => |d>
inverse-left |k> => |e>
inverse-right |l> => |e>
inverse-left |m> => |f>
inverse-right |n> => |f>
inverse-supported-ops |op: child> => |*>
inverse-supported-ops |op: inverse-supported-ops> => |op: left> + |op: right> + |op: text> + |op: inverse-left> + |op: inverse-right> + |op: child> + |op: inverse-supported-ops> + |op: inverse-text>
inverse-supported-ops |op: inverse-text> => |start node> + |first child node> + |second child node>
----------------------------------------
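As an aside, the mechanics of create inverse are simple to state: for every literal rule op |x> => |y>, learn inverse-op |y> => |x> (plus the inverse-supported-ops bookkeeping visible above, which this sketch omits). A toy version over a dict rule store, not the console's actual code:

def sketch_create_inverse(rules):
  # rules maps (op, label) to a list of result labels.
  inverses = {}
  for (op, x), results in rules.items():
    for y in results:
      inverses.setdefault(("inverse-" + op, y), []).append(x)
  rules.update(inverses)
  return rules

rules = {("left", "x"): ["a"], ("right", "x"): ["b"], ("left", "a"): ["c"]}
print(sketch_create_inverse(rules)[("inverse-left", "a")])    # ['x']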

sa: inverse-left |a>
|x>

sa: inverse-right |b>
|x>

sa: inverse-right inverse-left |m>
|b>

sa: inverse-right inverse-right inverse-left |m>
|x>

sa: parent |*> #=> inverse-left |_self> + inverse-right |_self>   -- create the parent operator out of inverse-left and inverse-right.

sa: parent |m>
|f>

sa: parent^2 |m>
|b>

sa: parent^3 |m>
|x>

sa: parent^2 |k>
|b>

sa: parent^2 |h>
|a>

sa: parent^3 |h>
|x>

sa: parent^4 |h>
|>                                                                -- we have gone past the top of the tree.
And that I think, covers the basics of binary trees in BKO.
And now that we have the idea of a basic binary tree, we can do things like represent a LISP list
(where you can consider CAR as the left node, and CDR as the right node in the tree).
eg:
(the cat spied a rat)
becomes:
CAR |the cat spied a rat> => |the>
CDR |the cat spied a rat> => |cat spied a rat>
CAR |cat spied a rat> => |cat>
CDR |cat spied a rat> => |spied a rat>
CAR |spied a rat> => |spied>
CDR |spied a rat> => |a rat>
CAR |a rat> => |a>
CDR |a rat> => |rat>
CAR |rat> => |rat>
CDR |rat> => |>
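
A small sketch that emits those CAR/CDR learn rules from a plain sentence (the rule text matches the style above):

def lisp_list_rules(sentence):
  words = sentence.split()
  while words:
    head, tail = words[0], words[1:]
    print("CAR |" + " ".join(words) + "> => |" + head + ">")
    print("CDR |" + " ".join(words) + "> => |" + " ".join(tail) + ">")
    words = tail

lisp_list_rules("the cat spied a rat")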

12/5/2014: Implemented exp[op,n] |x> as inspired by yesterday's write-up.
The code:
# exp[child,n] |x>
# maps to: (1 + child + child^2 + ... + child^n ) |x>
# cf: exp(A) |Psi> in QM.
# if n <= 0, return |x>
# 
def exp(one,context,parameters):
  try:
    op, n = parameters.split(",")              # slightly hackish. Don't know a better way to do it ....
    print("exp op " + op)
    print("exp n  " + n)
    n = int(n)
  except:
    return one

  r = one
  tmp = one
  for k in range(n):
    tmp = tmp.apply_op(context,op)
    r += tmp
  return r
And now for some examples. So let's use the binary tree data from yesterday.
sa: load simple-binary-tree.sw
sa: dump
----------------------------------------
|context> => |context: simple binary tree>

left |x> => |a>
right |x> => |b>
text |x> => |start node>

left |a> => |c>
right |a> => |d>
text |a> => |first child node>

left |b> => |e>
right |b> => |f>
text |b> => |second child node>

left |c> => |g>
right |c> => |h>

left |d> => |i>
right |d> => |j>

left |e> => |k>
right |e> => |l>

left |f> => |m>
right |f> => |n>

child |*> #=> left |_self> + right |_self>
----------------------------------------

sa: exp[child,-1] |x>                              -- if you get the n parameter wrong, it just returns |x>
|x>

sa: exp[child,0] |x>                               -- this is an expected result.
|x>

sa: exp[child,1] |x>
|x> + |a> + |b>

sa: exp[child,2] |x>
|x> + |a> + |b> + |c> + |d> + |e> + |f>

sa: exp[child,3] |x>
|x> + |a> + |b> + |c> + |d> + |e> + |f> + |g> + |h> + |i> + |j> + |k> + |l> + |m> + |n>

sa: exp[child,4] |x>                               -- notice no new terms. We are at the bottom of the tree.
|x> + |a> + |b> + |c> + |d> + |e> + |f> + |g> + |h> + |i> + |j> + |k> + |l> + |m> + |n>

sa: exp[child,5] |x>                               -- notice no new terms. We are at the bottom of the tree.
|x> + |a> + |b> + |c> + |d> + |e> + |f> + |g> + |h> + |i> + |j> + |k> + |l> + |m> + |n>

sa: exp[child,1] |a>
|a> + |c> + |d>

sa: exp[child,1] |b>
|b> + |e> + |f>

sa: exp[child,1] |f>
|f> + |m> + |n>

sa: exp[child,1] (|a> + |f>)                       -- NB: exp[] can be applied to both kets and superpositions.
|a> + |f> + |c> + |d> + |m> + |n>                  -- Here we essentially have: exp[child,1] |a> + exp[child,1] |f>

sa: exp[child,1] |x>
|x> + |a> + |b>

sa: text exp[child,1] |x>                          -- I think this is a cool result! Shows a little of the power of my notation.
|start node> + |first child node> + |second child node>

13/5/2014: OK. Let's redo the binary tree but this time using binary labels. 0 for left child, 1 for right.
This has the advantage that we can look at a child node, and instantly see how deep it is in the tree (by the number of digits), and its pathway through the tree.
text |x> => |start node>
left |x> => |0>                                  -- left |x>
right |x> => |1>                                 -- right |x>

text |0> => |first child node>
left |0> => |00>                                 -- left left |x>
right |0> => |10>                                -- right left |x>

text |1> => |second child node>
left |1> => |01>                                 -- left right |x>
right |1> => |11>                                -- right right |x>

text |00> => |third child node>
left |00> => |000>                               -- left left left |x>
right |00> => |100>                              -- right left left |x>

text |10> => |fourth child node>
left |10> => |010>                               -- left right left |x>
right |10> => |110>                              -- right right left |x>

text |01> => |fifth child node>
left |01> => |001>                               -- left left right |x>
right |01> => |101>                              -- right left right |x>

text |11> => |sixth child node>
left |11> => |011>                               -- left right right |x>
right |11> => |111>                              -- right right right |x>

child |*> #=> left |_self> + right |_self>
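BTW, a sketch that generates this labeled tree down to any depth. The child label is the new digit (0 for left, 1 for right) prepended to the parent's label, which is why reading a label left to right walks from that node back up towards |x>:

def print_tree_rules(depth):
  level = [""]                          # "" stands in for the top node |x>
  for _ in range(depth):
    next_level = []
    for node in level:
      label = node if node else "x"
      print("left |" + label + "> => |0" + node + ">")
      print("right |" + label + "> => |1" + node + ">")
      next_level += ["0" + node, "1" + node]
    level = next_level

print_tree_rules(3)                     # reproduces the left/right rules above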
OK. Now that we have these new ket labels, showing the difference between child^n |x> and exp[child,n] |x> will be easier.
sa: child |x>
|0> + |1>

sa: text child |x>
|first child node> + |second child node>

sa: exp[child,1] |x>
|x> + |0> + |1>

sa: text exp[child,1] |x>
|start node> + |first child node> + |second child node>


sa: child^2 |x>
|00> + |10> + |01> + |11>

sa: text child^2 |x>
|third child node> + |fourth child node> + |fifth child node> + |sixth child node>

sa: exp[child,2] |x>
|x> + |0> + |1> + |00> + |10> + |01> + |11>

sa: text exp[child,2] |x>
|start node> + |first child node> + |second child node> + |third child node> + |fourth child node> + |fifth child node> + |sixth child node>


sa: child^3 |x>
|000> + |100> + |010> + |110> + |001> + |101> + |011> + |111>

sa: exp[child,3] |x>
|x> + |0> + |1> + |00> + |10> + |01> + |11> + |000> + |100> + |010> + |110> + |001> + |101> + |011> + |111>

6/8/2014 update: OK. I finally wrote an exp-max function.
It's just like exp, but it keeps going until it has sucked everything in.
Here is the code:
# exp-max[op] |x>  (and exp-max[op,t] |x>, where t is the early-stop threshold used below)
# maps to (1 + op + op^2 + ... + op^n) |x>
# such that exp[op,n] |x> == exp[op,n+1] |x>  (strictly speaking, it is their lengths being compared)
# Warning though, we have no idea beforehand how large n and the resulting superposition are going to be.
# Also, for large data sets this is going to be big-O expensive. But someone smarter than me can fix that problem, presumably.
def exp_max(one,context,parameters):
  try:
    op, t = parameters.split(",")
    t = int(t)
  except:
    op = parameters
    t = 0

  r = one
  tmp = one
  previous_size = len(r)                    # yup. I finally implemented len() for superpositions/kets.
  n = 0
  while True:
    tmp = tmp.apply_op(context,op)
    r += tmp
#    if len(r) == previous_size:            # a variant is: len(r) - previous_size <= t
    if len(r) - previous_size <= t:         # since kets add in sp, this difference is the number of newly discovered kets.
      break                                 # so, if this is 0, then we have reached the end of the network.
    previous_size = len(r)                  # if this is say 1, then in this round we only found 1 new ket.
    n += 1                                  # which in some cases is enough to say, this will suffice as the end of the network.
  print("n:",n)
  return r
A comment:
# Something I have wanted to do for a very long time is to split an academic field of study into categories.
# Roughly: exp-max[references,t] |some seed physics paper>
# where the "references" operator applied to a paper on arxiv.org returns the list of papers it references.
# We may (though maybe not) need t > 0, else it might drag in all of arxiv.org
Now, for some examples using the binary tree data.
sa: load binary-tree.sw
sa: child |*> #=> left |_self> + right |_self>
sa: exp-max[child] |x>                     -- slurp the whole tree, starting at the |x> node.
n: 3                                       -- BTW, considering the verbosity in the console, the big-O on this is probably bad!
|x> + |0> + |1> + |00> + |10> + |01> + |11> + |000> + |100> + |010> + |110> + |001> + |101> + |011> + |111>

sa: exp-max[child] |0>                     -- slurp the whole tree, starting at the |0> node.
n: 2                                       -- NB: here n = 2, above it was n = 3 (where n is how many steps exp has descended)
|0> + |00> + |10> + |000> + |100> + |010> + |110>

sa: exp-max[left] |x>                      -- slurp down the left branch of the tree. 
n: 3
|x> + |0> + |00> + |000>

sa: exp-max[right] |x>                     -- slurp down the right branch of the tree.
n: 3
|x> + |1> + |11> + |111>


sa: create inverse                         -- create all the inverses
sa: exp-max[inverse-left] |000>            -- climb up the tree starting at |000>
n: 3
|000> + |00> + |0> + |x>

-- now some examples from the middle branches, not the edges:
sa: exp-max[inverse-left] |101> 
n: 0          
|101>                                      -- inverse-left is not defined for |101>

sa: exp-max[inverse-right] |101>           -- try again:
n: 1
|101> + |01>                               -- inverse-right is not defined for |01> (hence we can climb no higher)

-- let's define a general rule for parents:
sa: parent |*> #=> inverse-left |_self> + inverse-right |_self>

-- try again:
sa: exp-max[parent] |101>
n: 3
|101> + |01> + |1> + |x>                   -- success.

sa: exp-max[parent] |011>
n: 3
|011> + |11> + |1> + |x>                   -- yup. Using parent we can climb the tree from any branch upwards to the top.
BTW, perhaps another use or two of this code is:
exp-max[people-you-know] |some seed person>
exp-max[url-links-to] |some seed webpage>
Quick test, and even cyclic networks don't cause an infinite loop. See here:
sa: load simple-network.sw
sa: matrix[O]
[ a1  ] = [  0     0     0     0     0     0     0     0     0     1.00  0     0     0     0     0     0     0     ] [ a1  ]
[ a2  ]   [  1.00  0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     ] [ a2  ]
[ a3  ]   [  0     1.00  0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     ] [ a3  ]
[ a4  ]   [  0     0     1.00  0     0     0     0     0     0     0     0     0     0     0     0     0     0     ] [ a4  ]
[ a5  ]   [  0     0     0     1.00  0     0     0     0     0     0     0     0     0     0     0     0     0     ] [ a5  ]
[ a6  ]   [  0     0     0     0     1.00  0     0     0     0     0     0     0     0     0     0     0     0     ] [ a6  ]
[ a7  ]   [  0     0     0     0     0     1.00  0     0     0     0     0     0     0     0     0     0     0     ] [ a7  ]
[ a8  ]   [  0     0     0     0     0     0     1.00  0     0     0     0     0     0     0     0     0     0     ] [ a8  ]
[ a9  ]   [  0     0     0     0     0     0     0     1.00  0     0     0     0     0     0     0     0     0     ] [ a9  ]
[ a10 ]   [  0     0     0     0     0     0     0     0     1.00  0     0     0     0     0     0     0     0     ] [ a10 ]
[ b1  ]   [  0     0     0     0     0     0     0     0     0     1.00  0     0     0     0     0     0     1.00  ] [ b1  ]
[ b2  ]   [  0     0     0     0     0     0     0     0     0     0     1.00  0     0     0     0     0     0     ] [ b2  ]
[ b3  ]   [  0     0     0     0     0     0     0     0     0     0     0     1.00  0     0     0     0     0     ] [ b3  ]
[ b4  ]   [  0     0     0     0     0     0     0     0     0     0     0     0     1.00  0     0     0     0     ] [ b4  ]
[ b5  ]   [  0     0     0     0     0     0     0     0     0     0     0     0     0     1.00  0     0     0     ] [ b5  ]
[ b6  ]   [  0     0     0     0     0     0     0     0     0     0     0     0     0     0     1.00  0     0     ] [ b6  ]
[ b7  ]   [  0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     1.00  0     ] [ b7  ]

sa: exp-max[O] |a1>
n: 16
2.000|a1> + 2.000|a2> + 2.000|a3> + 2.000|a4> + 2.000|a5> + 2.000|a6> + 2.000|a7> + 2.000|a8> + |a9> + |a10> + 2.000|b1> + |b2> + |b3> + |b4> + |b5> + |b6> + |b7>

sa: exp-max[O] |a3>
n: 14
2.000|a3> + 2.000|a4> + 2.000|a5> + 2.000|a6> + 2.000|a7> + 2.000|a8> + |a9> + |a10> + |a1> + 2.000|b1> + |a2> + |b2> + |b3> + |b4> + |b5> + |b6> + |b7>

sa: exp-max[O] |b1>                -- NB: {b1,b2,b3,b4,b5,b6,b7} are a sub-network.
n: 6                               -- so we never hit any of the a_n nodes.
2.000|b1> + |b2> + |b3> + |b4> + |b5> + |b6> + |b7>

sa: exp-max[O] |b3>
n: 6
2.000|b3> + |b4> + |b5> + |b6> + |b7> + |b1> + |b2>
Note, I don't really understand the meaning of the coeffs (presumably each node's coefficient counts the distinct walks from the seed node that reach it within the step limit), but we can get rid of them easily enough using clean.
sa: clean exp-max[O] |a1>
n: 16
|a1> + |a2> + |a3> + |a4> + |a5> + |a6> + |a7> + |a8> + |a9> + |a10> + |b1> + |b2> + |b3> + |b4> + |b5> + |b6> + |b7>

sa: clean exp-max[O] |a3>
n: 14
|a3> + |a4> + |a5> + |a6> + |a7> + |a8> + |a9> + |a10> + |a1> + |b1> + |a2> + |b2> + |b3> + |b4> + |b5> + |b6> + |b7>

sa: clean exp-max[O] |b1>
n: 6
|b1> + |b2> + |b3> + |b4> + |b5> + |b6> + |b7>
Also, these networks and sub-networks are making me think of group theory in maths (and groups vs subgroups and so on).
In this case {b1,b2,b3,b4,b5,b6,b7} is a "sub-group" of {a1,a2,a3,a4,a5,a6,a7,a8,a9,a10,b1,b2,b3,b4,b5,b6,b7}
Hrmm... and there is something here about cyclic networks vs non-cyclic networks (like the binary-tree above)....
I guess you could call cyclic networks "closed", while non-cyclic networks are not (they have "dead ends" that don't link back to the start).

OK. This is weird/fun. I decided to create inverses, and see what happens if I used inverse-O.
sa: create inverse
sa: exp-max[inverse-O] |a1>
n: 9
2.000|a1> + |a10> + |a9> + |a8> + |a7> + |a6> + |a5> + |a4> + |a3> + |a2>

sa: exp-max[inverse-O] |a3>
n: 9
2.000|a3> + |a2> + |a1> + |a10> + |a9> + |a8> + |a7> + |a6> + |a5> + |a4>

sa: exp-max[inverse-O] |b3>
n: 12
2.000|b3> + 2.000|b2> + 2.000|b1> + 3.000|a10> + 2.000|b7> + 2.000|a9> + 2.000|b6> + 2.000|a8> + 2.000|b5> + 2.000|a7> + 2.000|b4> + |a6> + |a5> + |a4> + |a3> + |a2> + |a1>
Heh, so now {a1,a2,a3,a4,a5,a6,a7,a8,a9,a10} is the sub-group of {a1,a2,a3,a4,a5,a6,a7,a8,a9,a10,b1,b2,b3,b4,b5,b6,b7}
Anyway, this is what the BKO looks like after running create inverse:
O |a1> => |a2>
inverse-O |a1> => |a10>

O |a2> => |a3>
inverse-O |a2> => |a1>

O |a3> => |a4>
inverse-O |a3> => |a2>

O |a4> => |a5>
inverse-O |a4> => |a3>

O |a5> => |a6>
inverse-O |a5> => |a4>

O |a6> => |a7>
inverse-O |a6> => |a5>

O |a7> => |a8>
inverse-O |a7> => |a6>

O |a8> => |a9>
inverse-O |a8> => |a7>

O |a9> => |a10>
inverse-O |a9> => |a8>

O |a10> => |a1> + |b1>
inverse-O |a10> => |a9>

O |b1> => |b2>
inverse-O |b1> => |a10> + |b7>

O |b2> => |b3>
inverse-O |b2> => |b1>

O |b3> => |b4>
inverse-O |b3> => |b2>

O |b4> => |b5>
inverse-O |b4> => |b3>

O |b5> => |b6>
inverse-O |b5> => |b4>

O |b6> => |b7>
inverse-O |b6> => |b5>

O |b7> => |b1>
inverse-O |b7> => |b6>
And here is the inverse-O matrix:
sa: matrix[inverse-O]
[ a1  ] = [  0     1.00  0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     ] [ a1  ]
[ a2  ]   [  0     0     1.00  0     0     0     0     0     0     0     0     0     0     0     0     0     0     ] [ a2  ]
[ a3  ]   [  0     0     0     1.00  0     0     0     0     0     0     0     0     0     0     0     0     0     ] [ a3  ]
[ a4  ]   [  0     0     0     0     1.00  0     0     0     0     0     0     0     0     0     0     0     0     ] [ a4  ]
[ a5  ]   [  0     0     0     0     0     1.00  0     0     0     0     0     0     0     0     0     0     0     ] [ a5  ]
[ a6  ]   [  0     0     0     0     0     0     1.00  0     0     0     0     0     0     0     0     0     0     ] [ a6  ]
[ a7  ]   [  0     0     0     0     0     0     0     1.00  0     0     0     0     0     0     0     0     0     ] [ a7  ]
[ a8  ]   [  0     0     0     0     0     0     0     0     1.00  0     0     0     0     0     0     0     0     ] [ a8  ]
[ a9  ]   [  0     0     0     0     0     0     0     0     0     1.00  0     0     0     0     0     0     0     ] [ a9  ]
[ a10 ]   [  1.00  0     0     0     0     0     0     0     0     0     1.00  0     0     0     0     0     0     ] [ a10 ]
[ b1  ]   [  0     0     0     0     0     0     0     0     0     0     0     1.00  0     0     0     0     0     ] [ b1  ]
[ b2  ]   [  0     0     0     0     0     0     0     0     0     0     0     0     1.00  0     0     0     0     ] [ b2  ]
[ b3  ]   [  0     0     0     0     0     0     0     0     0     0     0     0     0     1.00  0     0     0     ] [ b3  ]
[ b4  ]   [  0     0     0     0     0     0     0     0     0     0     0     0     0     0     1.00  0     0     ] [ b4  ]
[ b5  ]   [  0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     1.00  0     ] [ b5  ]
[ b6  ]   [  0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     1.00  ] [ b6  ]
[ b7  ]   [  0     0     0     0     0     0     0     0     0     0     1.00  0     0     0     0     0     0     ] [ b7  ]
|matrix>

6/8/2014 update: hrm... we can collapse it all into one network with no sub-networks easily enough.
sa: create inverse
sa: nbr |*> #=> O |_self> + inverse-O |_self>     -- propagate up and down in each step. Though it does make things somewhat inefficient!
sa: relevant-kets[O]
|a1> + |a2> + |a3> + |a4> + |a5> + |a6> + |a7> + |a8> + |a9> + |a10> + |b1> + |b2> + |b3> + |b4> + |b5> + |b6> + |b7>

sa: vector[nbr] relevant-kets[O]
[ a1  ] = [  0     1.00  0     0     0     0     0     0     0     1.00  0     0     0     0     0     0     0     ] [ a1  ]
[ a2  ]   [  1.00  0     1.00  0     0     0     0     0     0     0     0     0     0     0     0     0     0     ] [ a2  ]
[ a3  ]   [  0     1.00  0     1.00  0     0     0     0     0     0     0     0     0     0     0     0     0     ] [ a3  ]
[ a4  ]   [  0     0     1.00  0     1.00  0     0     0     0     0     0     0     0     0     0     0     0     ] [ a4  ]
[ a5  ]   [  0     0     0     1.00  0     1.00  0     0     0     0     0     0     0     0     0     0     0     ] [ a5  ]
[ a6  ]   [  0     0     0     0     1.00  0     1.00  0     0     0     0     0     0     0     0     0     0     ] [ a6  ]
[ a7  ]   [  0     0     0     0     0     1.00  0     1.00  0     0     0     0     0     0     0     0     0     ] [ a7  ]
[ a8  ]   [  0     0     0     0     0     0     1.00  0     1.00  0     0     0     0     0     0     0     0     ] [ a8  ]
[ a9  ]   [  0     0     0     0     0     0     0     1.00  0     1.00  0     0     0     0     0     0     0     ] [ a9  ]
[ a10 ]   [  1.00  0     0     0     0     0     0     0     1.00  0     1.00  0     0     0     0     0     0     ] [ a10 ]
[ b1  ]   [  0     0     0     0     0     0     0     0     0     1.00  0     1.00  0     0     0     0     1.00  ] [ b1  ]
[ b2  ]   [  0     0     0     0     0     0     0     0     0     0     1.00  0     1.00  0     0     0     0     ] [ b2  ]
[ b3  ]   [  0     0     0     0     0     0     0     0     0     0     0     1.00  0     1.00  0     0     0     ] [ b3  ]
[ b4  ]   [  0     0     0     0     0     0     0     0     0     0     0     0     1.00  0     1.00  0     0     ] [ b4  ]
[ b5  ]   [  0     0     0     0     0     0     0     0     0     0     0     0     0     1.00  0     1.00  0     ] [ b5  ]
[ b6  ]   [  0     0     0     0     0     0     0     0     0     0     0     0     0     0     1.00  0     1.00  ] [ b6  ]
[ b7  ]   [  0     0     0     0     0     0     0     0     0     0     1.00  0     0     0     0     1.00  0     ] [ b7  ]
|matrix>
-- no sort order bug here.

sa: exp-max[nbr] |a3>
100.000|a3> + 49.000|a4> + 50.000|a2> + 77.000|a5> + 89.000|a1> + 28.000|a6> + 39.000|a10> + 45.000|a7> + 59.000|b1> + 56.000|a9> + 17.000|a8> + 10.000|b2> + 10.000|b7> + 11.000|b3> + 11.000|b6> + 2.000|b4> + 2.000|b5>
-- as promised, all in one network.

sa: exp-max[nbr] |b3>
133.000|b3> + 199.000|b4> + 268.000|b2> + 135.000|b5> + 204.000|b1> + 160.000|b6> + 273.000|a10> + 229.000|b7> + 70.000|a1> + 70.000|a9> + 70.000|a2> + 70.000|a8> + 12.000|a3> + 12.000|a7> + 13.000|a4> + 13.000|a6> + 2.000|a5> 
-- also all in one network.

-- and for a quick idea of how inefficient it is to go up and down with each step:
sa: count-sum exp-max[nbr] |a3>
|number: 655.0>

sa: count-sum exp-max[nbr] |b3>
|number: 1933.0>

sa: count-sum exp-max[nbr] |b1>
|number: 840.0>

-- heh. OK. Another (more efficient) way to do it. Though I'm not 100% sure it will slurp in the entire network.
-- Indeed, my hunch is that there exist more interesting networks where this trick is not sufficient to slurp in the whole network.
-- eg, given O |a> => |b> and O |c> => |b>, starting at |a>: exp-max[O] finds |b>, and exp-max[inverse-O] finds nothing new,
-- so |c> is never reached. To find |c> you have to alternate directions, which is exactly what nbr does.
sa: exp-max[O] |a1> + exp-max[inverse-O] |a1>
4.000|a1> + 3.000|a2> + 3.000|a3> + 3.000|a4> + 3.000|a5> + 3.000|a6> + 3.000|a7> + 3.000|a8> + 2.000|a9> + 2.000|a10> + 2.000|b1> + |b2> + |b3> + |b4> + |b5> + |b6> + |b7>

sa: exp-max[O] |b1> + exp-max[inverse-O] |b1>
4.000|b1> + 2.000|b2> + 2.000|b3> + 3.000|b4> + 3.000|b5> + 3.000|b6> + 3.000|b7> + 3.000|a10> + 2.000|a9> + 2.000|a8> + 2.000|a7> + |a6> + |a5> + |a4> + |a3> + |a2> + |a1> 

sa: count-sum (exp-max[O] |a1> + exp-max[inverse-O] |a1>)
|number: 37.0>

sa: count-sum (exp-max[O] |b1> + exp-max[inverse-O] |b1>)
|number: 35.0>

6/8/2014 update: Hrmm... we can make the binary tree connected/cyclic with some work:
sa: create inverse
sa: parent |*> #=> inverse-left |_self> + inverse-right |_self>
sa: nghbr |*> #=> parent |_self> + child |_self>     -- this is the useful bit (defining a neighbour operator), but we need a little work first.
sa: exp-max[child] |x>    -- OK. We have a copy of the tree.
n: 3
|x> + |0> + |1> + |00> + |10> + |01> + |11> + |000> + |100> + |010> + |110> + |001> + |101> + |011> + |111>

sa: map[nghbr] exp-max[child] |x>                    -- create the nghbr data for all elements in the tree.
sa: matrix[nghbr]                                    -- take a look
[ 00  ] = [  1.00  1.00  1.00  0     0     0     1.00  0     0     0     1.00  0     0     0     0  1.00  ] [ 0   ]
[ 0   ]   [  0     1.00  0     0     0     0     0     0     0     0     0     0     0     0     0  0     ] [ 00  ]
[ 000 ]   [  0     0     0     0     0     0     0     0     0     0     0     0     0     0     0  0     ] [ 000 ]
[ 01  ]   [  0     0     0     1.00  1.00  1.00  0     0     1.00  0     0     1.00  0     0     0  1.00  ] [ 1   ]
[ 1   ]   [  0     0     0     0     1.00  0     0     0     0     0     0     0     0     0     0  0     ] [ 01  ]
[ 001 ]   [  0     0     0     0     0     0     0     0     0     0     0     0     0     0     0  0     ] [ 001 ]
[ 10  ]   [  1.00  0     0     0     0     0     1.00  1.00  0     0     0     0     1.00  0     0  0     ] [ 10  ]
[ 010 ]   [  0     0     0     0     0     0     0     0     0     0     0     0     0     0     0  0     ] [ 010 ]
[ 11  ]   [  0     0     0     1.00  0     0     0     0     1.00  1.00  0     0     0     1.00  0  0     ] [ 11  ]
[ 011 ]   [  0     0     0     0     0     0     0     0     0     0     0     0     0     0     0  0     ] [ 011 ]
[ 100 ]   [  0     1.00  0     0     0     0     0     0     0     0     0     0     0     0     0  0     ] [ 100 ]
[ 101 ]   [  0     0     0     0     1.00  0     0     0     0     0     0     0     0     0     0  0     ] [ 101 ]
[ 110 ]   [  0     0     0     0     0     0     1.00  0     0     0     0     0     0     0     0  0     ] [ 110 ]
[ 111 ]   [  0     0     0     0     0     0     0     0     1.00  0     0     0     0     0     0  0     ] [ 111 ]
[ x   ]   [  1.00  0     0     1.00  0     0     0     0     0     0     0     0     0     0     0  0     ] [ *   ]
                                                                                                            [ x   ]
-- observe that:
-- |x> has 2 children, and no parents
-- middle nodes have 2 children and 1 parent
-- bottom nodes have 0 children and 1 parent
-- DOH! There is a sort order bug for {0,00,000} and maybe others. eg, the children of |x> are wrong!
-- No idea how to fix. But I do know where. In the natural-sort function.                                                                                                            
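FWIW, a sketch of one possible fix for the natural-sort key: compare digit runs as (value, length) pairs, so that "0" < "00" < "000" instead of all three tying at 0. Whether this drops cleanly into the actual natural-sort function, I haven't checked:

import re

def natural_key(label):
  # digit runs become (0, value, length) tuples, text runs (1, text, 0),
  # so mixed labels never compare int against str.
  return [(0, int(run), len(run)) if run.isdigit() else (1, run, 0)
          for run in re.split(r'(\d+)', label) if run != ""]

print(sorted(["000", "x", "10", "0", "00", "1"], key=natural_key))
# ['0', '00', '000', '1', '10', 'x']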

sa: exp-max[nghbr] |010>         -- put it to use with an example.
n: 6                             -- NB: we didn't need the map[nghbr] function for this to work. That was only so we had a matrix.
16.000|010> + 61.000|10> + 30.000|0> + 15.000|110> + 39.000|x> + 46.000|00> + 9.000|1> + 8.000|000> + 8.000|100> + 11.000|01> + 11.000|11> + |001> + |101> + |011> + |111>

sa: exp-max[nghbr] |01>          -- another example. 
n: 5
61.000|01> + 30.000|1> + 15.000|001> + 15.000|101> + 39.000|x> + 46.000|11> + 9.000|0> + 8.000|011> + 8.000|111> + 11.000|00> + 11.000|10> + |000> + |100> + |010> + |110>
Point being, from anywhere on the binary tree we can reach all the other nodes. We just had to define the nghbr (neighbour) operator.
Anyway, a tweak occurred to me:
-- another way of defining children and parents of a node:
sa: nbr |*> #=> left |_self> + inverse-left |_self> + right |_self> + inverse-right |_self>
-- instead of using matrix, use vector (where you specify the kets of interest)
-- the idea is this way we can use ops that are general rules, rather than having to run map with them first.
-- and the only reason we had to run map first was so that we had the right list from context.relevant_kets(op)
sa: vector[nbr] exp-max[child] |x>
[ 0   ] = [  1.00  1.00  0     1.00  1.00  0     0     1.00  1.00  0     0     0     0     0     0     ] [ x   ]
[ 00  ]   [  0     0     0     1.00  0     0     0     0     0     0     0     0     0     0     0     ] [ 0   ]
[ 000 ]   [  0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     ] [ 1   ]
[ 1   ]   [  1.00  0     1.00  0     0     1.00  1.00  0     0     0     0     1.00  1.00  0     0     ] [ 00  ]
[ 01  ]   [  0     0     0     0     0     1.00  0     0     0     0     0     0     0     0     0     ] [ 10  ]
[ 001 ]   [  0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     ] [ 01  ]
[ 10  ]   [  0     1.00  0     0     1.00  0     0     0     0     1.00  1.00  0     0     0     0     ] [ 11  ]
[ 010 ]   [  0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     ] [ 000 ]
[ 11  ]   [  0     0     1.00  0     0     0     1.00  0     0     0     0     0     0     1.00  1.00  ] [ 100 ]
[ 011 ]   [  0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     ] [ 010 ]
[ 100 ]   [  0     0     0     1.00  0     0     0     0     0     0     0     0     0     0     0     ] [ 110 ]
[ 101 ]   [  0     0     0     0     0     1.00  0     0     0     0     0     0     0     0     0     ] [ 001 ]
[ 110 ]   [  0     0     0     0     1.00  0     0     0     0     0     0     0     0     0     0     ] [ 101 ]
[ 111 ]   [  0     0     0     0     0     0     1.00  0     0     0     0     0     0     0     0     ] [ 011 ]
[ x   ]   [  0     1.00  1.00  0     0     0     0     0     0     0     0     0     0     0     0     ] [ 111 ]
|matrix>
-- heh. Still has the sort order bug!

Also, recently wrote this:
# 1/5/2014:
# to-value and to-category (maybe come up with better names!)
# to-value |> => |>
# to-value |19> => 19| >  -- NB the space, cf to-number
# to-value |age: 23> => 23|age>
# to-value |age: 23.5> => 23.5|age>
# to-value |string> => |string> or 0| >        -- currently the first one.
# to-value |cat: val> => |cat: val> or 0|cat>
# to-value |cat1: cat2: 13> => 13|cat1: cat2>
#
# to-category 57| > => |57>
# to-category |age> => |age: 1>
# to-category 23|age> => |age: 23>
def to_value(one):                          # tested. Seems to work as desired!
  # do we need one = one.ket() here?
  cat, value = extract_category_value(one.label)
  print("cat: " + cat)
  print("value: " + value)
  
  if len(cat) == 0:
    label = " "
  else:
    label = cat

  try:
    x = float(value)
    return ket(label,x)
  except:
    return one 

def to_category(one):
  # do we need one = one.ket() here?
  label = one.label
  if label in [""," "]:                      # maybe label.strip() == ""?
    label = ""                               # Also, stop using -- for comments in python!
  else:
    label += ": "
  return ket(label + "%.3f" % one.value)
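
Note, to_value() leans on extract_category_value(), which isn't shown here. From the examples in the header comment, a sketch of what it presumably does (split the label on its last ": " separator):

def extract_category_value(label):
  # eg: "cat1: cat2: 13" -> ("cat1: cat2", "13"); "19" -> ("", "19")
  if ": " not in label:
    return "", label
  cat, value = label.rsplit(": ", 1)
  return cat, value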

18/5/2014: OK. Now, after 2 days of processing this (yeah, the intersection_fn() needs serious optimizing!), I have the count and simm results for webpages fragmented, using fragments = ["<",">"], into 64k buckets.
$ grep "^count" fragment-documents-64k--post-processing--saved.sw | sed 's/count-1 /\ncount-1 /g'
count-1 |semantic-2-64k> => |number: 4580>
count-2 |semantic-2-64k> => |number: 739>
count-3 |semantic-2-64k> => |number: 304>
count-4 |semantic-2-64k> => |number: 207>
count-5 |semantic-2-64k> => |number: 140>
count-6 |semantic-2-64k> => |number: 87>
count-7 |semantic-2-64k> => |number: 63>
count-8 |semantic-2-64k> => |number: 48>
count-9 |semantic-2-64k> => |number: 35>
count-10 |semantic-2-64k> => |number: 35>

count-1 |eztv-1-64k> => |number: 2888>
count-2 |eztv-1-64k> => |number: 152>
count-3 |eztv-1-64k> => |number: 47>
count-4 |eztv-1-64k> => |number: 32>
count-5 |eztv-1-64k> => |number: 28>
count-6 |eztv-1-64k> => |number: 27>
count-7 |eztv-1-64k> => |number: 26>
count-8 |eztv-1-64k> => |number: 23>
count-9 |eztv-1-64k> => |number: 22>
count-10 |eztv-1-64k> => |number: 22>

count-1 |slashdot-3-64k> => |number: 1203>
count-2 |slashdot-3-64k> => |number: 165>
count-3 |slashdot-3-64k> => |number: 104>
count-4 |slashdot-3-64k> => |number: 89>
count-5 |slashdot-3-64k> => |number: 80>
count-6 |slashdot-3-64k> => |number: 70>
count-7 |slashdot-3-64k> => |number: 55>
count-8 |slashdot-3-64k> => |number: 52>
count-9 |slashdot-3-64k> => |number: 46>
count-10 |slashdot-3-64k> => |number: 45>

count-1 |slashdot-1-64k> => |number: 1179>
count-2 |slashdot-1-64k> => |number: 159>
count-3 |slashdot-1-64k> => |number: 92>
count-4 |slashdot-1-64k> => |number: 79>
count-5 |slashdot-1-64k> => |number: 72>
count-6 |slashdot-1-64k> => |number: 65>
count-7 |slashdot-1-64k> => |number: 55>
count-8 |slashdot-1-64k> => |number: 52>
count-9 |slashdot-1-64k> => |number: 48>
count-10 |slashdot-1-64k> => |number: 47>

count-1 |wc-comments-2-64k> => |number: 528>
count-2 |wc-comments-2-64k> => |number: 99>
count-3 |wc-comments-2-64k> => |number: 54>
count-4 |wc-comments-2-64k> => |number: 44>
count-5 |wc-comments-2-64k> => |number: 38>
count-6 |wc-comments-2-64k> => |number: 36>
count-7 |wc-comments-2-64k> => |number: 34>
count-8 |wc-comments-2-64k> => |number: 32>
count-9 |wc-comments-2-64k> => |number: 31>
count-10 |wc-comments-2-64k> => |number: 29>

count-1 |diary-1-64k> => |number: 629>
count-2 |diary-1-64k> => |number: 157>
count-3 |diary-1-64k> => |number: 90>
count-4 |diary-1-64k> => |number: 77>
count-5 |diary-1-64k> => |number: 66>
count-6 |diary-1-64k> => |number: 62>
count-7 |diary-1-64k> => |number: 62>
count-8 |diary-1-64k> => |number: 58>
count-9 |diary-1-64k> => |number: 56>
count-10 |diary-1-64k> => |number: 56>

count-1 |eztv-2-64k> => |number: 2919>
count-2 |eztv-2-64k> => |number: 182>
count-3 |eztv-2-64k> => |number: 46>
count-4 |eztv-2-64k> => |number: 34>
count-5 |eztv-2-64k> => |number: 31>
count-6 |eztv-2-64k> => |number: 30>
count-7 |eztv-2-64k> => |number: 29>
count-8 |eztv-2-64k> => |number: 26>
count-9 |eztv-2-64k> => |number: 26>
count-10 |eztv-2-64k> => |number: 26>

count-1 |diary-2-64k> => |number: 657>
count-2 |diary-2-64k> => |number: 156>
count-3 |diary-2-64k> => |number: 89>
count-4 |diary-2-64k> => |number: 76>
count-5 |diary-2-64k> => |number: 65>
count-6 |diary-2-64k> => |number: 62>
count-7 |diary-2-64k> => |number: 62>
count-8 |diary-2-64k> => |number: 58>
count-9 |diary-2-64k> => |number: 56>
count-10 |diary-2-64k> => |number: 56>

count-1 |wc-comments-1-64k> => |number: 528>
count-2 |wc-comments-1-64k> => |number: 99>
count-3 |wc-comments-1-64k> => |number: 54>
count-4 |wc-comments-1-64k> => |number: 44>
count-5 |wc-comments-1-64k> => |number: 38>
count-6 |wc-comments-1-64k> => |number: 36>
count-7 |wc-comments-1-64k> => |number: 34>
count-8 |wc-comments-1-64k> => |number: 32>
count-9 |wc-comments-1-64k> => |number: 31>
count-10 |wc-comments-1-64k> => |number: 29>

count-1 |slashdot-2-64k> => |number: 1173>
count-2 |slashdot-2-64k> => |number: 162>
count-3 |slashdot-2-64k> => |number: 93>
count-4 |slashdot-2-64k> => |number: 79>
count-5 |slashdot-2-64k> => |number: 72>
count-6 |slashdot-2-64k> => |number: 65>
count-7 |slashdot-2-64k> => |number: 55>
count-8 |slashdot-2-64k> => |number: 52>
count-9 |slashdot-2-64k> => |number: 48>
count-10 |slashdot-2-64k> => |number: 47>

count-1 |semantic-1-64k> => |number: 2495>
count-2 |semantic-1-64k> => |number: 441>
count-3 |semantic-1-64k> => |number: 213>
count-4 |semantic-1-64k> => |number: 133>
count-5 |semantic-1-64k> => |number: 102>
count-6 |semantic-1-64k> => |number: 62>
count-7 |semantic-1-64k> => |number: 40>
count-8 |semantic-1-64k> => |number: 27>
count-9 |semantic-1-64k> => |number: 23>
count-10 |semantic-1-64k> => |number: 23>
And now the simm results:
$ grep "simm" fragment-documents-64k--post-processing--saved.sw | sed 's/drop-1-simm/\ndrop-1-simm/g'
drop-1-simm |semantic-2-64k> => 65.846|semantic-1-64k> + 8.067|diary-1-64k> + 8.021|diary-2-64k> + 6.087|eztv-1-64k> + 6.086|slashdot-3-64k> + 6.036|eztv-2-64k> + 5.741|slashdot-1-64k> + 5.722|slashdot-2-64k> + 4.875|wc-comments-2-64k> + 4.863|wc-comments-1-64k>
drop-2-simm |semantic-2-64k> => 73.858|semantic-1-64k> + 11.679|diary-1-64k> + 11.633|diary-2-64k> + 8.002|slashdot-3-64k> + 7.878|slashdot-1-64k> + 7.871|slashdot-2-64k> + 7.543|wc-comments-2-64k> + 7.543|wc-comments-1-64k> + 6.206|eztv-1-64k> + 6.198|eztv-2-64k>
drop-3-simm |semantic-2-64k> => 77.788|semantic-1-64k> + 13.761|diary-1-64k> + 13.708|diary-2-64k> + 9.388|slashdot-3-64k> + 9.331|slashdot-1-64k> + 9.328|slashdot-2-64k> + 8.981|wc-comments-2-64k> + 8.981|wc-comments-1-64k> + 7.461|eztv-1-64k> + 7.461|eztv-2-64k>
drop-4-simm |semantic-2-64k> => 77.417|semantic-1-64k> + 14.565|diary-1-64k> + 14.510|diary-2-64k> + 10.068|slashdot-3-64k> + 10.004|slashdot-1-64k> + 10.004|slashdot-2-64k> + 9.532|wc-comments-2-64k> + 9.532|wc-comments-1-64k> + 8.093|eztv-1-64k> + 8.093|eztv-2-64k>
drop-5-simm |semantic-2-64k> => 79.583|semantic-1-64k> + 15.292|diary-1-64k> + 15.236|diary-2-64k> + 10.793|slashdot-3-64k> + 10.720|slashdot-1-64k> + 10.720|slashdot-2-64k> + 9.826|wc-comments-2-64k> + 9.826|wc-comments-1-64k> + 8.778|eztv-1-64k> + 8.778|eztv-2-64k>
drop-6-simm |semantic-2-64k> => 79.927|semantic-1-64k> + 16.338|diary-1-64k> + 16.267|diary-2-64k> + 11.652|slashdot-3-64k> + 11.561|slashdot-1-64k> + 11.561|slashdot-2-64k> + 10.001|wc-comments-2-64k> + 10.001|wc-comments-1-64k> + 9.580|eztv-1-64k> + 9.580|eztv-2-64k>
drop-7-simm |semantic-2-64k> => 79.571|semantic-1-64k> + 16.886|diary-2-64k> + 16.705|diary-1-64k> + 11.901|slashdot-3-64k> + 11.771|slashdot-1-64k> + 11.771|slashdot-2-64k> + 10.171|wc-comments-2-64k> + 10.171|wc-comments-1-64k> + 10.080|eztv-1-64k> + 10.080|eztv-2-64k>
drop-8-simm |semantic-2-64k> => 79.337|semantic-1-64k> + 17.309|diary-2-64k> + 17.050|diary-1-64k> + 12.324|slashdot-3-64k> + 12.191|slashdot-1-64k> + 12.191|slashdot-2-64k> + 10.479|eztv-1-64k> + 10.479|eztv-2-64k> + 9.567|wc-comments-2-64k> + 9.567|wc-comments-1-64k>
drop-9-simm |semantic-2-64k> => 80.936|semantic-1-64k> + 17.558|diary-2-64k> + 17.299|diary-1-64k> + 12.807|slashdot-3-64k> + 12.653|slashdot-1-64k> + 12.653|slashdot-2-64k> + 10.906|eztv-1-64k> + 10.906|eztv-2-64k> + 9.686|wc-comments-2-64k> + 9.686|wc-comments-1-64k>
drop-10-simm |semantic-2-64k> => 80.936|semantic-1-64k> + 17.558|diary-2-64k> + 17.299|diary-1-64k> + 12.818|slashdot-3-64k> + 12.663|slashdot-1-64k> + 12.663|slashdot-2-64k> + 10.906|eztv-1-64k> + 10.906|eztv-2-64k> + 9.866|wc-comments-2-64k> + 9.866|wc-comments-1-64k>

drop-1-simm |eztv-1-64k> => 94.118|eztv-2-64k> + 14.042|slashdot-3-64k> + 13.544|slashdot-2-64k> + 13.489|slashdot-1-64k> + 11.737|diary-2-64k> + 11.588|diary-1-64k> + 11.527|wc-comments-2-64k> + 11.509|wc-comments-1-64k> + 7.677|semantic-1-64k> + 6.087|semantic-2-64k>
drop-2-simm |eztv-1-64k> => 90.561|eztv-2-64k> + 20.942|slashdot-3-64k> + 20.170|slashdot-1-64k> + 20.155|slashdot-2-64k> + 15.597|wc-comments-2-64k> + 15.597|wc-comments-1-64k> + 14.809|diary-2-64k> + 14.520|diary-1-64k> + 9.051|semantic-1-64k> + 6.206|semantic-2-64k>
drop-3-simm |eztv-1-64k> => 92.151|eztv-2-64k> + 21.999|slashdot-3-64k> + 21.427|slashdot-2-64k> + 21.404|slashdot-1-64k> + 16.428|wc-comments-2-64k> + 16.428|wc-comments-1-64k> + 15.645|diary-2-64k> + 15.345|diary-1-64k> + 10.699|semantic-1-64k> + 7.461|semantic-2-64k>
drop-4-simm |eztv-1-64k> => 91.835|eztv-2-64k> + 22.414|slashdot-3-64k> + 21.806|slashdot-2-64k> + 21.750|slashdot-1-64k> + 16.300|wc-comments-2-64k> + 16.300|wc-comments-1-64k> + 15.989|diary-2-64k> + 15.686|diary-1-64k> + 11.832|semantic-1-64k> + 8.093|semantic-2-64k>
drop-5-simm |eztv-1-64k> => 91.655|eztv-2-64k> + 22.688|slashdot-3-64k> + 21.984|slashdot-2-64k> + 21.927|slashdot-1-64k> + 16.544|wc-comments-2-64k> + 16.544|wc-comments-1-64k> + 16.052|diary-2-64k> + 15.745|diary-1-64k> + 12.518|semantic-1-64k> + 8.778|semantic-2-64k>
drop-6-simm |eztv-1-64k> => 91.816|eztv-2-64k> + 23.304|slashdot-3-64k> + 22.407|slashdot-2-64k> + 22.349|slashdot-1-64k> + 16.716|wc-comments-2-64k> + 16.716|wc-comments-1-64k> + 16.183|diary-2-64k> + 15.917|diary-1-64k> + 13.807|semantic-1-64k> + 9.580|semantic-2-64k>
drop-7-simm |eztv-1-64k> => 91.800|eztv-2-64k> + 23.049|slashdot-3-64k> + 22.448|slashdot-2-64k> + 22.388|slashdot-1-64k> + 16.378|wc-comments-2-64k> + 16.378|wc-comments-1-64k> + 16.188|diary-2-64k> + 15.922|diary-1-64k> + 14.815|semantic-1-64k> + 10.080|semantic-2-64k>
drop-8-simm |eztv-1-64k> => 91.744|eztv-2-64k> + 23.340|slashdot-3-64k> + 22.730|slashdot-2-64k> + 22.669|slashdot-1-64k> + 16.198|wc-comments-2-64k> + 16.198|wc-comments-1-64k> + 16.171|diary-2-64k> + 15.905|diary-1-64k> + 15.600|semantic-1-64k> + 10.479|semantic-2-64k>
drop-9-simm |eztv-1-64k> => 91.443|eztv-2-64k> + 24.012|slashdot-3-64k> + 23.161|slashdot-2-64k> + 23.098|slashdot-1-64k> + 16.345|wc-comments-2-64k> + 16.345|wc-comments-1-64k> + 16.320|diary-2-64k> + 16.053|diary-1-64k> + 15.896|semantic-1-64k> + 10.906|semantic-2-64k>
drop-10-simm |eztv-1-64k> => 91.443|eztv-2-64k> + 24.142|slashdot-3-64k> + 23.283|slashdot-2-64k> + 23.221|slashdot-1-64k> + 16.320|diary-2-64k> + 16.053|diary-1-64k> + 15.896|semantic-1-64k> + 15.643|wc-comments-2-64k> + 15.643|wc-comments-1-64k> + 10.906|semantic-2-64k>

drop-1-simm |slashdot-3-64k> => 77.224|slashdot-2-64k> + 77.217|slashdot-1-64k> + 14.042|eztv-1-64k> + 13.832|eztv-2-64k> + 11.313|diary-2-64k> + 11.143|diary-1-64k> + 8.821|wc-comments-2-64k> + 8.821|wc-comments-1-64k> + 8.165|semantic-1-64k> + 6.086|semantic-2-64k>
drop-2-simm |slashdot-3-64k> => 92.552|slashdot-2-64k> + 92.470|slashdot-1-64k> + 20.942|eztv-1-64k> + 20.751|eztv-2-64k> + 13.720|diary-2-64k> + 13.400|diary-1-64k> + 11.105|wc-comments-2-64k> + 11.105|wc-comments-1-64k> + 10.961|semantic-1-64k> + 8.002|semantic-2-64k>
drop-3-simm |slashdot-3-64k> => 95.009|slashdot-1-64k> + 95.007|slashdot-2-64k> + 21.999|eztv-1-64k> + 21.835|eztv-2-64k> + 14.396|diary-2-64k> + 14.059|diary-1-64k> + 12.626|semantic-1-64k> + 10.926|wc-comments-2-64k> + 10.926|wc-comments-1-64k> + 9.388|semantic-2-64k>
drop-4-simm |slashdot-3-64k> => 95.704|slashdot-2-64k> + 95.660|slashdot-1-64k> + 22.414|eztv-1-64k> + 22.253|eztv-2-64k> + 14.732|diary-2-64k> + 14.389|diary-1-64k> + 13.807|semantic-1-64k> + 10.940|wc-comments-2-64k> + 10.940|wc-comments-1-64k> + 10.068|semantic-2-64k>
drop-5-simm |slashdot-3-64k> => 96.441|slashdot-2-64k> + 96.384|slashdot-1-64k> + 22.688|eztv-1-64k> + 22.540|eztv-2-64k> + 14.854|diary-2-64k> + 14.532|semantic-1-64k> + 14.503|diary-1-64k> + 11.211|wc-comments-2-64k> + 11.211|wc-comments-1-64k> + 10.793|semantic-2-64k>
drop-6-simm |slashdot-3-64k> => 97.030|slashdot-2-64k> + 96.971|slashdot-1-64k> + 23.304|eztv-1-64k> + 23.156|eztv-2-64k> + 15.880|semantic-1-64k> + 15.074|diary-2-64k> + 14.752|diary-1-64k> + 11.652|semantic-2-64k> + 11.086|wc-comments-2-64k> + 11.086|wc-comments-1-64k>
drop-7-simm |slashdot-3-64k> => 97.909|slashdot-2-64k> + 97.854|slashdot-1-64k> + 23.049|eztv-1-64k> + 22.900|eztv-2-64k> + 16.636|semantic-1-64k> + 13.495|diary-2-64k> + 13.173|diary-1-64k> + 11.901|semantic-2-64k> + 10.224|wc-comments-2-64k> + 10.224|wc-comments-1-64k>
drop-8-simm |slashdot-3-64k> => 97.882|slashdot-2-64k> + 97.827|slashdot-1-64k> + 23.340|eztv-1-64k> + 23.189|eztv-2-64k> + 17.445|semantic-1-64k> + 13.715|diary-2-64k> + 13.391|diary-1-64k> + 12.324|semantic-2-64k> + 9.594|wc-comments-2-64k> + 9.594|wc-comments-1-64k>
drop-9-simm |slashdot-3-64k> => 97.884|slashdot-2-64k> + 97.839|slashdot-1-64k> + 24.012|eztv-1-64k> + 23.857|eztv-2-64k> + 17.797|semantic-1-64k> + 13.413|diary-2-64k> + 13.079|diary-1-64k> + 12.807|semantic-2-64k> + 9.704|wc-comments-2-64k> + 9.704|wc-comments-1-64k>
drop-10-simm |slashdot-3-64k> => 97.872|slashdot-2-64k> + 97.827|slashdot-1-64k> + 24.142|eztv-1-64k> + 23.952|eztv-2-64k> + 17.808|semantic-1-64k> + 13.430|diary-2-64k> + 13.097|diary-1-64k> + 12.818|semantic-2-64k> + 9.890|wc-comments-2-64k> + 9.890|wc-comments-1-64k>

drop-1-simm |slashdot-1-64k> => 96.165|slashdot-2-64k> + 77.217|slashdot-3-64k> + 13.489|eztv-1-64k> + 13.291|eztv-2-64k> + 10.937|diary-2-64k> + 10.801|diary-1-64k> + 8.392|wc-comments-2-64k> + 8.392|wc-comments-1-64k> + 7.855|semantic-1-64k> + 5.741|semantic-2-64k>
drop-2-simm |slashdot-1-64k> => 97.959|slashdot-2-64k> + 92.470|slashdot-3-64k> + 20.170|eztv-1-64k> + 19.987|eztv-2-64k> + 12.980|diary-2-64k> + 12.660|diary-1-64k> + 10.793|semantic-1-64k> + 10.221|wc-comments-2-64k> + 10.221|wc-comments-1-64k> + 7.878|semantic-2-64k>
drop-3-simm |slashdot-1-64k> => 99.536|slashdot-2-64k> + 95.009|slashdot-3-64k> + 21.404|eztv-1-64k> + 21.241|eztv-2-64k> + 13.739|diary-2-64k> + 13.402|diary-1-64k> + 12.569|semantic-1-64k> + 10.403|wc-comments-2-64k> + 10.403|wc-comments-1-64k> + 9.331|semantic-2-64k>
drop-4-simm |slashdot-1-64k> => 99.494|slashdot-2-64k> + 95.660|slashdot-3-64k> + 21.750|eztv-1-64k> + 21.589|eztv-2-64k> + 14.049|diary-2-64k> + 13.743|semantic-1-64k> + 13.707|diary-1-64k> + 10.713|wc-comments-2-64k> + 10.713|wc-comments-1-64k> + 10.004|semantic-2-64k>
drop-5-simm |slashdot-1-64k> => 99.943|slashdot-2-64k> + 96.384|slashdot-3-64k> + 21.927|eztv-1-64k> + 21.779|eztv-2-64k> + 14.459|semantic-1-64k> + 14.143|diary-2-64k> + 13.792|diary-1-64k> + 10.972|wc-comments-2-64k> + 10.972|wc-comments-1-64k> + 10.720|semantic-2-64k>
drop-6-simm |slashdot-1-64k> => 99.942|slashdot-2-64k> + 96.971|slashdot-3-64k> + 22.349|eztv-1-64k> + 22.201|eztv-2-64k> + 15.789|semantic-1-64k> + 14.312|diary-2-64k> + 13.991|diary-1-64k> + 11.561|semantic-2-64k> + 11.111|wc-comments-2-64k> + 11.111|wc-comments-1-64k>
drop-7-simm |slashdot-1-64k> => 99.940|slashdot-2-64k> + 97.854|slashdot-3-64k> + 22.388|eztv-1-64k> + 22.239|eztv-2-64k> + 16.506|semantic-1-64k> + 13.358|diary-2-64k> + 13.037|diary-1-64k> + 11.771|semantic-2-64k> + 10.218|wc-comments-2-64k> + 10.218|wc-comments-1-64k>
drop-8-simm |slashdot-1-64k> => 99.939|slashdot-2-64k> + 97.827|slashdot-3-64k> + 22.669|eztv-1-64k> + 22.518|eztv-2-64k> + 17.312|semantic-1-64k> + 13.576|diary-2-64k> + 13.252|diary-1-64k> + 12.191|semantic-2-64k> + 9.589|wc-comments-2-64k> + 9.589|wc-comments-1-64k>
drop-9-simm |slashdot-1-64k> => 99.938|slashdot-2-64k> + 97.839|slashdot-3-64k> + 23.098|eztv-1-64k> + 22.943|eztv-2-64k> + 17.642|semantic-1-64k> + 13.242|diary-2-64k> + 12.908|diary-1-64k> + 12.653|semantic-2-64k> + 9.687|wc-comments-2-64k> + 9.687|wc-comments-1-64k>
drop-10-simm |slashdot-1-64k> => 99.937|slashdot-2-64k> + 97.827|slashdot-3-64k> + 23.221|eztv-1-64k> + 23.065|eztv-2-64k> + 17.652|semantic-1-64k> + 13.258|diary-2-64k> + 12.924|diary-1-64k> + 12.663|semantic-2-64k> + 9.873|wc-comments-2-64k> + 9.873|wc-comments-1-64k>

drop-1-simm |wc-comments-2-64k> => 99.533|wc-comments-1-64k> + 39.457|diary-1-64k> + 39.239|diary-2-64k> + 11.952|eztv-2-64k> + 11.527|eztv-1-64k> + 8.821|slashdot-3-64k> + 8.460|slashdot-2-64k> + 8.392|slashdot-1-64k> + 6.678|semantic-1-64k> + 4.875|semantic-2-64k>
drop-2-simm |wc-comments-2-64k> => 99.906|wc-comments-1-64k> + 42.452|diary-2-64k> + 42.446|diary-1-64k> + 16.382|eztv-2-64k> + 15.597|eztv-1-64k> + 11.105|slashdot-3-64k> + 10.221|slashdot-1-64k> + 10.214|slashdot-2-64k> + 8.796|semantic-1-64k> + 7.543|semantic-2-64k>
drop-3-simm |wc-comments-2-64k> => 99.898|wc-comments-1-64k> + 41.866|diary-2-64k> + 41.846|diary-1-64k> + 17.344|eztv-2-64k> + 16.428|eztv-1-64k> + 10.926|slashdot-3-64k> + 10.403|slashdot-1-64k> + 10.400|slashdot-2-64k> + 9.476|semantic-1-64k> + 8.981|semantic-2-64k>
drop-4-simm |wc-comments-2-64k> => 99.895|wc-comments-1-64k> + 41.388|diary-2-64k> + 41.363|diary-1-64k> + 17.221|eztv-2-64k> + 16.300|eztv-1-64k> + 10.940|slashdot-3-64k> + 10.713|slashdot-1-64k> + 10.713|slashdot-2-64k> + 9.717|semantic-1-64k> + 9.532|semantic-2-64k>
drop-5-simm |wc-comments-2-64k> => 99.892|wc-comments-1-64k> + 41.749|diary-2-64k> + 41.721|diary-1-64k> + 17.623|eztv-2-64k> + 16.544|eztv-1-64k> + 11.211|slashdot-3-64k> + 10.972|slashdot-1-64k> + 10.972|slashdot-2-64k> + 10.001|semantic-1-64k> + 9.826|semantic-2-64k>
drop-6-simm |wc-comments-2-64k> => 99.891|wc-comments-1-64k> + 42.199|diary-1-64k> + 42.175|diary-2-64k> + 17.797|eztv-2-64k> + 16.716|eztv-1-64k> + 11.111|slashdot-1-64k> + 11.111|slashdot-2-64k> + 11.086|slashdot-3-64k> + 10.204|semantic-1-64k> + 10.001|semantic-2-64k>
drop-7-simm |wc-comments-2-64k> => 99.889|wc-comments-1-64k> + 41.206|diary-1-64k> + 41.181|diary-2-64k> + 17.521|eztv-2-64k> + 16.378|eztv-1-64k> + 10.408|semantic-1-64k> + 10.224|slashdot-3-64k> + 10.218|slashdot-1-64k> + 10.218|slashdot-2-64k> + 10.171|semantic-2-64k>
drop-8-simm |wc-comments-2-64k> => 99.888|wc-comments-1-64k> + 40.310|diary-1-64k> + 40.280|diary-2-64k> + 17.396|eztv-2-64k> + 16.198|eztv-1-64k> + 9.594|slashdot-3-64k> + 9.589|slashdot-1-64k> + 9.589|slashdot-2-64k> + 9.567|semantic-2-64k> + 9.422|semantic-1-64k>
drop-9-simm |wc-comments-2-64k> => 99.886|wc-comments-1-64k> + 39.936|diary-1-64k> + 39.915|diary-2-64k> + 17.543|eztv-2-64k> + 16.345|eztv-1-64k> + 9.704|slashdot-3-64k> + 9.687|slashdot-1-64k> + 9.687|slashdot-2-64k> + 9.686|semantic-2-64k> + 9.516|semantic-1-64k>
drop-10-simm |wc-comments-2-64k> => 99.884|wc-comments-1-64k> + 38.335|diary-1-64k> + 38.315|diary-2-64k> + 16.841|eztv-2-64k> + 15.643|eztv-1-64k> + 9.890|slashdot-3-64k> + 9.873|slashdot-1-64k> + 9.873|slashdot-2-64k> + 9.866|semantic-2-64k> + 9.696|semantic-1-64k>

drop-1-simm |diary-1-64k> => 95.959|diary-2-64k> + 39.457|wc-comments-2-64k> + 39.457|wc-comments-1-64k> + 12.996|eztv-2-64k> + 11.588|eztv-1-64k> + 11.143|slashdot-3-64k> + 10.801|slashdot-1-64k> + 10.735|slashdot-2-64k> + 10.374|semantic-1-64k> + 8.067|semantic-2-64k>
drop-2-simm |diary-1-64k> => 97.965|diary-2-64k> + 42.446|wc-comments-2-64k> + 42.446|wc-comments-1-64k> + 17.071|eztv-2-64k> + 14.835|semantic-1-64k> + 14.520|eztv-1-64k> + 13.400|slashdot-3-64k> + 12.660|slashdot-1-64k> + 12.648|slashdot-2-64k> + 11.679|semantic-2-64k>
drop-3-simm |diary-1-64k> => 98.515|diary-2-64k> + 41.846|wc-comments-2-64k> + 41.846|wc-comments-1-64k> + 18.197|eztv-2-64k> + 16.206|semantic-1-64k> + 15.345|eztv-1-64k> + 14.059|slashdot-3-64k> + 13.761|semantic-2-64k> + 13.402|slashdot-1-64k> + 13.396|slashdot-2-64k>
drop-4-simm |diary-1-64k> => 98.311|diary-2-64k> + 41.363|wc-comments-2-64k> + 41.363|wc-comments-1-64k> + 18.562|eztv-2-64k> + 16.572|semantic-1-64k> + 15.686|eztv-1-64k> + 14.565|semantic-2-64k> + 14.389|slashdot-3-64k> + 13.707|slashdot-1-64k> + 13.707|slashdot-2-64k>
drop-5-simm |diary-1-64k> => 98.754|diary-2-64k> + 41.721|wc-comments-2-64k> + 41.721|wc-comments-1-64k> + 18.801|eztv-2-64k> + 16.819|semantic-1-64k> + 15.745|eztv-1-64k> + 15.292|semantic-2-64k> + 14.503|slashdot-3-64k> + 13.792|slashdot-1-64k> + 13.792|slashdot-2-64k>
drop-6-simm |diary-1-64k> => 98.822|diary-2-64k> + 42.199|wc-comments-2-64k> + 42.199|wc-comments-1-64k> + 18.978|eztv-2-64k> + 17.165|semantic-1-64k> + 16.338|semantic-2-64k> + 15.917|eztv-1-64k> + 14.752|slashdot-3-64k> + 13.991|slashdot-1-64k> + 13.991|slashdot-2-64k>
drop-7-simm |diary-1-64k> => 98.822|diary-2-64k> + 41.206|wc-comments-2-64k> + 41.206|wc-comments-1-64k> + 18.989|eztv-2-64k> + 17.248|semantic-1-64k> + 16.705|semantic-2-64k> + 15.922|eztv-1-64k> + 13.173|slashdot-3-64k> + 13.037|slashdot-1-64k> + 13.037|slashdot-2-64k>
drop-8-simm |diary-1-64k> => 98.809|diary-2-64k> + 40.310|wc-comments-2-64k> + 40.310|wc-comments-1-64k> + 19.023|eztv-2-64k> + 17.186|semantic-1-64k> + 17.050|semantic-2-64k> + 15.905|eztv-1-64k> + 13.391|slashdot-3-64k> + 13.252|slashdot-1-64k> + 13.252|slashdot-2-64k>
drop-9-simm |diary-1-64k> => 98.801|diary-2-64k> + 39.936|wc-comments-2-64k> + 39.936|wc-comments-1-64k> + 19.165|eztv-2-64k> + 17.367|semantic-1-64k> + 17.299|semantic-2-64k> + 16.053|eztv-1-64k> + 13.079|slashdot-3-64k> + 12.908|slashdot-1-64k> + 12.908|slashdot-2-64k>
drop-10-simm |diary-1-64k> => 98.801|diary-2-64k> + 38.335|wc-comments-2-64k> + 38.335|wc-comments-1-64k> + 19.165|eztv-2-64k> + 17.367|semantic-1-64k> + 17.299|semantic-2-64k> + 16.053|eztv-1-64k> + 13.097|slashdot-3-64k> + 12.924|slashdot-1-64k> + 12.924|slashdot-2-64k>

drop-1-simm |eztv-2-64k> => 94.118|eztv-1-64k> + 13.832|slashdot-3-64k> + 13.343|slashdot-2-64k> + 13.291|slashdot-1-64k> + 13.143|diary-2-64k> + 12.996|diary-1-64k> + 11.952|wc-comments-2-64k> + 11.935|wc-comments-1-64k> + 7.528|semantic-1-64k> + 6.036|semantic-2-64k>
drop-2-simm |eztv-2-64k> => 90.561|eztv-1-64k> + 20.751|slashdot-3-64k> + 19.987|slashdot-1-64k> + 19.972|slashdot-2-64k> + 17.360|diary-2-64k> + 17.071|diary-1-64k> + 16.382|wc-comments-2-64k> + 16.382|wc-comments-1-64k> + 9.051|semantic-1-64k> + 6.198|semantic-2-64k>
drop-3-simm |eztv-2-64k> => 92.151|eztv-1-64k> + 21.835|slashdot-3-64k> + 21.264|slashdot-2-64k> + 21.241|slashdot-1-64k> + 18.497|diary-2-64k> + 18.197|diary-1-64k> + 17.344|wc-comments-2-64k> + 17.344|wc-comments-1-64k> + 10.699|semantic-1-64k> + 7.461|semantic-2-64k>
drop-4-simm |eztv-2-64k> => 91.835|eztv-1-64k> + 22.253|slashdot-3-64k> + 21.646|slashdot-2-64k> + 21.589|slashdot-1-64k> + 18.865|diary-2-64k> + 18.562|diary-1-64k> + 17.221|wc-comments-2-64k> + 17.221|wc-comments-1-64k> + 11.832|semantic-1-64k> + 8.093|semantic-2-64k>
drop-5-simm |eztv-2-64k> => 91.655|eztv-1-64k> + 22.540|slashdot-3-64k> + 21.836|slashdot-2-64k> + 21.779|slashdot-1-64k> + 19.107|diary-2-64k> + 18.801|diary-1-64k> + 17.623|wc-comments-2-64k> + 17.623|wc-comments-1-64k> + 12.518|semantic-1-64k> + 8.778|semantic-2-64k>
drop-6-simm |eztv-2-64k> => 91.816|eztv-1-64k> + 23.156|slashdot-3-64k> + 22.259|slashdot-2-64k> + 22.201|slashdot-1-64k> + 19.244|diary-2-64k> + 18.978|diary-1-64k> + 17.797|wc-comments-2-64k> + 17.797|wc-comments-1-64k> + 13.807|semantic-1-64k> + 9.580|semantic-2-64k>
drop-7-simm |eztv-2-64k> => 91.800|eztv-1-64k> + 22.900|slashdot-3-64k> + 22.299|slashdot-2-64k> + 22.239|slashdot-1-64k> + 19.254|diary-2-64k> + 18.989|diary-1-64k> + 17.521|wc-comments-2-64k> + 17.521|wc-comments-1-64k> + 14.815|semantic-1-64k> + 10.080|semantic-2-64k>
drop-8-simm |eztv-2-64k> => 91.744|eztv-1-64k> + 23.189|slashdot-3-64k> + 22.579|slashdot-2-64k> + 22.518|slashdot-1-64k> + 19.289|diary-2-64k> + 19.023|diary-1-64k> + 17.396|wc-comments-2-64k> + 17.396|wc-comments-1-64k> + 15.600|semantic-1-64k> + 10.479|semantic-2-64k>
drop-9-simm |eztv-2-64k> => 91.443|eztv-1-64k> + 23.857|slashdot-3-64k> + 23.005|slashdot-2-64k> + 22.943|slashdot-1-64k> + 19.432|diary-2-64k> + 19.165|diary-1-64k> + 17.543|wc-comments-2-64k> + 17.543|wc-comments-1-64k> + 15.896|semantic-1-64k> + 10.906|semantic-2-64k>
drop-10-simm |eztv-2-64k> => 91.443|eztv-1-64k> + 23.952|slashdot-3-64k> + 23.128|slashdot-2-64k> + 23.065|slashdot-1-64k> + 19.432|diary-2-64k> + 19.165|diary-1-64k> + 16.841|wc-comments-2-64k> + 16.841|wc-comments-1-64k> + 15.896|semantic-1-64k> + 10.906|semantic-2-64k>

drop-1-simm |diary-2-64k> => 95.959|diary-1-64k> + 39.239|wc-comments-2-64k> + 39.239|wc-comments-1-64k> + 13.143|eztv-2-64k> + 11.737|eztv-1-64k> + 11.313|slashdot-3-64k> + 10.937|slashdot-1-64k> + 10.872|slashdot-2-64k> + 10.305|semantic-1-64k> + 8.021|semantic-2-64k>
drop-2-simm |diary-2-64k> => 97.965|diary-1-64k> + 42.452|wc-comments-2-64k> + 42.452|wc-comments-1-64k> + 17.360|eztv-2-64k> + 15.118|semantic-1-64k> + 14.809|eztv-1-64k> + 13.720|slashdot-3-64k> + 12.980|slashdot-1-64k> + 12.968|slashdot-2-64k> + 11.633|semantic-2-64k>
drop-3-simm |diary-2-64k> => 98.515|diary-1-64k> + 41.866|wc-comments-2-64k> + 41.866|wc-comments-1-64k> + 18.497|eztv-2-64k> + 16.499|semantic-1-64k> + 15.645|eztv-1-64k> + 14.396|slashdot-3-64k> + 13.739|slashdot-1-64k> + 13.734|slashdot-2-64k> + 13.708|semantic-2-64k>
drop-4-simm |diary-2-64k> => 98.311|diary-1-64k> + 41.388|wc-comments-2-64k> + 41.388|wc-comments-1-64k> + 18.865|eztv-2-64k> + 16.951|semantic-1-64k> + 15.989|eztv-1-64k> + 14.732|slashdot-3-64k> + 14.510|semantic-2-64k> + 14.049|slashdot-1-64k> + 14.049|slashdot-2-64k>
drop-5-simm |diary-2-64k> => 98.754|diary-1-64k> + 41.749|wc-comments-2-64k> + 41.749|wc-comments-1-64k> + 19.107|eztv-2-64k> + 17.242|semantic-1-64k> + 16.052|eztv-1-64k> + 15.236|semantic-2-64k> + 14.854|slashdot-3-64k> + 14.143|slashdot-1-64k> + 14.143|slashdot-2-64k>
drop-6-simm |diary-2-64k> => 98.822|diary-1-64k> + 42.175|wc-comments-2-64k> + 42.175|wc-comments-1-64k> + 19.244|eztv-2-64k> + 17.648|semantic-1-64k> + 16.267|semantic-2-64k> + 16.183|eztv-1-64k> + 15.074|slashdot-3-64k> + 14.312|slashdot-1-64k> + 14.312|slashdot-2-64k>
drop-7-simm |diary-2-64k> => 98.822|diary-1-64k> + 41.181|wc-comments-2-64k> + 41.181|wc-comments-1-64k> + 19.254|eztv-2-64k> + 17.829|semantic-1-64k> + 16.886|semantic-2-64k> + 16.188|eztv-1-64k> + 13.495|slashdot-3-64k> + 13.358|slashdot-1-64k> + 13.358|slashdot-2-64k>
drop-8-simm |diary-2-64k> => 98.809|diary-1-64k> + 40.280|wc-comments-2-64k> + 40.280|wc-comments-1-64k> + 19.289|eztv-2-64k> + 17.823|semantic-1-64k> + 17.309|semantic-2-64k> + 16.171|eztv-1-64k> + 13.715|slashdot-3-64k> + 13.576|slashdot-1-64k> + 13.576|slashdot-2-64k>
drop-9-simm |diary-2-64k> => 98.801|diary-1-64k> + 39.915|wc-comments-2-64k> + 39.915|wc-comments-1-64k> + 19.432|eztv-2-64k> + 18.022|semantic-1-64k> + 17.558|semantic-2-64k> + 16.320|eztv-1-64k> + 13.413|slashdot-3-64k> + 13.242|slashdot-1-64k> + 13.242|slashdot-2-64k>
drop-10-simm |diary-2-64k> => 98.801|diary-1-64k> + 38.315|wc-comments-2-64k> + 38.315|wc-comments-1-64k> + 19.432|eztv-2-64k> + 18.022|semantic-1-64k> + 17.558|semantic-2-64k> + 16.320|eztv-1-64k> + 13.430|slashdot-3-64k> + 13.258|slashdot-1-64k> + 13.258|slashdot-2-64k>

drop-1-simm |wc-comments-1-64k> => 99.533|wc-comments-2-64k> + 39.457|diary-1-64k> + 39.239|diary-2-64k> + 11.935|eztv-2-64k> + 11.509|eztv-1-64k> + 8.821|slashdot-3-64k> + 8.460|slashdot-2-64k> + 8.392|slashdot-1-64k> + 6.658|semantic-1-64k> + 4.863|semantic-2-64k>
drop-2-simm |wc-comments-1-64k> => 99.906|wc-comments-2-64k> + 42.452|diary-2-64k> + 42.446|diary-1-64k> + 16.382|eztv-2-64k> + 15.597|eztv-1-64k> + 11.105|slashdot-3-64k> + 10.221|slashdot-1-64k> + 10.214|slashdot-2-64k> + 8.796|semantic-1-64k> + 7.543|semantic-2-64k>
drop-3-simm |wc-comments-1-64k> => 99.898|wc-comments-2-64k> + 41.866|diary-2-64k> + 41.846|diary-1-64k> + 17.344|eztv-2-64k> + 16.428|eztv-1-64k> + 10.926|slashdot-3-64k> + 10.403|slashdot-1-64k> + 10.400|slashdot-2-64k> + 9.476|semantic-1-64k> + 8.981|semantic-2-64k>
drop-4-simm |wc-comments-1-64k> => 99.895|wc-comments-2-64k> + 41.388|diary-2-64k> + 41.363|diary-1-64k> + 17.221|eztv-2-64k> + 16.300|eztv-1-64k> + 10.940|slashdot-3-64k> + 10.713|slashdot-1-64k> + 10.713|slashdot-2-64k> + 9.717|semantic-1-64k> + 9.532|semantic-2-64k>
drop-5-simm |wc-comments-1-64k> => 99.892|wc-comments-2-64k> + 41.749|diary-2-64k> + 41.721|diary-1-64k> + 17.623|eztv-2-64k> + 16.544|eztv-1-64k> + 11.211|slashdot-3-64k> + 10.972|slashdot-1-64k> + 10.972|slashdot-2-64k> + 10.001|semantic-1-64k> + 9.826|semantic-2-64k>
drop-6-simm |wc-comments-1-64k> => 99.891|wc-comments-2-64k> + 42.199|diary-1-64k> + 42.175|diary-2-64k> + 17.797|eztv-2-64k> + 16.716|eztv-1-64k> + 11.111|slashdot-1-64k> + 11.111|slashdot-2-64k> + 11.086|slashdot-3-64k> + 10.204|semantic-1-64k> + 10.001|semantic-2-64k>
drop-7-simm |wc-comments-1-64k> => 99.889|wc-comments-2-64k> + 41.206|diary-1-64k> + 41.181|diary-2-64k> + 17.521|eztv-2-64k> + 16.378|eztv-1-64k> + 10.408|semantic-1-64k> + 10.224|slashdot-3-64k> + 10.218|slashdot-1-64k> + 10.218|slashdot-2-64k> + 10.171|semantic-2-64k>
drop-8-simm |wc-comments-1-64k> => 99.888|wc-comments-2-64k> + 40.310|diary-1-64k> + 40.280|diary-2-64k> + 17.396|eztv-2-64k> + 16.198|eztv-1-64k> + 9.594|slashdot-3-64k> + 9.589|slashdot-1-64k> + 9.589|slashdot-2-64k> + 9.567|semantic-2-64k> + 9.422|semantic-1-64k>
drop-9-simm |wc-comments-1-64k> => 99.886|wc-comments-2-64k> + 39.936|diary-1-64k> + 39.915|diary-2-64k> + 17.543|eztv-2-64k> + 16.345|eztv-1-64k> + 9.704|slashdot-3-64k> + 9.687|slashdot-1-64k> + 9.687|slashdot-2-64k> + 9.686|semantic-2-64k> + 9.516|semantic-1-64k>
drop-10-simm |wc-comments-1-64k> => 99.884|wc-comments-2-64k> + 38.335|diary-1-64k> + 38.315|diary-2-64k> + 16.841|eztv-2-64k> + 15.643|eztv-1-64k> + 9.890|slashdot-3-64k> + 9.873|slashdot-1-64k> + 9.873|slashdot-2-64k> + 9.866|semantic-2-64k> + 9.696|semantic-1-64k>

drop-1-simm |slashdot-2-64k> => 96.165|slashdot-1-64k> + 77.224|slashdot-3-64k> + 13.544|eztv-1-64k> + 13.343|eztv-2-64k> + 10.872|diary-2-64k> + 10.735|diary-1-64k> + 8.460|wc-comments-2-64k> + 8.460|wc-comments-1-64k> + 7.896|semantic-1-64k> + 5.722|semantic-2-64k>
drop-2-simm |slashdot-2-64k> => 97.959|slashdot-1-64k> + 92.552|slashdot-3-64k> + 20.155|eztv-1-64k> + 19.972|eztv-2-64k> + 12.968|diary-2-64k> + 12.648|diary-1-64k> + 10.787|semantic-1-64k> + 10.214|wc-comments-2-64k> + 10.214|wc-comments-1-64k> + 7.871|semantic-2-64k>
drop-3-simm |slashdot-2-64k> => 99.536|slashdot-1-64k> + 95.007|slashdot-3-64k> + 21.427|eztv-1-64k> + 21.264|eztv-2-64k> + 13.734|diary-2-64k> + 13.396|diary-1-64k> + 12.566|semantic-1-64k> + 10.400|wc-comments-2-64k> + 10.400|wc-comments-1-64k> + 9.328|semantic-2-64k>
drop-4-simm |slashdot-2-64k> => 99.494|slashdot-1-64k> + 95.704|slashdot-3-64k> + 21.806|eztv-1-64k> + 21.646|eztv-2-64k> + 14.049|diary-2-64k> + 13.743|semantic-1-64k> + 13.707|diary-1-64k> + 10.713|wc-comments-2-64k> + 10.713|wc-comments-1-64k> + 10.004|semantic-2-64k>
drop-5-simm |slashdot-2-64k> => 99.943|slashdot-1-64k> + 96.441|slashdot-3-64k> + 21.984|eztv-1-64k> + 21.836|eztv-2-64k> + 14.459|semantic-1-64k> + 14.143|diary-2-64k> + 13.792|diary-1-64k> + 10.972|wc-comments-2-64k> + 10.972|wc-comments-1-64k> + 10.720|semantic-2-64k>
drop-6-simm |slashdot-2-64k> => 99.942|slashdot-1-64k> + 97.030|slashdot-3-64k> + 22.407|eztv-1-64k> + 22.259|eztv-2-64k> + 15.789|semantic-1-64k> + 14.312|diary-2-64k> + 13.991|diary-1-64k> + 11.561|semantic-2-64k> + 11.111|wc-comments-2-64k> + 11.111|wc-comments-1-64k>
drop-7-simm |slashdot-2-64k> => 99.940|slashdot-1-64k> + 97.909|slashdot-3-64k> + 22.448|eztv-1-64k> + 22.299|eztv-2-64k> + 16.506|semantic-1-64k> + 13.358|diary-2-64k> + 13.037|diary-1-64k> + 11.771|semantic-2-64k> + 10.218|wc-comments-2-64k> + 10.218|wc-comments-1-64k>
drop-8-simm |slashdot-2-64k> => 99.939|slashdot-1-64k> + 97.882|slashdot-3-64k> + 22.730|eztv-1-64k> + 22.579|eztv-2-64k> + 17.312|semantic-1-64k> + 13.576|diary-2-64k> + 13.252|diary-1-64k> + 12.191|semantic-2-64k> + 9.589|wc-comments-2-64k> + 9.589|wc-comments-1-64k>
drop-9-simm |slashdot-2-64k> => 99.938|slashdot-1-64k> + 97.884|slashdot-3-64k> + 23.161|eztv-1-64k> + 23.005|eztv-2-64k> + 17.642|semantic-1-64k> + 13.242|diary-2-64k> + 12.908|diary-1-64k> + 12.653|semantic-2-64k> + 9.687|wc-comments-2-64k> + 9.687|wc-comments-1-64k>
drop-10-simm |slashdot-2-64k> => 99.937|slashdot-1-64k> + 97.872|slashdot-3-64k> + 23.283|eztv-1-64k> + 23.128|eztv-2-64k> + 17.652|semantic-1-64k> + 13.258|diary-2-64k> + 12.924|diary-1-64k> + 12.663|semantic-2-64k> + 9.873|wc-comments-2-64k> + 9.873|wc-comments-1-64k>

drop-1-simm |semantic-1-64k> => 65.846|semantic-2-64k> + 10.374|diary-1-64k> + 10.305|diary-2-64k> + 8.165|slashdot-3-64k> + 7.896|slashdot-2-64k> + 7.855|slashdot-1-64k> + 7.677|eztv-1-64k> + 7.528|eztv-2-64k> + 6.678|wc-comments-2-64k> + 6.658|wc-comments-1-64k>
drop-2-simm |semantic-1-64k> => 73.858|semantic-2-64k> + 15.118|diary-2-64k> + 14.835|diary-1-64k> + 10.961|slashdot-3-64k> + 10.793|slashdot-1-64k> + 10.787|slashdot-2-64k> + 9.051|eztv-1-64k> + 9.051|eztv-2-64k> + 8.796|wc-comments-2-64k> + 8.796|wc-comments-1-64k>
drop-3-simm |semantic-1-64k> => 77.788|semantic-2-64k> + 16.499|diary-2-64k> + 16.206|diary-1-64k> + 12.626|slashdot-3-64k> + 12.569|slashdot-1-64k> + 12.566|slashdot-2-64k> + 10.699|eztv-1-64k> + 10.699|eztv-2-64k> + 9.476|wc-comments-2-64k> + 9.476|wc-comments-1-64k>
drop-4-simm |semantic-1-64k> => 77.417|semantic-2-64k> + 16.951|diary-2-64k> + 16.572|diary-1-64k> + 13.807|slashdot-3-64k> + 13.743|slashdot-1-64k> + 13.743|slashdot-2-64k> + 11.832|eztv-1-64k> + 11.832|eztv-2-64k> + 9.717|wc-comments-2-64k> + 9.717|wc-comments-1-64k>
drop-5-simm |semantic-1-64k> => 79.583|semantic-2-64k> + 17.242|diary-2-64k> + 16.819|diary-1-64k> + 14.532|slashdot-3-64k> + 14.459|slashdot-1-64k> + 14.459|slashdot-2-64k> + 12.518|eztv-1-64k> + 12.518|eztv-2-64k> + 10.001|wc-comments-2-64k> + 10.001|wc-comments-1-64k>
drop-6-simm |semantic-1-64k> => 79.927|semantic-2-64k> + 17.648|diary-2-64k> + 17.165|diary-1-64k> + 15.880|slashdot-3-64k> + 15.789|slashdot-1-64k> + 15.789|slashdot-2-64k> + 13.807|eztv-1-64k> + 13.807|eztv-2-64k> + 10.204|wc-comments-2-64k> + 10.204|wc-comments-1-64k>
drop-7-simm |semantic-1-64k> => 79.571|semantic-2-64k> + 17.829|diary-2-64k> + 17.248|diary-1-64k> + 16.636|slashdot-3-64k> + 16.506|slashdot-1-64k> + 16.506|slashdot-2-64k> + 14.815|eztv-1-64k> + 14.815|eztv-2-64k> + 10.408|wc-comments-2-64k> + 10.408|wc-comments-1-64k>
drop-8-simm |semantic-1-64k> => 79.337|semantic-2-64k> + 17.823|diary-2-64k> + 17.445|slashdot-3-64k> + 17.312|slashdot-1-64k> + 17.312|slashdot-2-64k> + 17.186|diary-1-64k> + 15.600|eztv-1-64k> + 15.600|eztv-2-64k> + 9.422|wc-comments-2-64k> + 9.422|wc-comments-1-64k>
drop-9-simm |semantic-1-64k> => 80.936|semantic-2-64k> + 18.022|diary-2-64k> + 17.797|slashdot-3-64k> + 17.642|slashdot-1-64k> + 17.642|slashdot-2-64k> + 17.367|diary-1-64k> + 15.896|eztv-1-64k> + 15.896|eztv-2-64k> + 9.516|wc-comments-2-64k> + 9.516|wc-comments-1-64k>
drop-10-simm |semantic-1-64k> => 80.936|semantic-2-64k> + 18.022|diary-2-64k> + 17.808|slashdot-3-64k> + 17.652|slashdot-1-64k> + 17.652|slashdot-2-64k> + 17.367|diary-1-64k> + 15.896|eztv-1-64k> + 15.896|eztv-2-64k> + 9.696|wc-comments-2-64k> + 9.696|wc-comments-1-64k>
OK. Some pretty good results there! I wonder if upscaling to 1M buckets, as I have already done in the ebook case, would improve the results? I suspect not.
OK. I tried a "meta-simm", and I wasn't sure if we would get better or worse results. After running into a bug (I had to delete the drop-4-simm |*> rule to fix it), I got this:
(where by "meta-simm" I mean applying simm to the results of a previous simm)
sa: meta-simm |*> #=> 100 similar[drop-4-simm] |_self>
sa: meta-simm |slashdot-1-64k>
68.117|slashdot-3-64k> + 68.039|slashdot-2-64k> + 48.623|eztv-1-64k> + 48.024|eztv-2-64k> + 45.224|semantic-1-64k> + 41.308|semantic-2-64k> + 40.832|diary-2-64k> + 40.647|diary-1-64k> + 39.798|wc-comments-2-64k> + 39.798|wc-comments-1-64k>
So, terrible results. No need to test other kets or drop-n's. But an interesting fact nonetheless (ie, meta-simms give worse results).
OK. Took some futzing about, but I converted the drop-6-simm results into a "similarity matrix". (There has to be a less labour-intensive way of producing these! As it turns out, see the matrix code in the 20/5/2014 update below.)
[ diary-1-64k       ]   [ 100.00  98.822  15.917  18.978  17.165  16.338  13.991  13.991  14.752  42.199  42.199 ] [ diary-1-64k       ]
[ diary-2-64k       ]   [ 98.822  100.00  16.183  19.244  17.648  16.267  14.312  14.312  15.074  42.175  42.175 ] [ diary-2-64k       ]
[ eztv-1-64k        ]   [ 15.917  16.183  100.00  91.816  13.807  9.580   22.349  22.407  23.304  16.716  16.716 ] [ eztv-1-64k        ]
[ eztv-2-64k        ]   [ 18.978  19.244  91.816  100.00  13.807  9.580   22.201  22.259  23.156  17.797  17.797 ] [ eztv-2-64k        ]
[ semantic-1-64k    ]   [ 17.165  17.648  13.807  13.807  100.00  79.927  15.789  15.789  15.880  10.204  10.204 ] [ semantic-1-64k    ]
[ semantic-2-64k    ] = [ 16.338  16.267  9.580   9.580   79.927  100.00  11.561  11.561  11.652  10.001  10.001 ] [ semantic-2-64k    ]
[ slashdot-1-64k    ]   [ 13.991  14.312  22.349  22.201  15.789  11.561  100.00  99.942  96.971  11.111  11.111 ] [ slashdot-1-64k    ]
[ slashdot-2-64k    ]   [ 13.991  14.312  22.407  22.259  15.789  11.561  99.942  100.00  97.030  11.111  11.111 ] [ slashdot-2-64k    ]
[ slashdot-3-64k    ]   [ 14.752  15.074  23.304  23.156  15.880  11.652  96.971  97.030  100.00  11.086  11.086 ] [ slashdot-3-64k    ]
[ wc-comments-1-64k ]   [ 42.199  42.175  16.716  17.797  10.204  10.001  11.111  11.111  11.086  100.00  99.891 ] [ wc-comments-1-64k ]
[ wc-comments-2-64k ]   [ 42.199  42.175  16.716  17.797  10.204  10.001  11.111  11.111  11.086  99.891  100.00 ] [ wc-comments-2-64k ]

20/5/2014: WOOT!! I finally have code to convert BKO rules to matrices. This is going to be useful!
(so what we are doing is really a mapping back from BKO to a simple MatSumSig model)
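To make that mapping concrete, here is a toy sketch in plain Python, with plain dicts standing in for the real context/superposition classes (the labels are borrowed from the matrix-example-2.sw example further down). Each rule "M1 |x> => c1|y1> + c2|y2>" becomes one column of the matrix, and there is one row per distinct ket appearing on the right-hand side:

# toy sketch of the BKO-rule -> matrix mapping, using plain dicts:
rules = {'x1': {'y2': 3.0},                   # M1 |x1> => 3|y2>
         'x2': {'y1': 7.0, 'y2': 6.0}}        # M1 |x2> => 7|y1> + 6|y2>

row_labels = sorted(set(y for col in rules.values() for y in col))
col_labels = sorted(rules)
for y in row_labels:
  print(y, [rules[x].get(y, 0) for x in col_labels])
# prints:
# y1 [0, 7.0]
# y2 [3.0, 6.0]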
Note that I like to swap back and forth between different representations. What is difficult in one is often easy in the other, and vice versa.
For example, matrices are very nice for visualizing a lot of data all at once, much easier than the equivalent BKO. But I find BKO far more practical to work with.
First, the code to pretty-print columns of data:
# these two functions help to pretty-print tables, and matrices in particular:
def normalize_column_return_list(s,n):
  lines = (s.split('\n') + ['']*n)[:n]          # pad s out with empty lines, then truncate to exactly n lines
  max_len = max(len(x) for x in lines)
  return [x.ljust(max_len) for x in lines]      # left-justify, so every line in the column has the same width

def paste_columns(data,pre='',sep=' ',post=''):
  if len(data) == 0:
    return ""
  columns = len(data)
  rows = max(s.count('\n') + 1 for s in data)                # the height of the tallest column
  r = [normalize_column_return_list(s,rows) for s in data]   # pad every column out to that height
  return "\n".join(pre + sep.join(r[j][k] for j in range(columns)) + post for k in range(rows))
  
eg: 

col_1 = "3\n9\n217\n13"
col_2 = "0\n5\n-3\n-513"
col_3 = "3.1415\n2.17\n1.23\n-6"

matrix = paste_columns([col_1,col_2,col_3],'[ ',' ',' ]')
print(matrix)
spits out:
[ 3   0    3.1415 ]
[ 9   5    2.17   ]
[ 217 -3   1.23   ]
[ 13  -513 -6     ]
And then this code, which is on the messy side:
def sp_to_vect(one):
  if one.count() <= 1:
    vect = one.the_label()
  else:                                        
    vect = "\n".join(x.label for x in one.data)
  return paste_columns([vect],'[ ','',' ]')

def sp_to_list(one):  
  if one.count() <= 1:
    return one.the_label()
  return "\n".join(x.label for x in one.data)

# make 0.000 coeffs prettier!
def coeff_to_str(x):
  if x == 0:
    return "0"
  else:
    return str("%.3f" % x)                                         # this means if we want to change precision, we only need to change it here.
    
def sp_coeffs_to_list(one):                                        
  if one.count() <= 1:
    return coeff_to_str(one.the_value())
  return "\n".join(coeff_to_str(x.value) for x in one.data)
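
A quick sanity check of the coefficient formatting, runnable with just coeff_to_str from above:

for x in [0, 1, 3.14159, 0.5]:
  print(coeff_to_str(x))
# prints:
# 0
# 1.000
# 3.142
# 0.500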

# code to spit out a pretty printed matrix given BKO rules:
def matrix(context,op):
  one = context.relevant_kets(op).ket_sort()       # one is the list of kets that will be on the right hand side.
                                                   # usefully, relevant_kets() always returns a superposition.
  if one.count() == 0:                             # if one is empty, return the identity ket.
    return ket("",0)
                                                  
  two = superposition()                            # two is the list of kets that will be on the left hand side.
  for elt in one.data:
    sp = elt.apply_op(context,op)
    two = union(two,sp)
  two = two.ket_sort()

  empty = two.multiply(0)                           # empty is the two list, with all coeffs set to 0
  
  matrix_columns = []                                   # convert to list-comprehension?
  for elt in one.data:
    sp = (elt.apply_op(context,op) + empty).ket_sort()  # we add "empty" so the column has all the elements.
    matrix_columns.append(sp_coeffs_to_list(sp))

  x = sp_to_vect(one)
  y = sp_to_vect(two)
  M = paste_columns(matrix_columns,'[  ','  ','  ]')
  matrix = paste_columns([y,'=',M,x])    
  print(matrix)
  #print("\n" + paste_columns(matrix_columns,'',' ',''))
  return ket("matrix")                              # Just here so it returns a ket of some sort. Has no meaning, really.
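The key trick in there is adding "empty" (the full left-hand ket list, with all coefficients set to 0) to each column before ket-sorting. That way every column mentions exactly the same kets in the same order, so the rows of the printed matrix line up.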
And now in the console:
$ ./the_semantic_db_console.py
Welcome!

sa: load matrix-example-2.sw
sa: dump
----------------------------------------
|context> => |context: 2 matrix play>

M1 |x1> => 3.000|y2>
M1 |x2> => 7.000|y1> + 6.000|y2>
M1 |x3> => |y1> + 4.000|y2>
M1 |x4> => |y1>
M1 |x5> => 6.000|y1> + 4.000|y2>
M1 |x6> => 4.000|y1> + 8.000|y2>
M1 |x7> => |y1> + 2.000|y2>

M2 |y1> => 6.000|z1> + 2.000|z2> + 7.000|z3> + 9.000|z4> + 5.000|z5>
M2 |y2> => 3.000|z2> + 4.000|z3> + |z5>
----------------------------------------

sa: -- what happens if we try an unknown operator, we get the empty ket:
sa: matrix[fish]
0.000|>

sa: -- now, the M1 matrix:
sa: matrix[M1]
[ y1 ] = [  0      7.000  1.000  1.000  6.000  4.000  1.000  ] [ x1 ]
[ y2 ]   [  3.000  6.000  4.000  0      4.000  8.000  2.000  ] [ x2 ]
                                                               [ x3 ]
                                                               [ x4 ]
                                                               [ x5 ]
                                                               [ x6 ]
                                                               [ x7 ]
|matrix>

sa: -- now, the M2 matrix:
sa: matrix[M2]
[ z1 ] = [  6.000  0      ] [ y1 ]
[ z2 ]   [  2.000  3.000  ] [ y2 ]
[ z3 ]   [  7.000  4.000  ]
[ z4 ]   [  9.000  0      ]
[ z5 ]   [  5.000  1.000  ]
|matrix>

sa: -- now another data set:
sa: load fred-sam-friends.sw
sa: dump
----------------------------------------
|context> => |context: friends>

friends |Fred> => |Jack> + |Harry> + |Ed> + |Mary> + |Rob> + |Patrick> + |Emma> + |Charlie>
friends |Sam> => |Charlie> + |George> + |Emma> + |Jack> + |Rober> + |Frank> + |Julie>
----------------------------------------

sa: matrix[friends]
[ Charlie ] = [  1.000  1.000  ] [ Fred ]
[ Ed      ]   [  1.000  0      ] [ Sam  ]
[ Emma    ]   [  1.000  1.000  ]
[ Frank   ]   [  0      1.000  ]
[ George  ]   [  0      1.000  ]
[ Harry   ]   [  1.000  0      ]
[ Jack    ]   [  1.000  1.000  ]
[ Julie   ]   [  0      1.000  ]
[ Mary    ]   [  1.000  0      ]
[ Patrick ]   [  1.000  0      ]
[ Rob     ]   [  1.000  0      ]
[ Rober   ]   [  0      1.000  ]
|matrix>

sa: -- now another data set:
sa: load bots.sw
sa: matrix[name]
[ Bella   ] = [  1.000  0      0      ] [ bot: Bella   ]
[ Emma    ]   [  0      1.000  0      ] [ bot: Emma    ]
[ Madison ]   [  0      0      1.000  ] [ bot: Madison ]
|matrix>

sa: matrix[age]
[ age: 23 ] = [  0      0      1.000  ] [ bot: Bella   ]
[ age: 29 ]   [  0      1.000  0      ] [ bot: Emma    ]
[ age: 31 ]   [  1.000  0      0      ] [ bot: Madison ]
|matrix>

sa: matrix[religion]
[ religion: Christianity ] = [  1.000  0      0      ] [ bot: Bella   ]
[ religion: Islam        ]   [  0      0      1.000  ] [ bot: Emma    ]
[ religion: Taoism       ]   [  0      1.000  0      ] [ bot: Madison ]
|matrix>

sa: matrix[make-of-car]
[ car: BMW     ] = [  0      1.000  0      ] [ bot: Bella   ]
[ car: Bugatti ]   [  0      0      1.000  ] [ bot: Emma    ]
[ car: Porsche ]   [  1.000  0      0      ] [ bot: Madison ]
|matrix>

sa: matrix[mother]
[ Madison ] = [  0      1.000  0      ] [ bot: Bella   ]
[ Mia     ]   [  1.000  0      1.000  ] [ bot: Emma    ]
                                        [ bot: Madison ]
|matrix>

sa: matrix[father]
[ Ian     ] = [  0      0      1.000  ] [ bot: Bella   ]
[ Nathan  ]   [  0      1.000  0      ] [ bot: Emma    ]
[ William ]   [  1.000  0      0      ] [ bot: Madison ]
|matrix>

sa: matrix[bed-time]
[ time: 10:30pm ] = [  0      0      1.000  ] [ bot: Bella   ]
[ time: 2am     ]   [  0      1.000  0      ] [ bot: Emma    ]
[ time: 8pm     ]   [  1.000  0      0      ] [ bot: Madison ]
|matrix>

sa: -- now another data set:
sa: load in-my-league.sw
sa: matrix[features]
[ athletic  ] = [  0      1.000  0      0      0      0      0      0      ] [ Donna            ]
[ beautiful ]   [  1.000  1.000  0      0      0      0      1.000  1.000  ] [ Emma             ]
[ educated  ]   [  1.000  0      0      1.000  0      1.000  1.000  1.000  ] [ Jane             ]
[ loving    ]   [  0      0      0      1.000  1.000  1.000  1.000  1.000  ] [ Liz              ]
[ religious ]   [  0      1.000  0      0      0      0      0      0      ] [ Mary             ]
[ sexy      ]   [  1.000  1.000  1.000  0      0      0      1.000  1.000  ] [ Mia              ]
[ skinny    ]   [  1.000  1.000  1.000  0      1.000  1.000  1.000  1.000  ] [ my perfect woman ]
[ smart     ]   [  1.000  0      0      1.000  0      1.000  1.000  1.000  ] [ the goddess      ]
|matrix>

sa: -- now another data set:
sa: load shopping-basket.sw
sa: matrix[basket]
[ apple     ] = [  3.000  0      4.000  0      0      ] [ f      ]
[ bananas   ]   [  0      1.000  0      0      0      ] [ user 1 ]
[ bread     ]   [  1.000  1.000  0      0      1.000  ] [ user 2 ]
[ carrots   ]   [  0      1.000  0      0      0      ] [ user 3 ]
[ cheese    ]   [  0      0      0      1.000  1.000  ] [ user 4 ]
[ chocolate ]   [  0      1.000  0      1.000  0      ]
[ coffee    ]   [  1.000  0      1.000  0      0      ]
[ milk      ]   [  1.000  1.000  1.000  0      0      ]
[ olive oil ]   [  0      0      0      1.000  0      ]
[ oranges   ]   [  5.000  0      0      0      0      ]
[ pizza     ]   [  0      0      0      1.000  0      ]
[ salami    ]   [  0      0      0      0      1.000  ]
[ steak     ]   [  1.000  0      1.000  0      0      ]
[ tea       ]   [  0      1.000  0      0      0      ]
[ vegemite  ]   [  0      0      0      1.000  1.000  ]
|matrix>

sa: -- now another data set:
sa: load breakfast-menu.sw
sa: matrix[price]
[ price: 4.50 ] = [  0      0      1.000  0      0      ] [ food: Belgian Waffles             ]
[ price: 5.95 ]   [  1.000  0      0      0      0      ] [ food: Berry-Berry Belgian Waffles ]
[ price: 6.95 ]   [  0      0      0      1.000  0      ] [ food: French Toast                ]
[ price: 7.95 ]   [  0      0      0      0      1.000  ] [ food: Homestyle Breakfast         ]
[ price: 8.95 ]   [  0      1.000  0      0      0      ] [ food: Strawberry Belgian Waffles  ]
|matrix>

sa: matrix[calories]
[ calories: 600 ] = [  0      0      1.000  0      0      ] [ food: Belgian Waffles             ]
[ calories: 650 ]   [  1.000  0      0      0      0      ] [ food: Berry-Berry Belgian Waffles ]
[ calories: 900 ]   [  0      1.000  0      0      1.000  ] [ food: French Toast                ]
[ calories: 950 ]   [  0      0      0      1.000  0      ] [ food: Homestyle Breakfast         ]
                                                            [ food: Strawberry Belgian Waffles  ]
|matrix>

sa: matrix[name]
[ text: "Belgian Waffles"             ] = [  1.000  0      0      0      0      ] [ food: Belgian Waffles             ]
[ text: "Berry-Berry Belgian Waffles" ]   [  0      1.000  0      0      0      ] [ food: Berry-Berry Belgian Waffles ]
[ text: "French Toast"                ]   [  0      0      1.000  0      0      ] [ food: French Toast                ]
[ text: "Homestyle Breakfast"         ]   [  0      0      0      1.000  0      ] [ food: Homestyle Breakfast         ]
[ text: "Strawberry Belgian Waffles"  ]   [  0      0      0      0      1.000  ] [ food: Strawberry Belgian Waffles  ]
|matrix>

sa: matrix[description]
[ text: "Light Belgian waffles covered with an assortment of fresh berries and whipped cream" ] = [  0      1.000  0      0      0      ] [ food: Belgian Waffles             ]
[ text: "Light Belgian waffles covered with strawberries and whipped cream"                   ]   [  0      0      0      0      1.000  ] [ food: Berry-Berry Belgian Waffles ]
[ text: "Thick slices made from our homemade sourdough bread"                                 ]   [  0      0      1.000  0      0      ] [ food: French Toast                ]
[ text: "Two eggs, bacon or sausage, toast, and our ever-popular hash browns"                 ]   [  0      0      0      1.000  0      ] [ food: Homestyle Breakfast         ]
[ text: "Two of our famous Belgian Waffles with plenty of real maple syrup"                   ]   [  1.000  0      0      0      0      ] [ food: Strawberry Belgian Waffles  ]
|matrix>

sa: -- now for a real world data set:
sa: load fragment-documents-64k--post-processing--saved--cleaned.sw
sa: matrix[drop-10-hash]
[ 0651 ] = [  0        0        0         0         16.000   17.000   0        0        0        0       0       ] [ diary-1-64k       ]
[ 08fa ]   [  0        0        0         0         0        0        15.000   15.000   15.000   0       0       ] [ diary-2-64k       ]
[ 09a6 ]   [  0        0        50.000    50.000    0        0        0        0        0        0       0       ] [ eztv-1-64k        ]
[ 0b57 ]   [  12.000   12.000   0         0         0        0        0        0        0        12.000  12.000  ] [ eztv-2-64k        ]
[ 0b6f ]   [  0        0        0         0         0        0        15.000   15.000   15.000   0       0       ] [ semantic-1-64k    ]
[ 0be8 ]   [  0        0        0         0         0        0        15.000   15.000   15.000   0       0       ] [ semantic-2-64k    ]
[ 0c6d ]   [  0        0        0         0         16.000   16.000   0        0        0        0       0       ] [ slashdot-1-64k    ]
[ 0e76 ]   [  0        0        0         0         257.000  701.000  0        0        0        0       0       ] [ slashdot-2-64k    ]
[ 0fa0 ]   [  0        0        0         0         0        0        23.000   23.000   23.000   0       0       ] [ slashdot-3-64k    ]
[ 141b ]   [  0        0        0         0         14.000   25.000   0        0        0        0       0       ] [ wc-comments-1-64k ]
[ 1466 ]   [  0        0        0         0         0        0        19.000   19.000   19.000   0       0       ] [ wc-comments-2-64k ]
[ 16c6 ]   [  0        0        34.000    34.000    0        0        0        0        0        0       0       ]
[ 176a ]   [  23.000   23.000   0         0         0        0        0        0        0        0       0       ]
[ 1853 ]   [  0        0        0         0         0        10.000   0        0        0        0       0       ]
[ 18a5 ]   [  0        0        0         0         0        0        16.000   16.000   16.000   0       0       ]
[ 1a42 ]   [  0        0        13.000    13.000    0        0        0        0        0        0       0       ]
[ 1b1b ]   [  0        0        0         0         0        0        12.000   12.000   12.000   0       0       ]
[ 1e4c ]   [  0        0        0         0         0        0        11.000   11.000   11.000   0       0       ]
[ 1fac ]   [  73.000   73.000   0         0         0        0        0        0        0        17.000  17.000  ]
[ 223a ]   [  28.000   29.000   0         0         0        0        0        0        0        0       0       ]
[ 2353 ]   [  10.000   10.000   0         0         0        0        0        0        0        0       0       ]
[ 2454 ]   [  0        0        0         0         0        0        17.000   17.000   17.000   0       0       ]
[ 25db ]   [  23.000   23.000   0         0         0        0        0        0        0        0       0       ]
[ 27d8 ]   [  17.000   17.000   0         0         0        12.000   0        0        0        13.000  13.000  ]
[ 28ab ]   [  0        0        0         0         0        0        0        0        0        10.000  11.000  ]
[ 2a7a ]   [  0        0        0         0         0        0        10.000   10.000   10.000   0       0       ]
[ 2c50 ]   [  0        0        0         0         12.000   24.000   0        0        0        0       0       ]
[ 2d14 ]   [  0        0        0         0         0        0        26.000   26.000   26.000   0       0       ]
[ 2d55 ]   [  0        0        17.000    17.000    0        0        0        0        0        0       0       ]
[ 2d5d ]   [  10.000   10.000   0         0         0        0        0        0        0        0       0       ]
[ 2f08 ]   [  0        0        0         0         0        0        24.000   24.000   24.000   0       0       ]
[ 2fc8 ]   [  0        0        0         0         0        0        15.000   15.000   15.000   0       0       ]
[ 364d ]   [  0        0        23.000    23.000    0        0        144.000  144.000  144.000  0       0       ]
[ 3678 ]   [  12.000   12.000   0         0         0        0        0        0        0        0       0       ]
[ 3695 ]   [  0        0        0         0         0        0        0        0        0        30.000  30.000  ]
[ 377b ]   [  21.000   21.000   0         0         0        0        0        0        0        0       0       ]
[ 3808 ]   [  18.000   27.000   0         0         26.000   26.000   0        0        0        0       0       ]
[ 38e6 ]   [  0        0        0         33.000    0        0        0        0        0        0       0       ]
[ 3932 ]   [  0        0        57.000    57.000    0        0        0        0        0        0       0       ]
[ 3c81 ]   [  0        0        0         0         0        0        15.000   15.000   15.000   0       0       ]
[ 3eac ]   [  0        0        0         0         0        0        15.000   15.000   0        0       0       ]
[ 3f8b ]   [  10.000   10.000   0         0         0        0        0        0        0        0       0       ]
[ 3fea ]   [  10.000   10.000   0         0         0        0        0        0        0        0       0       ]
[ 410c ]   [  0        0        0         0         0        0        0        0        0        16.000  15.000  ]
[ 42ef ]   [  10.000   10.000   0         0         0        0        0        0        0        0       0       ]
[ 4325 ]   [  0        0        0         0         0        11.000   0        0        0        0       0       ]
[ 4342 ]   [  0        0        0         0         0        0        16.000   16.000   15.000   0       0       ]
[ 4576 ]   [  0        0        0         0         0        12.000   0        0        0        0       0       ]
[ 4a13 ]   [  0        0        0         0         0        0        15.000   15.000   0        0       0       ]
[ 4c07 ]   [  0        0        0         0         0        12.000   0        0        0        0       0       ]
[ 4d14 ]   [  0        0        0         0         0        13.000   0        0        0        0       0       ]
[ 4d2d ]   [  73.000   73.000   0         0         330.000  406.000  28.000   28.000   30.000   0       0       ]
[ 5006 ]   [  31.000   32.000   10.000    40.000    0        0        0        0        0        0       0       ]
[ 507e ]   [  14.000   14.000   11.000    11.000    0        0        0        0        0        0       0       ]
[ 53ec ]   [  13.000   13.000   0         0         0        0        0        0        0        67.000  67.000  ]
[ 5439 ]   [  0        0        0         0         116.000  130.000  0        0        0        0       0       ]
[ 57ed ]   [  0        0        0         0         0        0        22.000   22.000   22.000   0       0       ]
[ 5a93 ]   [  0        0        0         0         0        0        14.000   14.000   14.000   0       0       ]
[ 5fad ]   [  0        0        0         0         0        0        16.000   16.000   15.000   0       0       ]
[ 6231 ]   [  0        0        0         0         0        0        15.000   15.000   15.000   0       0       ]
[ 62dc ]   [  0        0        0         0         74.000   99.000   0        0        0        0       0       ]
[ 63fe ]   [  0        0        0         33.000    0        0        0        0        0        0       0       ]
[ 6541 ]   [  11.000   11.000   0         0         0        0        0        0        0        11.000  11.000  ]
[ 69bd ]   [  0        0        0         0         0        0        33.000   33.000   33.000   0       0       ]
[ 6be1 ]   [  10.000   10.000   0         0         0        0        0        0        0        0       0       ]
[ 6d21 ]   [  0        0        0         0         0        10.000   0        0        0        0       0       ]
[ 6df4 ]   [  0        0        0         0         0        15.000   0        0        0        0       0       ]
[ 6e98 ]   [  0        0        0         0         0        0        0        0        0        30.000  30.000  ]
[ 6f6b ]   [  10.000   10.000   0         0         0        0        0        0        0        0       0       ]
[ 6f7f ]   [  0        0        100.000   100.000   0        0        0        0        0        0       0       ]
[ 70de ]   [  0        0        0         0         0        0        15.000   15.000   15.000   0       0       ]
[ 7549 ]   [  0        0        0         0         0        0        15.000   15.000   15.000   0       0       ]
[ 75de ]   [  38.000   38.000   0         0         0        0        0        0        0        0       0       ]
[ 764d ]   [  0        0        0         0         18.000   31.000   0        0        0        0       0       ]
[ 7718 ]   [  0        0        0         0         0        0        0        0        0        11.000  11.000  ]
[ 78ca ]   [  0        0        0         0         0        0        15.000   15.000   13.000   0       0       ]
[ 7bf0 ]   [  0        0        14.000    14.000    0        0        0        0        0        0       0       ]
[ 7cef ]   [  0        0        0         0         0        0        15.000   15.000   15.000   0       0       ]
[ 7d75 ]   [  0        0        0         0         0        0        19.000   18.000   18.000   0       0       ]
[ 8071 ]   [  0        0        0         0         52.000   52.000   0        0        0        0       0       ]
[ 8097 ]   [  10.000   10.000   0         0         0        0        0        0        0        0       0       ]
[ 8365 ]   [  10.000   10.000   0         0         0        0        0        0        0        0       0       ]
[ 854e ]   [  49.000   49.000   294.000   324.000   0        0        17.000   17.000   17.000   40.000  40.000  ]
[ 8731 ]   [  12.000   12.000   0         0         0        0        0        0        0        0       0       ]
[ 8a83 ]   [  0        0        0         0         0        0        15.000   15.000   15.000   0       0       ]
[ 8b81 ]   [  0        0        13.000    13.000    0        0        0        0        0        0       0       ]
[ 8e4d ]   [  10.000   10.000   0         0         0        0        0        0        0        0       0       ]
[ 8f98 ]   [  30.000   31.000   10.000    40.000    0        0        0        0        0        0       0       ]
[ 9183 ]   [  53.000   53.000   0         34.000    0        0        0        0        0        19.000  19.000  ]
[ 9197 ]   [  0        0        0         0         0        0        15.000   15.000   15.000   0       0       ]
[ 924b ]   [  24.000   24.000   0         0         0        0        0        0        0        0       0       ]
[ 92b7 ]   [  0        0        0         0         0        0        15.000   15.000   15.000   0       0       ]
[ 93d9 ]   [  0        0        0         0         0        0        0        0        0        30.000  30.000  ]
[ 94aa ]   [  10.000   10.000   0         0         0        0        0        0        0        0       0       ]
[ 9513 ]   [  19.000   21.000   0         0         0        0        0        0        0        0       0       ]
[ 95b7 ]   [  0        0        0         0         12.000   12.000   0        0        0        0       0       ]
[ 972b ]   [  0        0        50.000    50.000    0        0        0        0        0        0       0       ]
[ 9759 ]   [  38.000   40.000   0         0         15.000   15.000   0        0        0        32.000  32.000  ]
[ 9ed4 ]   [  0        0        0         0         0        0        26.000   26.000   25.000   0       0       ]
[ 9ef1 ]   [  0        0        0         38.000    0        0        0        0        0        0       0       ]
[ a379 ]   [  0        0        12.000    12.000    0        0        251.000  251.000  251.000  0       0       ]
[ a48a ]   [  59.000   60.000   0         0         0        0        0        0        0        92.000  92.000  ]
[ a4c1 ]   [  0        0        0         0         21.000   27.000   0        0        0        0       0       ]
[ a4e8 ]   [  0        0        0         0         0        0        0        0        0        13.000  13.000  ]
[ a7c0 ]   [  10.000   10.000   0         0         0        0        0        0        0        0       0       ]
[ a8c9 ]   [  0        0        0         0         0        0        0        0        0        30.000  30.000  ]
[ ab87 ]   [  0        0        0         0         0        10.000   0        0        0        0       0       ]
[ acd0 ]   [  128.000  128.000  0         0         0        0        0        0        0        22.000  22.000  ]
[ ae23 ]   [  0        0        0         0         12.000   12.000   0        0        0        0       0       ]
[ ae2a ]   [  10.000   10.000   0         0         12.000   13.000   0        0        0        0       0       ]
[ b095 ]   [  0        0        0         0         0        14.000   0        0        0        0       0       ]
[ b209 ]   [  0        0        0         0         0        24.000   0        0        0        0       0       ]
[ b61c ]   [  0        0        1031.000  1031.000  0        0        0        0        0        19.000  19.000  ]
[ b649 ]   [  10.000   10.000   0         0         0        0        0        0        0        0       0       ]
[ ba48 ]   [  0        0        0         0         0        0        15.000   15.000   15.000   0       0       ]
[ bb1c ]   [  26.000   26.000   0         0         0        0        0        0        0        0       0       ]
[ bb29 ]   [  0        0        0         0         0        17.000   0        0        0        0       0       ]
[ bb2f ]   [  0        0        0         0         11.000   14.000   0        0        0        0       0       ]
[ bb56 ]   [  0        0        0         0         0        0        22.000   22.000   22.000   0       0       ]
[ bd83 ]   [  0        0        0         0         0        0        42.000   42.000   42.000   0       0       ]
[ be4c ]   [  0        0        0         0         0        0        0        0        0        30.000  30.000  ]
[ bf7c ]   [  22.000   22.000   0         0         0        0        0        0        0        0       0       ]
[ c206 ]   [  0        0        50.000    50.000    0        0        0        0        0        0       0       ]
[ c24f ]   [  0        0        0         0         0        0        15.000   15.000   15.000   0       0       ]
[ c395 ]   [  15.000   15.000   0         0         0        0        0        0        0        15.000  15.000  ]
[ c441 ]   [  0        0        0         0         0        0        0        0        0        30.000  30.000  ]
[ c727 ]   [  0        0        0         0         54.000   57.000   0        0        0        0       0       ]
[ c77a ]   [  13.000   13.000   0         0         0        0        0        0        0        0       0       ]
[ c90d ]   [  17.000   17.000   83.000    113.000   0        0        0        0        0        0       0       ]
[ cb1f ]   [  12.000   12.000   0         0         0        0        0        0        0        66.000  66.000  ]
[ d202 ]   [  0        0        0         0         162.000  204.000  0        0        0        0       0       ]
[ d362 ]   [  13.000   13.000   0         0         0        0        0        0        0        13.000  13.000  ]
[ d426 ]   [  14.000   14.000   14.000    14.000    0        0        0        0        0        0       0       ]
[ d43c ]   [  0        0        17.000    47.000    0        0        0        0        0        0       0       ]
[ d691 ]   [  0        0        0         0         0        0        78.000   78.000   78.000   0       0       ]
[ d8d7 ]   [  0        0        0         0         13.000   13.000   0        0        0        0       0       ]
[ ddb9 ]   [  33.000   33.000   0         0         0        0        0        0        0        0       0       ]
[ de0f ]   [  0        0        0         0         0        0        15.000   15.000   15.000   0       0       ]
[ def0 ]   [  0        0        0         0         162.000  204.000  0        0        0        0       0       ]
[ e191 ]   [  10.000   10.000   0         0         0        0        0        0        0        0       0       ]
[ e1a2 ]   [  102.000  102.000  0         0         0        0        0        0        0        20.000  20.000  ]
[ e341 ]   [  74.000   74.000   0         0         0        0        0        0        0        17.000  17.000  ]
[ e3a1 ]   [  59.000   59.000   0         0         0        0        0        0        0        16.000  16.000  ]
[ e3f2 ]   [  0        0        0         0         0        0        15.000   15.000   15.000   0       0       ]
[ e5aa ]   [  160.000  168.000  584.000   614.000   268.000  278.000  331.000  332.000  340.000  76.000  76.000  ]
[ e78f ]   [  0        0        0         0         13.000   13.000   0        0        0        0       0       ]
[ ea98 ]   [  10.000   10.000   0         0         0        0        0        0        0        0       0       ]
[ efac ]   [  0        0        33.000    33.000    0        0        0        0        0        0       0       ]
[ f03c ]   [  0        0        0         0         0        0        15.000   15.000   15.000   0       0       ]
[ f1d9 ]   [  0        0        0         0         0        0        49.000   49.000   49.000   0       0       ]
[ f7b6 ]   [  0        0        0         0         0        0        12.000   12.000   12.000   0       0       ]
[ fa5d ]   [  25.000   26.000   0         0         0        0        0        0        0        66.000  66.000  ]
[ fb84 ]   [  0        0        0         0         0        0        16.000   16.000   16.000   0       0       ]
[ fde9 ]   [  10.000   10.000   0         0         0        0        0        0        0        0       0       ]
[ fdea ]   [  10.000   10.000   0         0         0        0        0        0        0        0       0       ]
|matrix>
So that is the drop-10-hash matrix.
21/5/2014: Improved the code, so now it can spit out matrices more than one layer deep.
eg:
$ ./the_semantic_db_console.py
Welcome!

sa: load matrix-example-2.sw
sa: matrix[M1]
[ y1 ] = [  0      7.000  1.000  1.000  6.000  4.000  1.000  ] [ x1 ]
[ y2 ]   [  3.000  6.000  4.000  0      4.000  8.000  2.000  ] [ x2 ]
                                                               [ x3 ]
                                                               [ x4 ]
                                                               [ x5 ]
                                                               [ x6 ]
                                                               [ x7 ]
|matrix>

sa: matrix[M2]
[ z1 ] = [  6.000  0      ] [ y1 ]
[ z2 ]   [  2.000  3.000  ] [ y2 ]
[ z3 ]   [  7.000  4.000  ]
[ z4 ]   [  9.000  0      ]
[ z5 ]   [  5.000  1.000  ]
|matrix>

sa: matrix[M2,M1]
[ z1 ] = [  6.000  0      ] [  0      7.000  1.000  1.000  6.000  4.000  1.000  ] [ x1 ]
[ z2 ]   [  2.000  3.000  ] [  3.000  6.000  4.000  0      4.000  8.000  2.000  ] [ x2 ]
[ z3 ]   [  7.000  4.000  ]                                                       [ x3 ]
[ z4 ]   [  9.000  0      ]                                                       [ x4 ]
[ z5 ]   [  5.000  1.000  ]                                                       [ x5 ]
                                                                                  [ x6 ]
                                                                                  [ x7 ]
|matrix>

sa: -- even (sort of) works if you get the order wrong:
sa: matrix[M1,M2]
[  ] = [  0  0  0  0  0  ] [  6.000  0      ] [ y1 ]
                           [  2.000  3.000  ] [ y2 ]
                           [  7.000  4.000  ]
                           [  9.000  0      ]
                           [  5.000  1.000  ]
|matrix>
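The chained display is really just the two factors of the ordinary matrix product M2 M1 written side by side. A quick numpy sketch to multiply them out (entries copied from the M1 and M2 matrices above) gives the composite map from the x's to the z's:

import numpy as np
M1 = np.array([[0,7,1,1,6,4,1],               # rows y1,y2; columns x1..x7
               [3,6,4,0,4,8,2]], dtype=float)
M2 = np.array([[6,0],                         # rows z1..z5; columns y1,y2
               [2,3],
               [7,4],
               [9,0],
               [5,1]], dtype=float)
print(M2.dot(M1))                             # rows z1..z5; columns x1..x7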
Here's the code:
# code to return a single matrix, and the left-hand superposition:
# one must be a superposition
# op is a literal op
def single_matrix(one,context,op):
  one = one.apply_sigmoid(set_to,1)                     # set all coeffs on the right-hand side to 1
  two = superposition()                                 # two is the list of kets that will be on the left hand side.
  for elt in one.data:                                  # heh. using one.data kind of breaks the superposition abstract interface idea.
    sp = elt.apply_op(context,op)
    two = union(two,sp)
  two = two.ket_sort().multiply(0)                      # merges the old "two" and "empty" into the one object.
  matrix_columns = [sp_coeffs_to_list((elt.apply_op(context,op) + two).ket_sort()) for elt in one.data ]
  M = paste_columns(matrix_columns,'[  ','  ','  ]')    # M is the matrix
  return two, M
  
# third version. 
# this one I want to handle multiple ops at once, and then chain the matrices.
# eg: matrix[M2,M1]
# or: matrix[friends,friends]  -- ie, matrix of second-order friends
def multi_matrix(context,ops):
  ops = ops.split(',')[::-1]
  print("ops:",ops)
  
  one = context.relevant_kets(ops[0]).ket_sort()   # one is the list of kets that will be on the right hand side.
                                                   # usefully, relevant_kets() always returns a superposition.
  if one.count() == 0:                             # if one is empty, return the identity ket.
    return ket("",0)

  two, M = single_matrix(one,context,ops[0])
  matrices = [M]
  for op in ops[1:]:
    two, M = single_matrix(two,context,op)
    matrices.append(M)
  x = sp_to_vect(one)
  y = sp_to_vect(two)
  line = [y,'='] + matrices[::-1] + [x]
  matrix = paste_columns(line)
  print(matrix)  
  return ket("matrix")    
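Note the ops list is reversed at the top of multi_matrix: matrix[M2,M1], like the operator sequence M2 M1 |x>, applies M1 first and M2 second, so we have to build the M1 matrix before we know which kets feed into M2.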
OK. Now some more examples:
sa: load child-parent-binary-tree.sw
sa: matrix[left]
[ 0   ] = [  0      0      0      0      0      0      1.000  ] [ 0  ]
[ 00  ]   [  1.000  0      0      0      0      0      0      ] [ 00 ]
[ 000 ]   [  0      1.000  0      0      0      0      0      ] [ 01 ]
[ 001 ]   [  0      0      1.000  0      0      0      0      ] [ 1  ]
[ 01  ]   [  0      0      0      1.000  0      0      0      ] [ 10 ]
[ 010 ]   [  0      0      0      0      1.000  0      0      ] [ 11 ]
[ 011 ]   [  0      0      0      0      0      1.000  0      ] [ x  ]
|matrix>

sa: matrix[right]
[ 1   ] = [  0      0      0      0      0      0      1.000  ] [ 0  ]
[ 10  ]   [  1.000  0      0      0      0      0      0      ] [ 00 ]
[ 100 ]   [  0      1.000  0      0      0      0      0      ] [ 01 ]
[ 101 ]   [  0      0      1.000  0      0      0      0      ] [ 1  ]
[ 11  ]   [  0      0      0      1.000  0      0      0      ] [ 10 ]
[ 110 ]   [  0      0      0      0      1.000  0      0      ] [ 11 ]
[ 111 ]   [  0      0      0      0      0      1.000  0      ] [ x  ]
|matrix>

sa: matrix[child]
[ 0   ] = [  0  0      0      0      0      0      0      1.000  ] [ *  ]
[ 00  ]   [  0  1.000  0      0      0      0      0      0      ] [ 0  ]
[ 000 ]   [  0  0      1.000  0      0      0      0      0      ] [ 00 ]
[ 001 ]   [  0  0      0      1.000  0      0      0      0      ] [ 01 ]
[ 01  ]   [  0  0      0      0      1.000  0      0      0      ] [ 1  ]
[ 010 ]   [  0  0      0      0      0      1.000  0      0      ] [ 10 ]
[ 011 ]   [  0  0      0      0      0      0      1.000  0      ] [ 11 ]
[ 1   ]   [  0  0      0      0      0      0      0      1.000  ] [ x  ]
[ 10  ]   [  0  1.000  0      0      0      0      0      0      ]
[ 100 ]   [  0  0      1.000  0      0      0      0      0      ]
[ 101 ]   [  0  0      0      1.000  0      0      0      0      ]
[ 11  ]   [  0  0      0      0      1.000  0      0      0      ]
[ 110 ]   [  0  0      0      0      0      1.000  0      0      ]
[ 111 ]   [  0  0      0      0      0      0      1.000  0      ]
|matrix>

sa: matrix[parent]
[ 0  ] = [  0  0      1.000  0      0      0      0      0      0      1.000  0      0      0      0      0      ] [ *   ]
[ 00 ]   [  0  0      0      1.000  0      0      0      0      0      0      1.000  0      0      0      0      ] [ 0   ]
[ 01 ]   [  0  0      0      0      1.000  0      0      0      0      0      0      1.000  0      0      0      ] [ 00  ]
[ 1  ]   [  0  0      0      0      0      1.000  0      0      0      0      0      0      1.000  0      0      ] [ 000 ]
[ 10 ]   [  0  0      0      0      0      0      1.000  0      0      0      0      0      0      1.000  0      ] [ 001 ]
[ 11 ]   [  0  0      0      0      0      0      0      1.000  0      0      0      0      0      0      1.000  ] [ 01  ]
[ x  ]   [  0  1.000  0      0      0      0      0      0      1.000  0      0      0      0      0      0      ] [ 010 ]
                                                                                                                   [ 011 ]
                                                                                                                   [ 1   ]
                                                                                                                   [ 10  ]
                                                                                                                   [ 100 ]
                                                                                                                   [ 101 ]
                                                                                                                   [ 11  ]
                                                                                                                   [ 110 ]
                                                                                                                   [ 111 ]
|matrix>

sa: -- these next two are not super useful, but here they are anyway.
sa: matrix[parent,child]
[ 0  ] = [  0      1.000  0      0      0      0      0      0      1.000  0      0      0      0      0      ] [  0  0      0      0      0      0      0      1.000  ] [ *  ]
[ 00 ]   [  0      0      1.000  0      0      0      0      0      0      1.000  0      0      0      0      ] [  0  1.000  0      0      0      0      0      0      ] [ 0  ]
[ 01 ]   [  0      0      0      1.000  0      0      0      0      0      0      1.000  0      0      0      ] [  0  0      1.000  0      0      0      0      0      ] [ 00 ]
[ 1  ]   [  0      0      0      0      1.000  0      0      0      0      0      0      1.000  0      0      ] [  0  0      0      1.000  0      0      0      0      ] [ 01 ]
[ 10 ]   [  0      0      0      0      0      1.000  0      0      0      0      0      0      1.000  0      ] [  0  0      0      0      1.000  0      0      0      ] [ 1  ]
[ 11 ]   [  0      0      0      0      0      0      1.000  0      0      0      0      0      0      1.000  ] [  0  0      0      0      0      1.000  0      0      ] [ 10 ]
[ x  ]   [  1.000  0      0      0      0      0      0      1.000  0      0      0      0      0      0      ] [  0  0      0      0      0      0      1.000  0      ] [ 11 ]
                                                                                                                [  0  0      0      0      0      0      0      1.000  ] [ x  ]
                                                                                                                [  0  1.000  0      0      0      0      0      0      ]
                                                                                                                [  0  0      1.000  0      0      0      0      0      ]
                                                                                                                [  0  0      0      1.000  0      0      0      0      ]
                                                                                                                [  0  0      0      0      1.000  0      0      0      ]
                                                                                                                [  0  0      0      0      0      1.000  0      0      ]
                                                                                                                [  0  0      0      0      0      0      1.000  0      ]
|matrix>

sa: matrix[child,parent]
[ 0   ] = [  0      0      0      0      0      0      1.000  ] [  0  0      1.000  0      0      0      0      0      0      1.000  0      0      0      0      0      ] [ *   ]
[ 00  ]   [  1.000  0      0      0      0      0      0      ] [  0  0      0      1.000  0      0      0      0      0      0      1.000  0      0      0      0      ] [ 0   ]
[ 000 ]   [  0      1.000  0      0      0      0      0      ] [  0  0      0      0      1.000  0      0      0      0      0      0      1.000  0      0      0      ] [ 00  ]
[ 001 ]   [  0      0      1.000  0      0      0      0      ] [  0  0      0      0      0      1.000  0      0      0      0      0      0      1.000  0      0      ] [ 000 ]
[ 01  ]   [  0      0      0      1.000  0      0      0      ] [  0  0      0      0      0      0      1.000  0      0      0      0      0      0      1.000  0      ] [ 001 ]
[ 010 ]   [  0      0      0      0      1.000  0      0      ] [  0  0      0      0      0      0      0      1.000  0      0      0      0      0      0      1.000  ] [ 01  ]
[ 011 ]   [  0      0      0      0      0      1.000  0      ] [  0  1.000  0      0      0      0      0      0      1.000  0      0      0      0      0      0      ] [ 010 ]
[ 1   ]   [  0      0      0      0      0      0      1.000  ]                                                                                                           [ 011 ]
[ 10  ]   [  1.000  0      0      0      0      0      0      ]                                                                                                           [ 1   ]
[ 100 ]   [  0      1.000  0      0      0      0      0      ]                                                                                                           [ 10  ]
[ 101 ]   [  0      0      1.000  0      0      0      0      ]                                                                                                           [ 100 ]
[ 11  ]   [  0      0      0      1.000  0      0      0      ]                                                                                                           [ 101 ]
[ 110 ]   [  0      0      0      0      1.000  0      0      ]                                                                                                           [ 11  ]
[ 111 ]   [  0      0      0      0      0      1.000  0      ]                                                                                                           [ 110 ]
                                                                                                                                                                          [ 111 ]
|matrix>

OK. With some tweaks, we now have merged-matrix. Instead of displaying the matrices next to each other, it merges them into a single matrix.
sa: load matrix-example-2.sw
sa: matrix[M1]
[ y1 ] = [  0      7.000  1.000  1.000  6.000  4.000  1.000  ] [ x1 ]
[ y2 ]   [  3.000  6.000  4.000  0      4.000  8.000  2.000  ] [ x2 ]
                                                               [ x3 ]
                                                               [ x4 ]
                                                               [ x5 ]
                                                               [ x6 ]
                                                               [ x7 ]
|matrix>

sa: matrix[M2]
[ z1 ] = [  6.000  0      ] [ y1 ]
[ z2 ]   [  2.000  3.000  ] [ y2 ]
[ z3 ]   [  7.000  4.000  ]
[ z4 ]   [  9.000  0      ]
[ z5 ]   [  5.000  1.000  ]
|matrix>

sa: matrix[M2,M1]
[ z1 ] = [  6.000  0      ] [  0      7.000  1.000  1.000  6.000  4.000  1.000  ] [ x1 ]
[ z2 ]   [  2.000  3.000  ] [  3.000  6.000  4.000  0      4.000  8.000  2.000  ] [ x2 ]
[ z3 ]   [  7.000  4.000  ]                                                       [ x3 ]
[ z4 ]   [  9.000  0      ]                                                       [ x4 ]
[ z5 ]   [  5.000  1.000  ]                                                       [ x5 ]
                                                                                  [ x6 ]
                                                                                  [ x7 ]
|matrix>

sa: merged-matrix[M2,M1]
[ z1 ] = [  0       42.000  6.000   6.000  36.000  24.000  6.000   ] [ x1 ]
[ z2 ]   [  9.000   32.000  14.000  2.000  24.000  32.000  8.000   ] [ x2 ]
[ z3 ]   [  12.000  73.000  23.000  7.000  58.000  60.000  15.000  ] [ x3 ]
[ z4 ]   [  0       63.000  9.000   9.000  54.000  36.000  9.000   ] [ x4 ]
[ z5 ]   [  3.000   41.000  9.000   5.000  34.000  28.000  7.000   ] [ x5 ]
                                                                     [ x6 ]
                                                                     [ x7 ]
|matrix>
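Note that merged-matrix[M2,M1] is just the ordinary matrix product of the two matrices. A quick numpy sanity check (not project code, just checking the arithmetic):

import numpy as np

M1 = np.array([[0, 7, 1, 1, 6, 4, 1],
               [3, 6, 4, 0, 4, 8, 2]])
M2 = np.array([[6, 0],
               [2, 3],
               [7, 4],
               [9, 0],
               [5, 1]])
print(M2 @ M1)   # first row: [0 42 6 6 36 24 6], matching merged-matrix[M2,M1] above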

sa: merged-matrix[M1,M2]
[  ] = [  0  0  ] [ y1 ]
                  [ y2 ]
|matrix>

sa: load child-parent-binary-tree.sw
sa: merged-matrix[parent,child]       -- parent child |object> == 2 |object> since it is a binary tree.
[ 0  ] = [  0  2.000  0      0      0      0      0      0      ] [ *  ]
[ 00 ]   [  0  0      2.000  0      0      0      0      0      ] [ 0  ]
[ 01 ]   [  0  0      0      2.000  0      0      0      0      ] [ 00 ]
[ 1  ]   [  0  0      0      0      2.000  0      0      0      ] [ 01 ]
[ 10 ]   [  0  0      0      0      0      2.000  0      0      ] [ 1  ]
[ 11 ]   [  0  0      0      0      0      0      2.000  0      ] [ 10 ]
[ x  ]   [  0  0      0      0      0      0      0      2.000  ] [ 11 ]
                                                                  [ x  ]
|matrix>

sa: merged-matrix[child,parent]       -- child parent |object> gives |object> plus its sibling.
[ 0   ] = [  0  1.000  0      0      0      0      0      0      1.000  0      0      0      0      0      0      ] [ *   ]
[ 00  ]   [  0  0      1.000  0      0      0      0      0      0      1.000  0      0      0      0      0      ] [ 0   ]
[ 000 ]   [  0  0      0      1.000  0      0      0      0      0      0      1.000  0      0      0      0      ] [ 00  ]
[ 001 ]   [  0  0      0      0      1.000  0      0      0      0      0      0      1.000  0      0      0      ] [ 000 ]
[ 01  ]   [  0  0      0      0      0      1.000  0      0      0      0      0      0      1.000  0      0      ] [ 001 ]
[ 010 ]   [  0  0      0      0      0      0      1.000  0      0      0      0      0      0      1.000  0      ] [ 01  ]
[ 011 ]   [  0  0      0      0      0      0      0      1.000  0      0      0      0      0      0      1.000  ] [ 010 ]
[ 1   ]   [  0  1.000  0      0      0      0      0      0      1.000  0      0      0      0      0      0      ] [ 011 ]
[ 10  ]   [  0  0      1.000  0      0      0      0      0      0      1.000  0      0      0      0      0      ] [ 1   ]
[ 100 ]   [  0  0      0      1.000  0      0      0      0      0      0      1.000  0      0      0      0      ] [ 10  ]
[ 101 ]   [  0  0      0      0      1.000  0      0      0      0      0      0      1.000  0      0      0      ] [ 100 ]
[ 11  ]   [  0  0      0      0      0      1.000  0      0      0      0      0      0      1.000  0      0      ] [ 101 ]
[ 110 ]   [  0  0      0      0      0      0      1.000  0      0      0      0      0      0      1.000  0      ] [ 11  ]
[ 111 ]   [  0  0      0      0      0      0      0      1.000  0      0      0      0      0      0      1.000  ] [ 110 ]
                                                                                                                    [ 111 ]
|matrix>

23/5/2014: implemented a map function! I got sick of manually typing up the post-processing rules (even with the help of cut/paste). The map function should make this easier.
Also, the current BKO design doesn't allow multi-line programming structures, so we couldn't use a standard for loop.
# 23/5/2014:
# let's implement a map function (since we can't have multi-line for loops, this will have to do!)
# eg: map[op] (|x> + |y>)
# runs:
# op |x> => op |_self>
# op |y> => op |_self>
# ie, it converts function operators (op on the right hand side), into literal operators (on the left hand side)
# eg: map[fib] (|10> + |11>)
# eg: map[child] (|x> + |0> + |1> + |00> + |01> + |10> + |11>)
# or indirectly:
# map[op] "" |list>
# one is a ket/sp
# op is a string
def map(one,context,op):
  one = superposition() + one    # map kets to superposition. Maybe have a ket/sp function called x.superposition()??
  for x in one.data:             # what if x has x.value != 1? x.apply_op handles that.
    context.learn(op,x,x.apply_op(context,op))
  return ket("map")  
Probably not clear what it is doing, so maybe an example at the console will help:
sa: load small-fib.sw
loading sw file: sw-examples/small-fib.sw

sa: dump
----------------------------------------
|context> => |context: sw console>

fib |0> => |0>
fib |1> => |1>

n-1 |*> #=> arithmetic(|_self>,|->,|1>)
n-2 |*> #=> arithmetic(|_self>,|->,|2>)
fib |*> #=> arithmetic( fib n-1 |_self>, |+>, fib n-2 |_self>)      -- fib |*> is a function operator.
----------------------------------------
sa: fib |8> => fib |8>                                              -- do one example by hand.
sa: dump
----------------------------------------
|context> => |context: sw console>

fib |0> => |0>
fib |1> => |1>

n-1 |*> #=> arithmetic(|_self>,|->,|1>)
n-2 |*> #=> arithmetic(|_self>,|->,|2>)
fib |*> #=> arithmetic( fib n-1 |_self>, |+>, fib n-2 |_self>)

fib |8> => |21>                                                     -- NB: this line.
----------------------------------------

sa: map[fib] (|13> + |14> + |15> + |16>)                            -- use our map function
sa: dump
----------------------------------------
|context> => |context: sw console>

fib |0> => |0>
fib |1> => |1>

n-1 |*> #=> arithmetic(|_self>,|->,|1>)
n-2 |*> #=> arithmetic(|_self>,|->,|2>)
fib |*> #=> arithmetic( fib n-1 |_self>, |+>, fib n-2 |_self>)

fib |8> => |21>
fib |13> => |233>                                           -- For |0>,|1>,|8> and now |13>,|14>,|15>,|16> fib is a literal operator.
fib |14> => |377>                                           -- a kind of memoization, I suppose.
fib |15> => |610>
fib |16> => |987>
----------------------------------------
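As the comment above says, this is a kind of memoization. For comparison, here is the analogous plain Python (just an analogy, not project code):

from functools import lru_cache

@lru_cache(maxsize=None)            # plays the role of the learned "fib |n> => ..." rules
def fib(n):
  if n in (0, 1):                   # cf: fib |0> => |0>, fib |1> => |1>
    return n
  return fib(n - 1) + fib(n - 2)    # cf: fib |*> #=> arithmetic( fib n-1 |_self>, |+>, fib n-2 |_self>)

print(fib(16))                      # 987, matching fib |16> => |987> above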
Note, a key part of why this works is that literal operators (not sure that is the best name for them) have higher precedence than operators applied to |*> or |category: *>.
See the label-decent code way up above.
eg, an example of "trial labels":
a: b: c: d: fred
a: b: c: d: *
a: b: c: *
a: b: *
a: *
*
So for example, fib |20> will have these two trial labels in context.recall():
20
*
So, if fib |20> is defined, use that, else drop back and try fib |*>. If fib |*> is not defined either, return |>.
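For reference, here is a minimal sketch of how the label-descent idea generates those trial labels (the project's label_decent() is defined way up above; this version is just for illustration):

def label_descent(label):
  # eg: "a: b: c: d: fred" -> ["a: b: c: d: fred", "a: b: c: d: *", "a: b: c: *", "a: b: *", "a: *", "*"]
  result = [label]
  pieces = label.split(': ')
  for k in range(len(pieces) - 1, 0, -1):
    result.append(': '.join(pieces[:k]) + ': *')
  result.append('*')
  return result

print(label_descent('20'))   # ['20', '*'], the two trial labels for fib |20>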
Might as well mention, the key code in context.recall() is:
    match = False
    for trial_label in label_decent(label):
      if trial_label in self.known_kets:
        if op in self.rules[trial_label]:
          rule = self.rules[trial_label][op]
          match = True
          break
    if not match:
      print("recall not found")               
      rule = ket("",0)

Now, if we want to map fn over a list, and have the result stored under the operator result, we again have to do it indirectly.
(It would be nice if we didn't have to do it indirectly, but the current parser prevents that. Specifically, it can't handle compound operators with spaces in their parameters.)
eg: it would be cool if we could do things like:
map[drop-below[75] 100 similar[op],result] "" |list>
But for now we have to do:
result |*> #=> fn |_self>
|list> => |a> + |b> + |c> + |d> + |e>
map[result] "" |list>
In a real programming language this would be something like this:
list = [a,b,c,d,e]
result = map(fn,list)
Here is a worked example in the console:
sa: fn |*> #=> merge-labels(|fn > + |_self>)  -- fn is just some example function
sa: result |*> #=> fn |_self>                 -- we want the results of fn with result as the operator label
sa: |list> => |a> + |b> + |c> + |d> + |e>     -- define our list
sa: dump                                      -- have a look at what we have so far:
----------------------------------------
|context> => |context: sw console>

fn |*> #=> merge-labels(|fn > + |_self>)
result |*> #=> fn |_self>
 |list> => |a> + |b> + |c> + |d> + |e>
----------------------------------------

sa: map[result] "" |list>                     -- apply the map
sa: dump                                      -- have a look at what we have now:
----------------------------------------
|context> => |context: sw console>

fn |*> #=> merge-labels(|fn > + |_self>)
result |*> #=> fn |_self>
 |list> => |a> + |b> + |c> + |d> + |e>

result |a> => |fn a>
result |b> => |fn b>
result |c> => |fn c>
result |d> => |fn d>
result |e> => |fn e>                          
----------------------------------------
sa: matrix[result]                            -- have a look at what we have in matrix form:
[ fn * ] = [  1.00  0     0     0     0     0     ] [ * ]
[ fn a ]   [  0     1.00  0     0     0     0     ] [ a ]
[ fn b ]   [  0     0     1.00  0     0     0     ] [ b ]
[ fn c ]   [  0     0     0     1.00  0     0     ] [ c ]
[ fn d ]   [  0     0     0     0     1.00  0     ] [ d ]
[ fn e ]   [  0     0     0     0     0     1.00  ] [ e ]
|matrix>

25/5/2014 update. OK, improved map so now instead of just map[fn] "" |list>, we can also do: map[fn,result] "" |list>
eg:
sa: fn |*> #=> merge-labels(|fn > + |_self>)
sa: map[fn,destination] (|x> + |y> + |z>)
sa: dump
----------------------------------------
|context> => |context: sw console>
fn |*> #=> merge-labels(|fn > + |_self>)
destination |x> => |fn x>
destination |y> => |fn y>
destination |z> => |fn z>
----------------------------------------
sa: matrix[destination]
[ fn x ] = [  1.00  0     0     ] [ x ]
[ fn y ]   [  0     1.00  0     ] [ y ]
[ fn z ]   [  0     0     1.00  ] [ z ]
|matrix>
Small, but useful improvement!
25/5/2014: Now, a comment on matrices as a visual representation of network structure.
So, I guess the idea is that completely different types of objects can, underneath, contain identical "network structure".
So for a brief example (matrix-as-network.sw):
friends |Alex> => |Jason> + |Ed> + |Mary> + |Liz> + |Beth> + |James> + |nathan>
friends |Bill> => |Jason> + |Beth> + |lena> + |John> + |nathan>
friends |Harry> => |charlie> + |bella> + |sam> + |smithie> + |david> + |nathan>
links-to |url 1> => |url k> + |url g> + |url b> + |url f> + |url l> + |url e> + |url j>
links-to |url 2> => |url h> + |url l> + |url b> + |url g> + |url i>
links-to |url 3> => |url m> + |url a> + |url d> + |url c> + |url n> + |url l>
Right, so at least superficially the friends network and the url links-to network look entirely different, but they actually share the same network structure.
Which becomes clear if we have a look at their respective matrices:
sa: matrix[friends]
[ bella   ] = [  0     0     1.00  ] [ Alex  ]
[ Beth    ]   [  1.00  1.00  0     ] [ Bill  ]
[ charlie ]   [  0     0     1.00  ] [ Harry ]
[ david   ]   [  0     0     1.00  ]
[ Ed      ]   [  1.00  0     0     ]
[ James   ]   [  1.00  0     0     ]
[ Jason   ]   [  1.00  1.00  0     ]
[ John    ]   [  0     1.00  0     ]
[ lena    ]   [  0     1.00  0     ]
[ Liz     ]   [  1.00  0     0     ]
[ Mary    ]   [  1.00  0     0     ]
[ nathan  ]   [  1.00  1.00  1.00  ]
[ sam     ]   [  0     0     1.00  ]
[ smithie ]   [  0     0     1.00  ]
|matrix>

sa: matrix[links-to]
[ url a ] = [  0     0     1.00  ] [ url 1 ]
[ url b ]   [  1.00  1.00  0     ] [ url 2 ]
[ url c ]   [  0     0     1.00  ] [ url 3 ]
[ url d ]   [  0     0     1.00  ]
[ url e ]   [  1.00  0     0     ]
[ url f ]   [  1.00  0     0     ]
[ url g ]   [  1.00  1.00  0     ]
[ url h ]   [  0     1.00  0     ]
[ url i ]   [  0     1.00  0     ]
[ url j ]   [  1.00  0     0     ]
[ url k ]   [  1.00  0     0     ]
[ url l ]   [  1.00  1.00  1.00  ]
[ url m ]   [  0     0     1.00  ]
[ url n ]   [  0     0     1.00  ]
|matrix>
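To make "share the same network structure" concrete, the relabelling can be read straight off the row and column labels of the two sorted matrices. A plain-Python check (the rename dict is my reading of the matrices, not project code):

friends = {
  "Alex":  {"Jason", "Ed", "Mary", "Liz", "Beth", "James", "nathan"},
  "Bill":  {"Jason", "Beth", "lena", "John", "nathan"},
  "Harry": {"charlie", "bella", "sam", "smithie", "david", "nathan"},
}
links_to = {
  "url 1": {"url k", "url g", "url b", "url f", "url l", "url e", "url j"},
  "url 2": {"url h", "url l", "url b", "url g", "url i"},
  "url 3": {"url m", "url a", "url d", "url c", "url n", "url l"},
}
rename = {"Alex": "url 1", "Bill": "url 2", "Harry": "url 3",
          "bella": "url a", "Beth": "url b", "charlie": "url c", "david": "url d",
          "Ed": "url e", "James": "url f", "Jason": "url g", "John": "url h",
          "lena": "url i", "Liz": "url j", "Mary": "url k", "nathan": "url l",
          "sam": "url m", "smithie": "url n"}
assert all({rename[f] for f in friends[p]} == links_to[rename[p]] for p in friends)
print("identical network structure")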
So I guess the implication is that just given a particular network structure (ie, a matrix), it is generally impossible to reconstruct the meaning of that network.
eg, I'm thinking of mapping a neural network (say of a simple organism), and then trying to use just the network structure to infer meaning. I think this would be hard.
BTW, talking about the meaning of network structure, I haven't yet worked out the meaning of these:
sa: create inverse
sa: merged-matrix[links-to,inverse-links-to]
[ url a ] = [  1.00  0     1.00  1.00  0     0     0     0     0     0     0     1.00  1.00  1.00  ] [ url a ]
[ url b ]   [  0     2.00  0     0     1.00  1.00  2.00  1.00  1.00  1.00  1.00  2.00  0     0     ] [ url b ]
[ url c ]   [  1.00  0     1.00  1.00  0     0     0     0     0     0     0     1.00  1.00  1.00  ] [ url c ]
[ url d ]   [  1.00  0     1.00  1.00  0     0     0     0     0     0     0     1.00  1.00  1.00  ] [ url d ]
[ url e ]   [  0     1.00  0     0     1.00  1.00  1.00  0     0     1.00  1.00  1.00  0     0     ] [ url e ]
[ url f ]   [  0     1.00  0     0     1.00  1.00  1.00  0     0     1.00  1.00  1.00  0     0     ] [ url f ]
[ url g ]   [  0     2.00  0     0     1.00  1.00  2.00  1.00  1.00  1.00  1.00  2.00  0     0     ] [ url g ]
[ url h ]   [  0     1.00  0     0     0     0     1.00  1.00  1.00  0     0     1.00  0     0     ] [ url h ]
[ url i ]   [  0     1.00  0     0     0     0     1.00  1.00  1.00  0     0     1.00  0     0     ] [ url i ]
[ url j ]   [  0     1.00  0     0     1.00  1.00  1.00  0     0     1.00  1.00  1.00  0     0     ] [ url j ]
[ url k ]   [  0     1.00  0     0     1.00  1.00  1.00  0     0     1.00  1.00  1.00  0     0     ] [ url k ]
[ url l ]   [  1.00  2.00  1.00  1.00  1.00  1.00  2.00  1.00  1.00  1.00  1.00  3.00  1.00  1.00  ] [ url l ]
[ url m ]   [  1.00  0     1.00  1.00  0     0     0     0     0     0     0     1.00  1.00  1.00  ] [ url m ]
[ url n ]   [  1.00  0     1.00  1.00  0     0     0     0     0     0     0     1.00  1.00  1.00  ] [ url n ]
|matrix>

sa: merged-matrix[friends,inverse-friends]
[ bella   ] = [  1.00  0     1.00  1.00  0     0     0     0     0     0     0     1.00  1.00  1.00  ] [ bella   ]
[ Beth    ]   [  0     2.00  0     0     1.00  1.00  2.00  1.00  1.00  1.00  1.00  2.00  0     0     ] [ Beth    ]
[ charlie ]   [  1.00  0     1.00  1.00  0     0     0     0     0     0     0     1.00  1.00  1.00  ] [ charlie ]
[ david   ]   [  1.00  0     1.00  1.00  0     0     0     0     0     0     0     1.00  1.00  1.00  ] [ david   ]
[ Ed      ]   [  0     1.00  0     0     1.00  1.00  1.00  0     0     1.00  1.00  1.00  0     0     ] [ Ed      ]
[ James   ]   [  0     1.00  0     0     1.00  1.00  1.00  0     0     1.00  1.00  1.00  0     0     ] [ James   ]
[ Jason   ]   [  0     2.00  0     0     1.00  1.00  2.00  1.00  1.00  1.00  1.00  2.00  0     0     ] [ Jason   ]
[ John    ]   [  0     1.00  0     0     0     0     1.00  1.00  1.00  0     0     1.00  0     0     ] [ John    ]
[ lena    ]   [  0     1.00  0     0     0     0     1.00  1.00  1.00  0     0     1.00  0     0     ] [ lena    ]
[ Liz     ]   [  0     1.00  0     0     1.00  1.00  1.00  0     0     1.00  1.00  1.00  0     0     ] [ Liz     ]
[ Mary    ]   [  0     1.00  0     0     1.00  1.00  1.00  0     0     1.00  1.00  1.00  0     0     ] [ Mary    ]
[ nathan  ]   [  1.00  2.00  1.00  1.00  1.00  1.00  2.00  1.00  1.00  1.00  1.00  3.00  1.00  1.00  ] [ nathan  ]
[ sam     ]   [  1.00  0     1.00  1.00  0     0     0     0     0     0     0     1.00  1.00  1.00  ] [ sam     ]
[ smithie ]   [  1.00  0     1.00  1.00  0     0     0     0     0     0     0     1.00  1.00  1.00  ] [ smithie ]
|matrix>

sa: merged-matrix[inverse-links-to,links-to]
[ url 1 ] = [  7.00  3.00  1.00  ] [ url 1 ]
[ url 2 ]   [  3.00  5.00  1.00  ] [ url 2 ]
[ url 3 ]   [  1.00  1.00  6.00  ] [ url 3 ]
|matrix>

sa: merged-matrix[inverse-friends,friends]
[ Alex  ] = [  7.00  3.00  1.00  ] [ Alex  ]
[ Bill  ]   [  3.00  5.00  1.00  ] [ Bill  ]
[ Harry ]   [  1.00  1.00  6.00  ] [ Harry ]
|matrix>
BTW, while we have this data loaded in the console, these do make sense:
sa: matrix[inverse-friends]
[ Alex  ] = [  0     1.00  0     0     1.00  1.00  1.00  0     0     1.00  1.00  1.00  0     0     ] [ bella   ]
[ Bill  ]   [  0     1.00  0     0     0     0     1.00  1.00  1.00  0     0     1.00  0     0     ] [ Beth    ]
[ Harry ]   [  1.00  0     1.00  1.00  0     0     0     0     0     0     0     1.00  1.00  1.00  ] [ charlie ]
                                                                                                     [ david   ]
                                                                                                     [ Ed      ]
                                                                                                     [ James   ]
                                                                                                     [ Jason   ]
                                                                                                     [ John    ]
                                                                                                     [ lena    ]
                                                                                                     [ Liz     ]
                                                                                                     [ Mary    ]
                                                                                                     [ nathan  ]
                                                                                                     [ sam     ]
                                                                                                     [ smithie ]
|matrix>

sa: matrix[inverse-links-to]
[ url 1 ] = [  0     1.00  0     0     1.00  1.00  1.00  0     0     1.00  1.00  1.00  0     0     ] [ url a ]
[ url 2 ]   [  0     1.00  0     0     0     0     1.00  1.00  1.00  0     0     1.00  0     0     ] [ url b ]
[ url 3 ]   [  1.00  0     1.00  1.00  0     0     0     0     0     0     0     1.00  1.00  1.00  ] [ url c ]
                                                                                                     [ url d ]
                                                                                                     [ url e ]
                                                                                                     [ url f ]
                                                                                                     [ url g ]
                                                                                                     [ url h ]
                                                                                                     [ url i ]
                                                                                                     [ url j ]
                                                                                                     [ url k ]
                                                                                                     [ url l ]
                                                                                                     [ url m ]
                                                                                                     [ url n ]
|matrix>
BTW, talking of "inverse-links-to", I wonder how hard it would be to implement a basic "page rank" in BKO?
I've also been thinking a "Kevin Bacon" (six degrees of separation) game might be do-able in BKO too.
WOOT! I worked out the meaning!
sa: merged-matrix[inverse-friends,friends]
[ Alex  ] = [  7.00  3.00  1.00  ] [ Alex  ]
[ Bill  ]   [  3.00  5.00  1.00  ] [ Bill  ]
[ Harry ]   [  1.00  1.00  6.00  ] [ Harry ]
|matrix>

Call the matrix M
Then M(x,y) = count common[friends] (|x> + |y>)
Or, in general, set M = merged-matrix[inverse-op,op], then M(x,y) = count common[op] (|x> + |y>) = count intersection(op|x>,op|y>)
At least when the matrix[op] only contains elements in {0,1}. Not yet sure what happens if elements can take other values.
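A quick check of that claim with plain Python sets (same friends data as above):

friends = {"Alex":  {"Jason", "Ed", "Mary", "Liz", "Beth", "James", "nathan"},
           "Bill":  {"Jason", "Beth", "lena", "John", "nathan"},
           "Harry": {"charlie", "bella", "sam", "smithie", "david", "nathan"}}
for x in ["Alex", "Bill", "Harry"]:
  print([len(friends[x] & friends[y]) for y in ["Alex", "Bill", "Harry"]])
# [7, 3, 1]
# [3, 5, 1]
# [1, 1, 6]   -- matching merged-matrix[inverse-friends,friends] above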
Update: Heh. Even simpler! You forgot this is BKO.
I will show in the console:
sa: inverse-friends friends |Alex>
7.000|Alex> + 3.000|Bill> + |Harry>

sa: inverse-friends friends |Bill>
3.000|Alex> + 5.000|Bill> + |Harry>

sa: inverse-friends friends |Harry>
6.000|Harry> + |Alex> + |Bill>
Heh. So it should be kind of obvious:
M(x,y) = <y|inverse-op op|x>               -- noting that M is symmetrical in x,y.

OK. Let's apply the map function to average (see way up above for where we first played with average).
sa: load average.sw
sa: dump
----------------------------------------
|context> => |context: average>

ave |*> #=> arithmetic(count-sum "" |_self>,|/>,count "" |_self>)
apply-weights |*> #=> mult(""|_self>, weights|_self>)
weighted-ave |*> #=> arithmetic(count-sum apply-weights |_self>,|/>,count-sum weights |_self>)
tmp-ave |*> #=> arithmetic(count-sum 100 "" |_self>,|/>,count 100 "" |_self>)
harmonic-mean |*> #=> arithmetic(count "" |_self>,|/>,count-sum invert "" |_self>)

 |u> => |a> + 2.000|b> + 3.000|c> + 4.000|d>
 |x> => |a> + 2.000|b> + 3.000|c> + 4.000|d>
weights |x> => 0.100|a> + 0.100|b> + 0.700|c> + 0.100|d>

 |y> => |a> + 2.000|b> + 5.000|c> + 7.000|d>
weights |y> => 2.000|a> + 14.000|b> + 8.000|c> + 32.000|d>

 |tmp> => 0.100|a> + 0.100|b> + 0.700|c> + 0.100|d>
 |z> => 60.000|a> + 40.000|b>
----------------------------------------
sa: map[ave,average] (|u> + |x> + |y> + |z>)
sa: matrix[average]
[ number: 2.5  ] = [  1.00  1.00  0     0     ] [ u ]               -- NB: not one-to-one
[ number: 3.75 ]   [  0     0     1.00  0     ] [ x ]
[ number: 50.0 ]   [  0     0     0     1.00  ] [ y ]
                                                [ z ]
|matrix>

sa: map[harmonic-mean,harmonic] (|x> + |y> + |z>)
sa: matrix[harmonic]
[ number: 1.9200000000000004 ] = [  1.00  0     0     ] [ x ]       -- NB: it is one-to-one
[ number: 2.170542635658915  ]   [  0     1.00  0     ] [ y ]
[ number: 47.99999999999999  ]   [  0     0     1.00  ] [ z ]
|matrix>

sa: map[weighted-ave,weighted-average] (|x> + |y>)
sa: matrix[weighted-average]
[ number: 2.8  ] = [  1.00  0     ] [ x ]
[ number: 5.25 ]   [  0     1.00  ] [ y ]
|matrix>
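As a quick check of those three results, here is the arithmetic for |x> (coefficients 1,2,3,4 with weights 0.1,0.1,0.7,0.1) in plain Python:

coeffs  = [1, 2, 3, 4]
weights = [0.1, 0.1, 0.7, 0.1]
print(sum(coeffs) / len(coeffs))                                   # 2.5  -- ave
print(sum(c * w for c, w in zip(coeffs, weights)) / sum(weights))  # 2.8  -- weighted-ave
print(len(coeffs) / sum(1 / c for c in coeffs))                    # 1.92 -- harmonic-mean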

A note on shared network structure.
I suspect that if two or more objects are outwardly dissimilar, but share a "network structure", then we can say:
They are metaphors for each other.
They are different representations of the same thing.
They are homomorphisms.

27/5/2014: I finally implemented the categorize code in python:
def metric_mbr(metric,x,thresh,data):
  for elt in data:
    if metric(x,elt) >= thresh:                      # depending on whether 0 or 1 means an exact match, you may want to swap to: <= thresh
      return True
  return False

def categorize_list(data,metric,thresh):
  out_list = []
  for x in data:
    n = 0
    del_list = []
    for i in range(len(out_list)):
      if metric_mbr(metric,x,thresh,out_list[i]):
        if n == 0:
          out_list[i].append(x)
          idx = i
          n = 1
        else:
          out_list[idx] += out_list[i]
          del_list.append(i)
    if n == 0:
      out_list.append([x])
    else:
      out_list = [x for index, x in enumerate(out_list) if index not in del_list]
  return out_list
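A toy run, using the <= thresh variant of metric_mbr mentioned in the comment (abs(x - y) is a distance, so smaller means closer):

def metric(x, y):
  return abs(x - y)

data = [1, 2, 3, 10, 11, 12, 30]
print(categorize_list(data, metric, 2))   # [[1, 2, 3], [10, 11, 12], [30]]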
Now I need to write a BKO version.
28/5/2014: Woot! I have a BKO version of categorize, and it seems to be working great!
Let's start with the code:
# 28/5/2014:
# working towards a BKO version of the categorize code.
# first, the equivalent of metric_mbr, using simm.
#
# one is a superposition
# op is a string
# x is a ket
# thresh is a float
def simm_mbr(context,op,x,thresh,one):
  f = x.apply_op(context,op)
  for elt in one.data:
    g = elt.apply_op(context,op)
    if silent_simm(f,g) >= thresh:
      return True
  return False
   
# categorize[op,thresh,destination]
def categorize(context,parameters):
  try:
    op,thresh,destination = parameters.split(',')
    thresh = float(thresh)
    destination = ket(destination)
  except:
    return ket("",0)
  
  one = context.relevant_kets(op)                 # one is a superposition
  print("one:",one)  
  out_list = []                                   # out_list will be a list of superpositions.
  for x in one.data:                              # x is of course a ket
    n = 0
    del_list = []                                 # del_list will be a list of integers.
    for i in range(len(out_list)):
      if simm_mbr(context,op,x,thresh,out_list[i]):
        if n == 0:
          out_list[i] += x
          idx = i
          n = 1
        else:
          out_list[idx] += out_list[i]
          del_list.append(i)
    if n == 0:
      out_list.append(superposition() + x)        # we use "superposition() + x" instead of just "x" so out_list is always a list of superpositions, not kets.
    else:
      out_list = [x for index,x in enumerate(out_list) if index not in del_list]

  for k, sp in enumerate(out_list):
    print("sp:",sp)
    context.learn("category-" + str(k),destination,sp)  
  return ket("categorize")

Cool, now let's put it to use in the console:
sa: load H-I-pat-rec.sw
sa: simm |*> #=> 100 similar[pixels] |_self> + 100 |_self>     -- need to add 100 |_self> else the diagonals in the simm matrix will be 0.
sa: |list> => |letter: H> + |noisy: H> + |noisy: H2> + |letter: I> + |noisy: I> + |noisy: I2> + |letter: O>
sa: |list> => shuffle "" |list>                                -- shuffle the list.
sa: map[simm,simm-pixels] "" |list>
sa: matrix[simm-pixels]
[ letter: H ] = [  100.00  29.41   40.91   82.35   76.19   17.65   35.00   ] [ letter: H ]
[ letter: I ]   [  29.41   100.00  45.45   26.67   38.10   73.33   65.00   ] [ letter: I ]
[ letter: O ]   [  40.91   45.45   100.00  36.36   50.00   36.36   40.91   ] [ letter: O ]
[ noisy: H  ]   [  82.35   26.67   36.36   100.00  61.90   14.29   25.00   ] [ noisy: H  ]
[ noisy: H2 ]   [  76.19   38.10   50.00   61.90   100.00  19.05   47.62   ] [ noisy: H2 ]
[ noisy: I  ]   [  17.65   73.33   36.36   14.29   19.05   100.00  45.00   ] [ noisy: I  ]
[ noisy: I2 ]   [  35.00   65.00   40.91   25.00   47.62   45.00   100.00  ] [ noisy: I2 ]
|matrix>

sa: categorize[pixels,0.6,result]
one: |letter: H> + |noisy: H> + |noisy: H2> + |letter: I> + |noisy: I> + |noisy: I2> + |letter: O>  -- the order is determined by relevant-kets[op]
sp: |letter: H> + |noisy: H> + |noisy: H2>
sp: |letter: I> + |noisy: I> + |noisy: I2>
sp: |letter: O>
|categorize>

sa: dump |result>
category-0 |result> => |letter: H> + |noisy: H> + |noisy: H2>
category-1 |result> => |letter: I> + |noisy: I> + |noisy: I2>
category-2 |result> => |letter: O>
BTW, turns out shuffling the list does nothing. The matrix spits out results using sort, and relevant-kets[op] order is the same as in the original .sw
Now, another example:
sa: load fragment-documents-64k--post-processing--saved.sw
sa: matrix[drop-6-simm]
[ diary-1-64k       ] = [  0  0      98.82  15.92  18.98  17.16  16.34  13.99  13.99  14.75  42.20  42.20  ] [ *                 ]
[ diary-2-64k       ]   [  0  98.82  0      16.18  19.24  17.65  16.27  14.31  14.31  15.07  42.17  42.17  ] [ diary-1-64k       ]
[ eztv-1-64k        ]   [  0  15.92  16.18  0      91.82  13.81  9.58   22.35  22.41  23.30  16.72  16.72  ] [ diary-2-64k       ]
[ eztv-2-64k        ]   [  0  18.98  19.24  91.82  0      13.81  9.58   22.20  22.26  23.16  17.80  17.80  ] [ eztv-1-64k        ]
[ semantic-1-64k    ]   [  0  17.16  17.65  13.81  13.81  0      79.93  15.79  15.79  15.88  10.20  10.20  ] [ eztv-2-64k        ]
[ semantic-2-64k    ]   [  0  16.34  16.27  9.58   9.58   79.93  0      11.56  11.56  11.65  10.00  10.00  ] [ semantic-1-64k    ]
[ slashdot-1-64k    ]   [  0  13.99  14.31  22.35  22.20  15.79  11.56  0      99.94  96.97  11.11  11.11  ] [ semantic-2-64k    ]
[ slashdot-2-64k    ]   [  0  13.99  14.31  22.41  22.26  15.79  11.56  99.94  0      97.03  11.11  11.11  ] [ slashdot-1-64k    ]
[ slashdot-3-64k    ]   [  0  14.75  15.07  23.30  23.16  15.88  11.65  96.97  97.03  0      11.09  11.09  ] [ slashdot-2-64k    ]
[ wc-comments-1-64k ]   [  0  42.20  42.17  16.72  17.80  10.20  10.00  11.11  11.11  11.09  0      99.89  ] [ slashdot-3-64k    ]
[ wc-comments-2-64k ]   [  0  42.20  42.17  16.72  17.80  10.20  10.00  11.11  11.11  11.09  99.89  0      ] [ wc-comments-1-64k ]
                                                                                                             [ wc-comments-2-64k ]
|matrix>
-- NB: the 0's on the diagonal are because the simm rules we used have no + 100 |_self> term:
sa: dump |*>
drop-1-simm |*> #=> 100 similar[hash-64k] |_self>
drop-2-simm |*> #=> 100 similar[drop-2-hash] |_self>
drop-3-simm |*> #=> 100 similar[drop-3-hash] |_self>
drop-4-simm |*> #=> 100 similar[drop-4-hash] |_self>
drop-5-simm |*> #=> 100 similar[drop-5-hash] |_self>
drop-6-simm |*> #=> 100 similar[drop-6-hash] |_self>
drop-7-simm |*> #=> 100 similar[drop-7-hash] |_self>
drop-8-simm |*> #=> 100 similar[drop-8-hash] |_self>
drop-9-simm |*> #=> 100 similar[drop-9-hash] |_self>
drop-10-simm |*> #=> 100 similar[drop-10-hash] |_self>

sa: categorize[drop-6-hash,0.75,result]
one: |semantic-2-64k> + |eztv-1-64k> + |slashdot-3-64k> + |slashdot-1-64k> + |wc-comments-2-64k> + |diary-1-64k> + |eztv-2-64k> + |diary-2-64k> + |wc-comments-1-64k> + |slashdot-2-64k> + |semantic-1-64k>
sp: |semantic-2-64k> + |semantic-1-64k>
sp: |eztv-1-64k> + |eztv-2-64k>
sp: |slashdot-3-64k> + |slashdot-1-64k> + |slashdot-2-64k>
sp: |wc-comments-2-64k> + |wc-comments-1-64k>
sp: |diary-1-64k> + |diary-2-64k>
|categorize>

sa: dump |result>
category-0 |result> => |semantic-2-64k> + |semantic-1-64k>
category-1 |result> => |eztv-1-64k> + |eztv-2-64k>
category-2 |result> => |slashdot-3-64k> + |slashdot-1-64k> + |slashdot-2-64k>
category-3 |result> => |wc-comments-2-64k> + |wc-comments-1-64k>
category-4 |result> => |diary-1-64k> + |diary-2-64k>
BTW, the big-O for categorize is pretty terrible at the moment. So is intersection_fn() which I have been intending to fix for a while now (presumably with the help of ordered dictionaries).
Now, a brief note on an idea I call "bridging sets". Mathematically they are very simple, but they motivate the categorize code.
First, we say x is near y if metric[x,y] <= t for a metric of your choice, and some threshold t (of course, different values of t change the result!).
Then a linear bridging set is a set {x0,x1,x2,x3,...,xn} such that:
1) x_k is near x_k+1, for all k in {0,1,...,n-1}
2) x_0 is not near x_n

A general bridging set is a set {x0,x1,x2,x3,...,xn} such that:
1) for every j in {0,1,...,n}, x_j is near an x_k for some k != j in {0,1,...,n}. -- ie, every element in the set is near some other element in the set
2) there exists at least one j,k pair such that x_j is not near x_k               -- in the categorize code, we tend to drop this requirement.

Hrmm... maybe (2) should be changed to:
2) there may exist j,k pairs such that x_j is not near x_k
to clearly distinguish them from standard equivalence classes:
cf:
define an equivalence relation: a ~ b
1) a ~ a                                   -- ie, reflexive
2) if a ~ b then b ~ a                     -- ie, symmetric
3) if a ~ b, and b ~ c, then a ~ c         -- ie, transitive. NB: in general, bridging sets don't have this property!
4) if a ~ b then a is in [b]               -- ie, a is a member of the equivalence class [b]


The point: given a set of elements, the categorize code partitions it into distinct general bridging sets (more precisely, into the connected components under the "near" relation).
Also, the lack of transitivity in bridging sets is why the categorize code has to go through some contortions!
(where by "lack of transitivity" I mean, just because a is near b, and b is near c, doesn't imply a is near c)

Some examples:
The easiest is a bridge. It is a very simple example of a linear bridging set, and along with species DNA was a motivator for the bridging set idea.
Set the left bank to be x_0, the right bank to be x_n, and the steps you take from one side to the other form the bridging set.

The 400m running track on an oval is a simple general bridging set; so is the path you take for your morning jog.

If we have a metric that can measure the similarity in DNA (some version of simm perhaps), then each species forms a distinct bridging set.
And a good use case for the categorize code, BTW.

The collection of atoms that make up, say, a dog forms another bridging set.

The tree of life, ie the evolution of life from single-celled to multi-cellular organisms, is a big bridging set.
A smaller version of this is you are in a bridging set with your parents, grand-parents, and back to your ancestors.
And then via your parents, you are in a bridging set with your siblings, their children, their children's children and so on.

A train of thought, or a math proof, can also be considered a bridging set (though I'm not sure what putting it in these terms buys us).

A person's face from all different angles/perspectives form a bridging set. This idea should be useful!
Ditto a banana, or a pen or a stapler, or a tiger, or an elephant, any object really.

Your appearance, first as a baby, then up through adulthood, and then old age forms a linear bridging set.

Scenes in a movie/tv show, even as characters move around a bit, form a general bridging set.

Your weekly shopping basket is usually a linear bridging set (if you are a consistent shopper). Eg, from 5 years ago, week by week, till now.

There are other trivial examples of linear bridging sets:
{a,b,c,d,e,f,...,x,y,z}
{0,1,2,3,4,5,6,...,20,21,22,23,24,25}
Water slowly brought to the boil.
etc.

Some notes:
The value of t has a strong influence on the result. 
Set it too tight and your categories splinter into smaller ones.
Set it too loose, and everything ends up in the same category.

The addition of a single new element can sometimes merge two or more categories into one, if it is in a key location.
And the other way too. The removal of a key element can fracture a category into two or more pieces (eg, if you remove the middle of the bridge, it is no longer a single bridging set).

OK. Weirdly, we can map bridging sets into equivalence classes, but I'm not sure what it buys us!
If a and b are members of the same bridging set, then we can say: a ~ b
Then the standard equivalence class conditions (1,2,3,4 above) are met.

30/5/2014: OK. Decided to have a brief look back at the original WWW proposal by TBL.
In particular, this image:

With some work, we can translate this into BKO/sw format:
|context> => |context: www proposal>

describes |document: www proposal> => |"Hypertext"> + |A Proposal "Mesh">
refers-to |document: www proposal> => |Comms ACM>

describes |Comms ACM> => |"Hypertext"> 

includes |"Hypertext"> => |Linked information> + |Hypermedia>

for-example |Linked information> => |Hyper Card> + |ENQUIRE> + |A Proposal "Mesh">

describes |a proposal "mesh"> => |CERN>
unifies |a proposal "mesh"> => |ENQUIRE> + |VAX/NOTES> + |uucp News> + |CERNDOC>

examples |Computer conferencing> => |IBM GroupTalk> + |uucp News> + |VAX/NOTES> + |A Proposal "Mesh">

for-example |Hierarchical systems> => |CERN> + |CERNDOC> + |Vax/Notes> + |uucp News> + |IBM GroupTalk>

includes |CERNDOC> => |document: www proposal>

wrote |person: Tim Berners-Lee> => |document: www proposal>

30/6/2014 update: I recently wrote code to pretty-print the data, instead of displaying it in raw BKO format.
Anyway, let's use this data as an example:
sa: load www-proposal.sw
sa: display
  context: www proposal

  document: www proposal
  supported-ops: op: describes, op: refers-to
      describes: "Hypertext", A Proposal "Mesh"
      refers-to: Comms ACM

  Comms ACM
  supported-ops: op: describes
      describes: "Hypertext"

  "Hypertext"
  supported-ops: op: includes
       includes: Linked information, Hypermedia

  Linked information
  supported-ops: op: for-example
    for-example: Hyper Card, ENQUIRE, A Proposal "Mesh"

  a proposal "mesh"
  supported-ops: op: describes, op: unifies
      describes: CERN
        unifies: ENQUIRE, VAX/NOTES, uucp News, CERNDOC

  Computer conferencing
  supported-ops: op: examples
       examples: IBM GroupTalk, uucp News, VAX/NOTES, A Proposal "Mesh"

  Hierarchical systems
  supported-ops: op: for-example
    for-example: CERN, CERNDOC, Vax/Notes, uucp News, IBM GroupTalk

  CERNDOC
  supported-ops: op: includes
       includes: document: www proposal

  person: Tim Berners-Lee
  supported-ops: op: wrote
          wrote: document: www proposal
So this is just one more representation of knowledge (along with mind-maps, matrices and of course sw).
It does bring to light the question of what is unique about BKO that can't be done using standard databases.
I guess the answer is all the power the BKO language brings, which we can apply to our knowledge once it is in sw format.
Decided to have a brief look at Cyc.
Cyc:
(#$isa #$BillClinton #$UnitedStatesPresident)
"Bill Clinton belongs to the collection of U.S. presidents"

(#$genls #$Tree-ThePlant #$Plant)
"All trees are plants"

(#$capitalCity #$France #$Paris)
"Paris is the capital of France."

(#$implies
   (#$and   
     (#$isa ?OBJ ?SUBSET)
     (#$genls ?SUBSET ?SUPERSET))
   (#$isa ?OBJ ?SUPERSET))
"if OBJ is an instance of the collection SUBSET and SUBSET is a subcollection of SUPERSET, then OBJ is an instance of the collection SUPERSET"

(#$relationAllExists #$biologicalMother #$ChordataPhylum #$FemaleAnimal)
for every instance of the collection #$ChordataPhylum (i.e. for every chordate), there exists a female animal (instance of #$FemaleAnimal) which is its mother (described by the predicate #$biologicalMother).
Roughly translates to this BKO:
is-a |person: Bill Clinton> => |United States President>
or:
|United States President: _list> => ... + |person: Bill Clinton> + ...
<person: Bill Clinton|"" |United States President: _list> == 1

is-plant |tree: *> => |yes>

capital-city |country: France> => |city: Paris>

<OBJ|""|subset: _list> == 1
|superset: _list> => ... + |subset: _list> + ...
<OBJ|""|superset: _list> == 1
(or something like that, not 100% sure)

has-mother |Chordata Phylum: *> => |yes>

5/6/2014 update: Implemented the vector function. Really it is just the matrix function mentioned above, except you pass in the kets of interest (and I couldn't think of a better name!).
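I won't type up the vector code here, but a minimal sketch, reusing single_matrix() and the helper functions from above, might look something like this (an illustration of the idea, not the actual implementation):

def vector(one, context, op):
  one = superposition() + one                     # the kets of interest; their order is preserved
  if one.count() == 0:                            # if you pass in |>, default back to
    one = context.relevant_kets(op).ket_sort()    # everyone that supports the op.
  two, M = single_matrix(one, context, op)
  line = [sp_to_vect(two), '='] + [M] + [sp_to_vect(one)]
  print(paste_columns(line))
  return ket("matrix")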
In the console:
sa: load matrix-as-network.sw
sa: vector[friends] |Alex>
[ Beth   ] = [  1.000  ] [ Alex ]
[ Ed     ]   [  1.000  ]
[ James  ]   [  1.000  ]
[ Jason  ]   [  1.000  ]
[ Liz    ]   [  1.000  ]
[ Mary   ]   [  1.000  ]
[ nathan ]   [  1.000  ]
|matrix>

sa: vector[friends] (|Bill> + |Harry>)
[ bella   ] = [  0      1.000  ] [ Bill  ]
[ Beth    ]   [  1.000  0      ] [ Harry ]
[ charlie ]   [  0      1.000  ]
[ david   ]   [  0      1.000  ]
[ Jason   ]   [  1.000  0      ]
[ John    ]   [  1.000  0      ]
[ lena    ]   [  1.000  0      ]
[ nathan  ]   [  1.000  1.000  ]
[ sam     ]   [  0      1.000  ]
[ smithie ]   [  0      1.000  ]
|matrix>

sa: relevant-kets[friends]
|Alex> + |Bill> + |Harry>

sa: vector[friends] relevant-kets[friends]      -- do it indirectly. 
[ bella   ] = [  0      0      1.000  ] [ Alex  ]
[ Beth    ]   [  1.000  1.000  0      ] [ Bill  ]
[ charlie ]   [  0      0      1.000  ] [ Harry ]
[ david   ]   [  0      0      1.000  ]
[ Ed      ]   [  1.000  0      0      ]
[ James   ]   [  1.000  0      0      ]
[ Jason   ]   [  1.000  1.000  0      ]
[ John    ]   [  0      1.000  0      ]
[ lena    ]   [  0      1.000  0      ]
[ Liz     ]   [  1.000  0      0      ]
[ Mary    ]   [  1.000  0      0      ]
[ nathan  ]   [  1.000  1.000  1.000  ]
[ sam     ]   [  0      0      1.000  ]
[ smithie ]   [  0      0      1.000  ]
|matrix>

-- and this one for fun!
sa: vector[friends] shuffle relevant-kets[friends]   -- NB: the shuffle in there changed the order (on the right hand side)
[ bella   ] = [  1.000  0      0      ] [ Harry ]
[ Beth    ]   [  0      1.000  1.000  ] [ Alex  ]
[ charlie ]   [  1.000  0      0      ] [ Bill  ]
[ david   ]   [  1.000  0      0      ]
[ Ed      ]   [  0      1.000  0      ]
[ James   ]   [  0      1.000  0      ]
[ Jason   ]   [  0      1.000  1.000  ]
[ John    ]   [  0      0      1.000  ]
[ lena    ]   [  0      0      1.000  ]
[ Liz     ]   [  0      1.000  0      ]
[ Mary    ]   [  0      1.000  0      ]
[ nathan  ]   [  1.000  1.000  1.000  ]
[ sam     ]   [  1.000  0      0      ]
[ smithie ]   [  1.000  0      0      ]
|matrix>

sa: vector[friends] shuffle relevant-kets[friends]   -- NB: different order. Harry, Bill, Alex instead of Harry, Alex, Bill.
[ bella   ] = [  1.000  0      0      ] [ Harry ]
[ Beth    ]   [  0      1.000  1.000  ] [ Bill  ]
[ charlie ]   [  1.000  0      0      ] [ Alex  ]
[ david   ]   [  1.000  0      0      ]
[ Ed      ]   [  0      0      1.000  ]
[ James   ]   [  0      0      1.000  ]
[ Jason   ]   [  0      1.000  1.000  ]
[ John    ]   [  0      1.000  0      ]
[ lena    ]   [  0      1.000  0      ]
[ Liz     ]   [  0      0      1.000  ]
[ Mary    ]   [  0      0      1.000  ]
[ nathan  ]   [  1.000  1.000  1.000  ]
[ sam     ]   [  1.000  0      0      ]
[ smithie ]   [  1.000  0      0      ]
|matrix>


sa: |list> => |Bill> + |Alex>

sa: vector[friends] "" |list>                   -- another indirect example.
[ Beth   ] = [  1.000  1.000  ] [ Bill ]
[ Ed     ]   [  0      1.000  ] [ Alex ]
[ James  ]   [  0      1.000  ]
[ Jason  ]   [  1.000  1.000  ]
[ John   ]   [  1.000  0      ]
[ lena   ]   [  1.000  0      ]
[ Liz    ]   [  0      1.000  ]
[ Mary   ]   [  0      1.000  ]
[ nathan ]   [  1.000  1.000  ]
|matrix>

-- if you pass in |>, it defaults back to show everyone that supports that op.
sa: vector[friends] |>
[ bella   ] = [  0      0      1.000  ] [ Alex  ]
[ Beth    ]   [  1.000  1.000  0      ] [ Bill  ]
[ charlie ]   [  0      0      1.000  ] [ Harry ]
[ david   ]   [  0      0      1.000  ]
[ Ed      ]   [  1.000  0      0      ]
[ James   ]   [  1.000  0      0      ]
[ Jason   ]   [  1.000  1.000  0      ]
[ John    ]   [  0      1.000  0      ]
[ lena    ]   [  0      1.000  0      ]
[ Liz     ]   [  1.000  0      0      ]
[ Mary    ]   [  1.000  0      0      ]
[ nathan  ]   [  1.000  1.000  1.000  ]
[ sam     ]   [  0      0      1.000  ]
[ smithie ]   [  0      0      1.000  ]
|matrix>

-- if you don't pass in a superposition, it defaults back to the default ket.
sa: id                               -- show the default ket
0.000|>                              -- the default ket is 0|>

sa: x = fred                         -- set the default ket
sa: id                               -- show the default ket
|fred>

sa: vector[friends]                  -- same as: vector[friends] |fred>
[  ] = [  0  ] [ fred ]              -- fred currently has no friends.
|matrix>

sa: friends |fred> => |Sam> + |Mary> -- give fred a couple of friends
sa: vector[friends]
[ Mary ] = [  1.000  ] [ fred ]
[ Sam  ]   [  1.000  ]
|matrix>
So that is about it. Same as merged-matrix, but you get to choose who you are interested in, instead of always giving data on everyone (that supports that op)!
Indeed, pretty much deprecates merged-matrix, but I'll leave it in for now.
11/6/2014: OK. A quick play with letter frequencies in ebooks. Code here.
Here is the resulting sw file.
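The linked code isn't reproduced here, but the idea is simple enough to sketch (the filename is hypothetical, and the real code also writes the usual sw file headers):

from collections import Counter

def letter_count_rule(filename, label):
  text = open(filename, encoding='utf-8').read().lower()
  counts = Counter(c for c in text if 'a' <= c <= 'z')
  rule = " + ".join("%s|%s>" % (counts[c], c) for c in sorted(counts))
  return "letter-count |%s> => %s" % (label, rule)

print(letter_count_rule("Alice-in-Wonderland.txt", "Alice-in-Wonderland"))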
And in matrix form:
sa: matrix[letter-count]
[ a ] = [  9083   26317  142241  23325  76232   35669  260565  35285  23871  ] [ Alice-in-Wonderland  ]
[ b ]   [  1621   4766   25476   4829   15699   6847   50138   6117   4763   ] [ Frankenstein         ]
[ c ]   [  2817   9055   37297   7379   21938   11349  72409   10725  6942   ] [ Gone-with-Wind       ]
[ d ]   [  5228   16720  85897   12139  37966   18763  144619  18828  15168  ] [ I-Robot              ]
[ e ]   [  15084  45720  228415  37293  117608  59029  440119  54536  37230  ] [ Moby-Dick            ]
[ f ]   [  2248   8516   34779   5940   20363   9936   73859   9105   6270   ] [ nineteen-eighty-four ]
[ g ]   [  2751   5762   38283   6037   20489   9113   61948   8023   6822   ] [ Shakespeare          ]
[ h ]   [  7581   19400  119901  16803  61947   28093  234301  28284  19130  ] [ Sherlock-Holmes      ]
[ i ]   [  7803   21411  101987  20074  62942   30304  214275  27361  18380  ] [ Tom-Sawyer           ]
[ j ]   [  222    431    1501    346    915     310    2955    421    465    ]
[ k ]   [  1202   1722   18290   2370   8011    3512   32029   3590   3136   ]
[ l ]   [  5053   12603  79783   12870  42338   18395  156371  17276  12426  ]
[ m ]   [  2245   10295  39595   6534   22871   10513  101507  11391  7255   ]
[ n ]   [  7871   24220  123989  21302  65429   31516  231652  29337  20858  ]
[ o ]   [  9245   25050  130230  24555  69648   34287  299732  34452  24251  ]
[ p ]   [  1796   5939   23979   5148   16553   8058   50638   6987   4766   ]
[ q ]   [  135    323    1270    321    1244    397    2998    416    182    ]
[ r ]   [  6400   20708  105074  17003  52446   25861  224994  25378  16262  ]
[ s ]   [  6980   20808  107430  18044  62734   28382  232317  27105  17852  ]
[ t ]   [  11631  29706  157163  28316  86983   42127  311911  39232  28389  ]
[ u ]   [  3867   10340  50453   9483   26933   12903  121631  13527  9376   ]
[ v ]   [  911    3788   15224   3062   8540    4252   36692   4471   2451   ]
[ w ]   [  2696   7335   43623   6761   21174   11225  78929   10754  7735   ]
[ x ]   [  170    675    1700    508    1037    779    4867    567    326    ]
[ y ]   [  2442   7743   37639   6552   16849   9071   90162   9267   6830   ]
[ z ]   [  79     243    1045    208    598     303    1418    150    155    ]
Here is the (scaled) similarity:
sa: 100 similar[letter-count] |Sherlock-Holmes>
97.879|nineteen-eighty-four> + 97.541|Tom-Sawyer> + 97.393|Moby-Dick> + 97.352|I-Robot> + 97.116|Gone-with-Wind> + 97.089|Alice-in-Wonderland> + 97.079|Shakespeare> + 96.516|Frankenstein>
So the relative frequency of letters is very similar across a broad range of texts, going all the way back to Shakespeare. Presumably other languages, say French, German or Italian, would have different letter frequencies.

Now a quick aside:
Standard simm is normalized/scaled so that w*f == w*g (see the definition of simm above).
def simm(A,B):
  return intersection(A.normalize(),B.normalize()).count_sum()
  
But there is also a non-normalized/unscaled version:
def unscaled_simm(A,B):
  return intersection(A,B).count_sum()/max(A.count_sum(),B.count_sum())
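A toy comparison of the two, using plain dicts in place of superpositions (min for intersection, divide-by-sum for normalize), just to show what the scaling buys you:

def count_sum(A):
  return sum(A.values())

def normalize(A):
  s = count_sum(A)
  return {k: v / s for k, v in A.items()}

def intersection(A, B):
  return {k: min(A.get(k, 0), B.get(k, 0)) for k in set(A) | set(B)}

A = {'a': 1, 'b': 2}
B = {'a': 10, 'b': 20}    # same shape as A, just 10 times bigger
print(count_sum(intersection(normalize(A), normalize(B))))               # 1.0 -- scaled simm sees a perfect match
print(count_sum(intersection(A, B)) / max(count_sum(A), count_sum(B)))   # 0.1 -- unscaled simm is dragged down by the size difference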
  
Usually, from experience, it seems the scaled version gives better results. But for comparison, here is the unscaled simm (yet to be wired into the processor, BTW):
sa: 100 unscaled-similar[letter-count] |Sherlock-Holmes>
95.354|nineteen-eighty-four> + 78.455|Frankenstein> + 69.638|Tom-Sawyer> + 68.690|I-Robot> + 46.045|Moby-Dick> + 27.084|Alice-in-Wonderland> + 24.687|Gone-with-Wind> + 12.244|Shakespeare>
Now some work in the console:
sa: load ebook-letter-counts.sw
sa: |list> => |nineteen-eighty-four> + |Tom-Sawyer> + |I-Robot> + |Gone-with-Wind> + |Frankenstein> + |Shakespeare> + |Moby-Dick> + |Sherlock-Holmes> + |Alice-in-Wonderland>
sa: norm |*> #=> normalize letter-count |_self>
sa: usimm |*> #=> 100 unscaled-similar[letter-count] |_self> + 100 |_self>  -- unscaled-similar not yet wired in.
sa: simm |*> #=> 100 similar[letter-count] |_self> + 100 |_self>
sa: map[norm,normalized-letter-count] "" |list>
sa: map[usimm,unscaled-simm-matrix] "" |list>
sa: map[simm,simm-matrix] "" |list>
sa: matrix[normalized-letter-count]
[ a ] = [  0.07753  0.07750  0.08118  0.07848  0.08114  0.07909  0.07375  0.08157  0.07923  ] [ Alice-in-Wonderland  ]
[ b ]   [  0.01384  0.01403  0.01454  0.01625  0.01671  0.01518  0.01419  0.01414  0.01581  ] [ Frankenstein         ]
[ c ]   [  0.02404  0.02666  0.02129  0.02483  0.02335  0.02516  0.02049  0.02479  0.02304  ] [ Gone-with-Wind       ]
[ d ]   [  0.04462  0.04923  0.04902  0.04084  0.04041  0.04160  0.04093  0.04352  0.05034  ] [ I-Robot              ]
[ e ]   [  0.12875  0.13463  0.13035  0.12548  0.12518  0.13089  0.12457  0.12607  0.12357  ] [ Moby-Dick            ]
[ f ]   [  0.01919  0.02508  0.01985  0.01999  0.02167  0.02203  0.02091  0.02105  0.02081  ] [ nineteen-eighty-four ]
[ g ]   [  0.02348  0.01697  0.02185  0.02031  0.02181  0.02021  0.01753  0.01855  0.02264  ] [ Shakespeare          ]
[ h ]   [  0.06471  0.05713  0.06843  0.05654  0.06594  0.06229  0.06632  0.06538  0.06349  ] [ Sherlock-Holmes      ]
[ i ]   [  0.06660  0.06305  0.05820  0.06754  0.06700  0.06719  0.06065  0.06325  0.06100  ] [ Tom-Sawyer           ]
[ j ]   [  0.00189  0.00127  0.00086  0.00116  0.00097  0.00069  0.00084  0.00097  0.00154  ]
[ k ]   [  0.01026  0.00507  0.01044  0.00797  0.00853  0.00779  0.00907  0.00830  0.01041  ]
[ l ]   [  0.04313  0.03711  0.04553  0.04330  0.04507  0.04079  0.04426  0.03994  0.04124  ]
[ m ]   [  0.01916  0.03032  0.02260  0.02199  0.02434  0.02331  0.02873  0.02633  0.02408  ]
[ n ]   [  0.06718  0.07132  0.07076  0.07168  0.06964  0.06988  0.06557  0.06782  0.06923  ]
[ o ]   [  0.07891  0.07376  0.07432  0.08262  0.07413  0.07603  0.08484  0.07964  0.08049  ]
[ p ]   [  0.01533  0.01749  0.01368  0.01732  0.01762  0.01787  0.01433  0.01615  0.01582  ]
[ q ]   [  0.00115  0.00095  0.00072  0.00108  0.00132  0.00088  0.00085  0.00096  0.00060  ]
[ r ]   [  0.05463  0.06098  0.05996  0.05721  0.05582  0.05734  0.06368  0.05867  0.05397  ]
[ s ]   [  0.05958  0.06127  0.06131  0.06071  0.06677  0.06293  0.06576  0.06266  0.05925  ]
[ t ]   [  0.09927  0.08747  0.08969  0.09528  0.09259  0.09341  0.08828  0.09069  0.09422  ]
[ u ]   [  0.03301  0.03045  0.02879  0.03191  0.02867  0.02861  0.03443  0.03127  0.03112  ]
[ v ]   [  0.00778  0.01115  0.00869  0.01030  0.00909  0.00943  0.01039  0.01034  0.00813  ]
[ w ]   [  0.02301  0.02160  0.02490  0.02275  0.02254  0.02489  0.02234  0.02486  0.02567  ]
[ x ]   [  0.00145  0.00199  0.00097  0.00171  0.00110  0.00173  0.00138  0.00131  0.00108  ]
[ y ]   [  0.02084  0.02280  0.02148  0.02205  0.01793  0.02011  0.02552  0.02142  0.02267  ]
[ z ]   [  0.00067  0.00072  0.00060  0.00070  0.00064  0.00067  0.00040  0.00035  0.00051  ]
|matrix>

sa: matrix[unscaled-simm-matrix]
[ Alice-in-Wonderland  ] = [  100.00000  34.50011   6.68626    39.42134   12.47074   25.97839   3.31616    27.08393   38.88633   ] [ Alice-in-Wonderland  ]
[ Frankenstein         ]   [  34.50011   100.00000  19.38041   87.14738   36.14696   75.27262   9.61202    78.45510   87.86411   ] [ Frankenstein         ]
[ Gone-with-Wind       ]   [  6.68626    19.38041   100.00000  16.96103   53.61561   25.73779   49.59655   24.68720   17.19438   ] [ Gone-with-Wind       ]
[ I-Robot              ]   [  39.42134   87.14738   16.96103   100.00000  31.63450   65.89134   8.41209    68.69032   96.69821   ] [ I-Robot              ]
[ Moby-Dick            ]   [  12.47074   36.14696   53.61561   31.63450   100.00000  48.00428   26.59149   46.04481   32.06974   ] [ Moby-Dick            ]
[ nineteen-eighty-four ]   [  25.97839   75.27262   25.73779   65.89134   48.00428   100.00000  12.76506   95.35360   66.77162   ] [ nineteen-eighty-four ]
[ Shakespeare          ]   [  3.31616    9.61202    49.59655   8.41209    26.59149   12.76506   100.00000  12.24400   8.52782    ] [ Shakespeare          ]
[ Sherlock-Holmes      ]   [  27.08393   78.45510   24.68720   68.69032   46.04481   95.35360   12.24400   100.00000  69.63764   ] [ Sherlock-Holmes      ]
[ Tom-Sawyer           ]   [  38.88633   87.86411   17.19438   96.69821   32.06974   66.77162   8.52782    69.63764   100.00000  ] [ Tom-Sawyer           ]
|matrix>

sa: matrix[simm-matrix]
[ Alice-in-Wonderland  ] = [  100.00000  94.93789   96.51590   97.31733   96.76409   97.11247   95.57426   97.08918   97.49465   ] [ Alice-in-Wonderland  ]
[ Frankenstein         ]   [  94.93789   100.00000  95.97417   96.01124   95.22426   96.47916   95.24344   96.51556   95.53771   ] [ Frankenstein         ]
[ Gone-with-Wind       ]   [  96.51590   95.97417   100.00000  96.00326   96.98087   97.01012   95.91322   97.11608   97.16821   ] [ Gone-with-Wind       ]
[ I-Robot              ]   [  97.31733   96.01124   96.00326   100.00000  97.30176   97.87105   96.06118   97.35198   97.11884   ] [ I-Robot              ]
[ Moby-Dick            ]   [  96.76409   95.22426   96.98087   97.30176   100.00000  98.04903   96.06974   97.39297   96.84666   ] [ Moby-Dick            ]
[ nineteen-eighty-four ]   [  97.11247   96.47916   97.01012   97.87105   98.04903   100.00000  95.54986   97.87913   97.10264   ] [ nineteen-eighty-four ]
[ Shakespeare          ]   [  95.57426   95.24344   95.91322   96.06118   96.06974   95.54986   100.00000  97.07934   95.89015   ] [ Shakespeare          ]
[ Sherlock-Holmes      ]   [  97.08918   96.51556   97.11608   97.35198   97.39297   97.87913   97.07934   100.00000  97.54125   ] [ Sherlock-Holmes      ]
[ Tom-Sawyer           ]   [  97.49465   95.53771   97.16821   97.11884   96.84666   97.10264   95.89015   97.54125   100.00000  ] [ Tom-Sawyer           ]
|matrix>

So it took 2 days of processing (yeah, my code sucks!) but I finally loaded up the Moby Part-of-Speech db into sw format.
Code here.
Note, I had to tidy it up first. I used this to map the column separator to tab, and to delete non-ASCII chars:
tr '\r' '\n' < mobyposi.i | tr '\327' '\11' | tr -cd '\11\12\40-\176' > moby.txt
Also, my code crashed on the line "cowardic[tab]Ne", since "e" was not in my part-of-speech look-up table; ie, there is a typo in the data set.
Anyway, some minor usage of this data:
-- how many words in the data set?
sa: count "" |word: _list>
|number: 233088>

-- now some words:
sa: POS |word: the>
0.500|POS: Definite Article> + 0.500|POS: Adverb>

sa: POS |word: frog>
0.500|POS: Noun> + 0.500|POS: Verb (participle)>

sa: POS |word: swim>
0.250|POS: Verb (participle)> + 0.250|POS: Verb (transitive)> + 0.250|POS: Noun> + 0.250|POS: Verb (intransitive)>

sa: POS |word: fly>
0.200|POS: Verb (participle)> + 0.200|POS: Verb (intransitive)> + 0.200|POS: Verb (transitive)> + 0.200|POS: Noun> + 0.200|POS: Adjective>

sa: POS |word: Australia>
|POS: Noun>


-- do this to catch words/objects we have no POS data on:
sa: POS |*> => |don't know>

sa: POS |word: alkjdf>
|don't know>


-- add up all the parts-of-speech in a sentence:
sa: POS read |text: the frog jumped over the mat on his way to dinner>
1.500|POS: Definite Article> + 2.500|POS: Adverb> + 2.750|POS: Noun> + 0.750|POS: Verb (participle)> + |don't know> + |POS: Preposition> + 0.750|POS: Adjective> + 0.250|POS: Verb (transitive)> + 0.500|POS: Pronoun>
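As an aside, here is a toy Python sketch of what that sum is doing (with a hypothetical two-word look-up table standing in for the full Moby data):
from collections import Counter

POS = {
  'the':  Counter({'POS: Definite Article': 0.5, 'POS: Adverb': 0.5}),
  'frog': Counter({'POS: Noun': 0.5, 'POS: Verb (participle)': 0.5}),
}

def pos_read(text):
  # split the sentence into words, look up each word's POS distribution, and add them up
  total = Counter()
  for word in text.split():
    total += POS.get(word,Counter({"don't know": 1}))
  return total

print(pos_read('the frog jumped'))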
Now, a couple of comments:
1) it would work a bit better if we had sequences (ie, order matters), rather than superpositions (which just add up elements).
2) it would be nice if, instead of giving probabilities for what part of speech it is, it were smart enough to use context to give the exact answer. That will be hard, I suspect.
2.5) then merge (1) with (2). ie, have sequences, and give the exact (rather than just possible) answer.
11/6/2014: and now for something a little more interesting. The question of learning.
For a long while I thought learning was a matter of taking intersections.
eg, each time a child sees a dog, a parent says "dog".
Then the child mentally takes an intersection of everything that was in its mind each time they heard "dog".
So not a bad start to the idea of learning, but what if one day "dog" was mentioned while looking at a horse?
Intersection is brutal. If you give it one wrong learning example, you are left with an empty set!
So it eventually occurred to me that addition plus drop-below threshold is a little softer than intersection.
It can replicate strict intersection if you want, but you can also make it softer.
A short example in the console:
sa: load matrix-as-network.sw
sa: |list> => |Alex> + |Bill> + |Harry>
sa: dump "" |list>                                                      -- take a look at the data
friends |Alex> => |Jason> + |Ed> + |Mary> + |Liz> + |Beth> + |James> + |nathan>
friends |Bill> => |Jason> + |Beth> + |lena> + |John> + |nathan>
friends |Harry> => |charlie> + |bella> + |sam> + |smithie> + |david> + |nathan>

sa: intersection(friends |Alex>, friends |Bill>, friends |Harry>)       -- the strict version.
|nathan>

sa: friends "" |list>                                                   -- take a look at the data
2.000|Jason> + |Ed> + |Mary> + |Liz> + 2.000|Beth> + |James> + 3.000|nathan> + |lena> + |John> + |charlie> + |bella> + |sam> + |smithie> + |david>

sa: drop-below[3] friends "" |list>                                     -- replicate the (strict) intersection result
3.000|nathan>

sa: drop-below[2] friends "" |list>                                     -- softer "intersection" result
2.000|Jason> + 2.000|Beth> + 3.000|nathan>

-- and if the non-one coeffs bug you, simply:
sa: clean drop-below[3] friends "" |list>
|nathan>

sa: clean drop-below[2] friends "" |list>
|Jason> + |Beth> + |nathan>
So I guess the general idea is you have a largish training set (I have no exact number in mind at the moment).
But it is allowed to be "noisy". ie, there is signal, but non-zero noise in there too.
The cool thing is that if we add them up, the coeffs of the signal "reinforce", but presumably the noise is random from sample to sample.
This means as you add up the examples the signal kets grow in strength, but the noise kets all have small coeffs.
Indeed, if a noise ket does start to grow in strength, then maybe it too is actually signal.
Then at the end just apply a threshold filter, and you have pure signal.
So something like this (using some operator called foo):
foo |signal> => drop-below[t] (foo |example 1> + foo |example 2> + foo |example 3> + ... + foo |example n>)
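In plain Python, a toy sketch of that recipe (with made-up training data) might look like:
from collections import Counter

def drop_below(sp,t):
  # keep only kets whose coeff is at least the threshold t
  return Counter({k: v for k,v in sp.items() if v >= t})

examples = [                                   # three noisy "dog" training examples
  Counter({'dog': 1, 'grass': 1, 'ball': 1}),
  Counter({'dog': 1, 'grass': 1, 'leash': 1}),
  Counter({'dog': 1, 'horse': 1}),             # the mislabelled horse example
]
total = sum(examples,Counter())
print(drop_below(total,3))   # strict intersection: Counter({'dog': 3})
print(drop_below(total,2))   # softer: Counter({'dog': 3, 'grass': 2})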
OK. Here is a worked example:
sa: load H-I-pat-rec.sw
sa: |list> => |letter: H> + |noisy: H> + |noisy: H2>
sa: print-pixels[pixels] |letter: H>
I: 5
J: 7
1   1
1   1
1   1
11111
1   1
1   1
1   1
|pixels>

sa: print-pixels[pixels] |noisy: H>
I: 5
J: 7
    1
1   1
1   1
111 1
1
1   1
1   1
|pixels>

sa: print-pixels[pixels] |noisy: H2>
I: 5
J: 7
1   1
1
1 111
11111
11  1
1   1
111 1
|pixels>

sa: dim-1 |merged H> => |dimension: 5>              -- we need to define the dimensions, else print-pixels has no idea of the size.
sa: dim-2 |merged H> => |dimension: 7>
sa: pixels |merged H> => pixels "" |list>           -- define the merged H pattern
sa: print-pixels[pixels] |merged H>
I: 5
J: 7
2   3                                               -- this pattern can be considered signal + noise.
3   2
3 113
33323
31  2
3   3
311 3
|pixels>

sa: pixels |merged H> => common[pixels] "" |list>   -- the intersection version (note, common is an alias for intersection)
sa: print-pixels[pixels]
I: 5
J: 7
    1
1
1   1
111 1
1
1   1
1   1
|pixels>

sa: pixels |merged H> => drop-below[3] pixels "" |list>  -- the addition + strict threshold filter version
sa: print-pixels[pixels]
I: 5
J: 7
    3
3
3   3
333 3
3
3   3
3   3
|pixels>

sa: pixels |merged H> => drop-below[2] pixels "" |list>  -- the addition + softer threshold filter version
sa: print-pixels[pixels]
I: 5
J: 7
2   3                                                    -- heh. And our H pattern emerges out of the noise!
3   2
3   3
33323
3   2
3   3
3   3
|pixels>

sa: pixels |merged H> => clean drop-below[2] pixels "" |list>  -- clean the coeffs back to 1.
sa: print-pixels[pixels]
I: 5
J: 7
1   1                                                    -- and there is our nice clean H pattern!
1   1
1   1
11111
1   1
1   1
1   1
|pixels>

where I used a typing short-cut when I used "" |list>
eg:
  pixels |merged H> => pixels "" |list>
  pixels |merged H> => common[pixels] "" |list>
  pixels |merged H> => drop-below[3] pixels "" |list>
  pixels |merged H> => drop-below[2] pixels "" |list>
  pixels |merged H> => clean drop-below[2] pixels "" |list>
is identical to:
  pixels |merged H> => pixels |letter: H> + pixels |noisy: H> + pixels |noisy: H2>
  pixels |merged H> => intersection(pixels |letter: H>, pixels |noisy: H>, pixels |noisy: H2>)
  pixels |merged H> => drop-below[3] (pixels |letter: H> + pixels |noisy: H> + pixels |noisy: H2>) 
  pixels |merged H> => drop-below[2] (pixels |letter: H> + pixels |noisy: H> + pixels |noisy: H2>)
  pixels |merged H> => clean drop-below[2] (pixels |letter: H> + pixels |noisy: H> + pixels |noisy: H2>)
So I consider that a nice demonstration/proof-of-concept of my learning idea.
12/6/2014 update: I guess the hard part of machine learning is not this bit!
It is finding the mapping from object to well-behaved, deterministic, distinctive superposition.
Once you have that, then the rest is easy! Including pattern recognition using simm, and so on.
where:
well-behaved means similar objects return similar superpositions (this is the hard bit to achieve, but hopefully not impossible)
deterministic means if you feed in the same object, you get essentially the same superposition. There is some leeway in that it doesn't have to be 100% identical on each run, but close.
distinctive means different object types have easily distinguishable superpositions (again, this is on the hard side)

31/7/2014 update: A little more on the correspondence between intersection and addition+drop-below. Consider this image (a Venn diagram of the three overlapping sets A, B and C used below):

Now let's enter some data for the sets A, B and C:
 |A> => |a1> + |a2> + |a3> + |ab> + |ac> + |abc>     
 |B> => |b1> + |b2> + |b3> + |ab> + |bc> + |abc>
 |C> => |c1> + |c2> + |c3> + |ac> + |bc> + |abc>
NB: any superposition with coeffs in {0,1} can be considered a standard maths set.
In this case:
A = {a1,a2,a3,ab,ac,abc}
B = {b1,b2,b3,ab,bc,abc}
C = {c1,c2,c3,ac,bc,abc}
I'm not sure if it goes the other way: can all sets be represented by superpositions?
For example, sets of sets. I don't know how to represent this one using superpositions:
S = {{a1,a2,a3},{b1,b2},c1,{d1},{e1,e2,e3,e4},f1}
OK. How about this:
 |S> => |a> + |b> + |c1> + |d> + |e> + |f1>
 |a> => |a1> + |a2> + |a3>
 |b> => |b1> + |b2>
 |d> => |d1>
 |e> => |e1> + |e2> + |e3> + |e4>
Now, let's give some examples to show the correspondence between intersection and addition:
sa: intersection(""|A>,""|B>)
|ab> + |abc>

sa: ""|A> + ""|B>
|a1> + |a2> + |a3> + 2.000|ab> + |ac> + 2.000|abc> + |b1> + |b2> + |b3> + |bc>

sa: drop-below[2] (""|A> + ""|B>)
2.000|ab> + 2.000|abc>

sa: clean drop-below[2] (""|A> + ""|B>)            -- as promised, this is the same as an intersection.
|ab> + |abc>                                       -- though I think there are cases where it will give a different answer from intersection.
                                                   -- one example is when coeffs of A, B and C are not in {0,1}
Here are the other two:
sa: intersection(""|A>,""|C>)
|ac> + |abc>

sa: ""|A> + ""|C>
|a1> + |a2> + |a3> + |ab> + 2.000|ac> + 2.000|abc> + |c1> + |c2> + |c3> + |bc>

sa: clean drop-below[2] (""|A> + ""|C>)
|ac> + |abc>

sa: intersection(""|B>,""|C>)
|bc> + |abc>

sa: ""|B> + ""|C>
|b1> + |b2> + |b3> + |ab> + 2.000|bc> + 2.000|abc> + |c1> + |c2> + |c3> + |ac>

sa: clean drop-below[2] (""|B> + ""|C>)
|bc> + |abc>
Now, the final example, intersection between A, B and C.
sa: intersection(""|A>,""|B>,""|C>)
|abc>

sa: ""|A> + ""|B> + ""|C>                          -- it's cool to note the coeffs here match the numbers in the Venn diagram above.
|a1> + |a2> + |a3> + 2.000|ab> + 2.000|ac> + 3.000|abc> + |b1> + |b2> + |b3> + 2.000|bc> + |c1> + |c2> + |c3>

sa: clean drop-below[2] (""|A> + ""|B> + ""|C>)
|ab> + |ac> + |abc> + |bc>

sa: clean drop-below[3] (""|A> + ""|B> + ""|C>)
|abc>
And I guess an observation: if you add a bunch of sets using superposition notation, the coeffs represent the number of sets each object is in.
While I'm here, the union version. Just add the superpositions and apply clean. eg:
sa: clean (""|A> + ""|B> + ""|C>)
|a1> + |a2> + |a3> + |ab> + |ac> + |abc> + |b1> + |b2> + |b3> + |bc> + |c1> + |c2> + |c3>
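The same union/intersection correspondence, as a quick Python sketch over the A, B, C data above (Counters standing in for {0,1}-coefficient superpositions):
from collections import Counter

A = Counter({'a1': 1, 'a2': 1, 'a3': 1, 'ab': 1, 'ac': 1, 'abc': 1})
B = Counter({'b1': 1, 'b2': 1, 'b3': 1, 'ab': 1, 'bc': 1, 'abc': 1})
C = Counter({'c1': 1, 'c2': 1, 'c3': 1, 'ac': 1, 'bc': 1, 'abc': 1})

total = A + B + C                                            # coeff == number of sets each object is in
union = sorted(total)                                        # clean (A + B + C)
intersection = sorted(k for k,v in total.items() if v >= 3)  # clean drop-below[3] (A + B + C)
print(union)         # all 13 elements
print(intersection)  # ['abc']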

1/8/2014 update: In the above we used the default/empty label. We can use a defined label too.
eg:
S = {{a1,a2,a3},{b1,b2},c1,{d1},{e1,e2,e3,e4},f1}

set |S> => |a> + |b> + |c1> + |d> + |e> + |f1>
set |a> => |a1> + |a2> + |a3>
set |b> => |b1> + |b2>
set |c1> => |c1>
set |d> => |d1>
set |e> => |e1> + |e2> + |e3> + |e4>
set |f1> => |f1>
21/8/2014 update: And we can use exp-max on this data.
sa: load nested-set.sw
sa: exp-max[set] |S>
|S> + |a> + |b> + 3.000|c1> + |d> + |e> + 3.000|f1> + |a1> + |a2> + |a3> + |b1> + |b2> + |d1> + |e1> + |e2> + |e3> + |e4>

sa: exp-max[set] |a>
|a> + |a1> + |a2> + |a3> 
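I haven't shown the exp-max internals, but here is a sketch of one expand-and-accumulate rule that reproduces the output above: keep applying the operator and adding the results in, stopping once an application produces no kets we haven't already seen.
from collections import Counter

set_op = {
  'S': ['a','b','c1','d','e','f1'],
  'a': ['a1','a2','a3'],
  'b': ['b1','b2'],
  'c1': ['c1'],
  'd': ['d1'],
  'e': ['e1','e2','e3','e4'],
  'f1': ['f1'],
}

def exp_max(start):
  total = Counter({start: 1})
  current = Counter({start: 1})
  while True:
    next_layer = Counter()
    for k,c in current.items():
      for dest in set_op.get(k,[]):
        next_layer[dest] += c
    new_kets = set(next_layer) - set(total)
    total += next_layer
    if not new_kets:
      return total
    current = next_layer

print(exp_max('S'))   # note c1 and f1 pick up a coeff of 3, as in the console output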

6/8/2014 update: The other set-related component I should mention here is the set-builder/list-comprehension notation (though unfortunately it is a long way from being implemented in the parser).
Anyway, meant to be something roughly like:
|x> in op |object> such that <y|op-sequence|x> >= 0.7
Then, if you want to map the resulting |x> to something, then (making use of the linearity of operators):
another-op (|x> in op |object> such that <y|op-sequence|x> >= 0.7)
Then use it in a learn-rule:
|some answer> => yet-another-op another-op (|x> in op |object> such that <y|op-sequence|x> >= 0.7) 
Anyway, something along those lines.

21/8/2014 update: Just a couple of notes:
Maybe the brackets are not necessary. eg:
|some answer> => yet-another-op another-op |x> in op |object> such that <y|op-sequence|x> >= 0.7 

We can have more than one condition. eg:
|answer> => some-op |x> in op |object> such that <y|op-sequence|x> >= 0.7 or <z|op-sequence-2|x> == 0.3 and op-foo |x> == 2

17/6/2014: OK. I tried to load up a copy of the imdb database into sw format, but apparently this box doesn't have enough RAM.
With a little work, I filtered imdb (1.3GB) down to this (423M).
Even after optimizing the code into this (ie, storing actor data separately from movie data), the box still crashed out on me!
So I will have to wait till I get a bigger box ....
I do have this partial result though.
Anyway, the point was to do a version of the Kevin Bacon game.
Roughly like this:
kevin-bacon-0 |result> => actors movies |actor: Kevin Bacon>                             -- set of actors that share a movie with Kevin.
kevin-bacon-1 |result> => actors movies actors movies |actor: Kevin Bacon>               -- set of actors one step removed from Kevin.
kevin-bacon-2 |result> => actors movies actors movies actors movies |actor: Kevin Bacon> -- set of actors two steps removed.
kevin-bacon-3 |result> => actors movies actors movies actors movies actors movies |actor: Kevin Bacon> -- three steps removed
BTW, the coeffs of the actor kets contain some information too.
I am not sure about the rest, but the coeffs in kevin-bacon-0 are the number of movies shared with Kevin Bacon.
In Kevin's case, it is a count of the number of movies he has been in.
28/8/2014 update: Cool! It finally occurred to me that the coeffs are the number of pathways from Kevin Bacon to the given actor ket.
21/8/2014 update: Using Amazon EC2 (taking just over a week, and 10GB RAM) I finally converted the imdb text file into sw format (which BTW weighs in at 786 MB).
I guess next would be trying to run the Kevin Bacon game using it, but that would be expensive! Quite likely a week in EC2 compute time.
22/8/2014 update: Heh. OK. I wrote a version of context.recall(op,label) that reads from a file instead of needing the entire sw shoved in memory.
This should make running the Kevin Bacon game on it much, much more realistic.
Here is the code for that bit:
# similar to context.recall(op,label), but this works with files instead of all data in memory.
# Motivated by wanting to run the Kevin Bacon game on IMDB, but EC2 is too expensive to do all in mem.
# So hopefully I can do it using this approach (I already have a copy of IMDB in sw format - took about a week on EC2)
#
# filename is sw data/source file
# op is the operator label, a string
# label is the ket label, a string or a ket
#
# returns a superposition
def file_recall(filename,op,label):
  if type(label) == ket:
    coeff = label.value
    ket_label = label.label
  else:
    coeff = 1
    ket_label = label

  pattern = op + " |" + ket_label + "> => "
  n = len(pattern)
  print("pattern:",pattern)
  print("n:      ",n)

  with open(filename,'r') as f:
    for line in f:
      if line.startswith(pattern):
        print("line:",line)
        return extract_literal_superposition(line[n:])[0].multiply(coeff)
  return ket("",0)
Did some quick testing, with:
sw_file = "sw-examples/fred-sam-friends.sw"
r = file_recall(sw_file,"friends","Fred")
print("r:",r)
Then again using "Sam" instead. I quashed a small bug along the way, and it works great. Now I can upscale it to Kevin Bacon size, using this code:
# a single layer of the Kevin Bacon game.
# one is a superposition (though should handle kets too)
# returns a superposition.
#
# the code does: 
#   actors movies one-superposition
# eg: 
#   actors movies |actor: Kevin Bacon>
#
def Kevin_Bacon_game(bacon_file,one):
  if type(one) == str:                                           # make sure we have a superposition,
    one = superposition() + ket(one)                             # even if fed a string or a ket
  elif type(one) == ket:                                         # Hrmm... there has to be a neater way to write this mess!
    one = superposition() + one

#  one = one.apply_sigmoid(clean)                                 # optional to clean coeffs from our incomming sp

  sp1 = superposition()
  for x in one.data:
    sp1 += file_recall(bacon_file,"movies",x)

  sp2 = superposition()
  for x in sp1.data:
    sp2 += file_recall(bacon_file,"actors",x)

  print("len:",len(sp2))
  return sp2.coeff_sort()
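Re the "neater way" wished for in the comments above, one possible tidy-up (just a sketch, using the same ket/superposition classes):
def as_superposition(one):
  # coerce a string, ket or superposition into a superposition
  if type(one) == str:
    one = ket(one)
  if type(one) == ket:
    one = superposition() + one
  return one
Then Kevin_Bacon_game() could simply start with: one = as_superposition(one).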
OK. Then put it to use:
# this is the full game we are trying to replicate:
# kevin-bacon-0 |result> => actors movies |actor: Kevin Bacon>                             -- set of actors that share a movie with Kevin.
# kevin-bacon-1 |result> => actors movies actors movies |actor: Kevin Bacon>               -- set of actors one step removed from Kevin.
# kevin-bacon-2 |result> => actors movies actors movies actors movies |actor: Kevin Bacon> -- set of actors two steps removed.
# kevin-bacon-3 |result> => actors movies actors movies actors movies actors movies |actor: Kevin Bacon> -- three steps removed
# ...

sw_bacon_file = "sw-examples/just-movies-imdb.sw"         # our imdb data
r = ket("actor: Kevin (I) Bacon")                         # NB: we can choose any actor we like! We have the whole damn imdb to choose from!
N = 4                                                     # How deep we want to go. For now 4, but maybe 10 or bigger later!
for k in range(N):
  r = Kevin_Bacon_game(sw_bacon_file,r)
  C.learn("kevin-bacon-" + str(k),"result",r)

name = "sw-examples/kevin-bacon.sw"                       # save the results.
save_sw(C,name)
OK. I improved the code (hopefully faster, or at least less RAM) so now it writes results as it goes, instead of storing it all in RAM (not that I know anything about flushing and what-not).
# let's write a version that writes to disk as it goes.
sw_bacon_file = "sw-examples/just-movies-imdb.sw"    # our imdb data
sw_dest_file = "sw-examples/fast-write--kevin-bacon.sw" # where we are going to save the results
dest = open(sw_dest_file,'w')

# fake the context header:
dest.write("----------------------------------------\n")
dest.write("|context> => |context: Kevin Bacon game>\n\n")
# can't be bothered to fake the supported-ops line.


r = ket("actor: Kevin (I) Bacon")                    # NB: we can choose any actor we like! We have the whole damn imdb to choose from!
N = 10                                                # How deep we want to go. This run: 10 layers.
for k in range(N):
  r = Kevin_Bacon_game(sw_bacon_file,r)
  dest.write("kevin-bacon-" + str(k) + " |result> => " + r.display(True) + "\n")  # r.display(True) for exact dump, not str(sp) version.
dest.write("----------------------------------------\n")
dest.close()
Anyway, here are the kevin-bacon-0 results.
Let's see how long our N = 10 takes on EC2. BTW, N = 1 takes about 1 min.

6/9/2014 update: OK. N = 2 took 12 days on EC2. Ouch! Need to improve that a lot.
Anyway, here are the results. I had my first look over the kevin-bacon-1 |result>, and it is amazing to me that the highest-coeff actors are also the most well known. Interesting result.
Another result is there are 58714 pathways between |actor: Kevin (I) Bacon> and [actors movies]^2 |actor: Kevin (I) Bacon>
ie, <actor: Kevin (I) Bacon|[actors movies]^2|actor: Kevin (I) Bacon> = 58714
Of course, this data-set has the bug of including some TV shows, like Award ceremonies. I plan to fix that for next run.
BTW, this is my definition of kevin-bacon-1:
kevin-bacon-1 |result> => actors movies actors movies |actor: Kevin Bacon>
I have since found out that others call this kevin-bacon-2, and kevin-bacon-0 == |actor: Kevin Bacon>.
Doesn't super matter, as long as the distinction is clear.

BTW, I already have an interesting observation from the kevin-bacon-0.sw file.
n = <actor: Y|actors movies|actor: Kevin Bacon>
and more generally:
m = <actor: Y|actors movies|actor: X>

n is the number of pathways between Kevin Bacon and actor Y (also of course, the number of movies they have shared).
m is the number of pathways between actor X and actor Y.

Not yet sure the meaning of the more general:
d1(X,Y,k) = <actor: Y|[actors movies]^k |actor: X>
or:
d2(X,Y,k) = <actor: Y|[actors movies clean]^k |actor: X>
Also NB, we see here a clear example showing that left association of operators is meaningless in this scheme (as we already noted way below).
(<actor: Y|actors) ...
clearly can't make sense, since the "actors" op is not defined when applied to an actor.
In contrast to:
(movies |actor: Y>) 
which clearly does make sense.
And also implies we have a directed network (and not bi-directional).
ie, the network arrows point from the left (starting with Kevin), pointing to a list of movies, then a list of actors, and so on. (I'll try and make a diagram soon)
28/8/2014 update: The above motivates this general object:
d(X,Y,op,k) = <X|op^k|Y>
where:
X,Y can be pretty much anything 
op is just some operator, and can be compound, eg op = [actors movies] cf. just above
d() is symmetrical in X,Y only if op is "well defined" (which I need to define at some point :)
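To make the pathway-counting interpretation concrete, here is a toy sketch of d(X,Y,op,k) = <Y|op^k|X> over a small made-up network (a dict standing in for the op matrix):
from collections import Counter

op = {                                    # toy network, standing in for [actors movies]
  'Kevin': Counter({'Kevin': 1, 'Alice': 2}),
  'Alice': Counter({'Kevin': 2, 'Alice': 1, 'Bob': 1}),
  'Bob':   Counter({'Alice': 1}),
}

def apply_op(sp):
  out = Counter()
  for node,coeff in sp.items():
    for dest,w in op.get(node,Counter()).items():
      out[dest] += coeff*w
  return out

def d(X,Y,k):
  sp = Counter({X: 1})
  for _ in range(k):           # apply the op k times to |X>
    sp = apply_op(sp)
  return sp[Y]                 # the coeff == number of length-k pathways from X to Y

print(d('Kevin','Alice',2))    # 4 pathways: Kevin->Kevin->Alice (1*2) plus Kevin->Alice->Alice (2*1)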

28/8/2014 update: Wrote some code to make use of IMDB in sw format.
(BTW, I dropped back from using superpositions to ordinary lists, since all the coeffs will be 1 anyway, and for speed reasons)
While I'm here, may as well link to the superposition versions: common_actors.py, common_movies.py
(and I think this is my first use of file_recall(filename,op,label). Written especially for the imdb data as it is too big to fit in memory, so we "grep" a file instead)
Some find_common movies/actors examples:
$ ./minimalist_find_common_ma.py "Sandra Bullock" "Keanu Reeves"

common movies for:
Sandra Bullock
Keanu Reeves
number of common movies: 6
common movies: movie: Inside 'Speed' (2002), movie: Cmo conseguir un papel en Hollywood (2007), movie: The Lake House (2006), movie: The Making of 'Speed' (1994), movie: Speed (1994), movie: Twentieth Century Fox: The Blockbuster Years (2000)

$ ./minimalist_find_common_ma.py "Star Trek Into Darkness (2013)" "Paul (2011)"

common actors for:
Star Trek Into Darkness (2013)
Paul (2011)
number of common actors: 2
common actors: actor: Simon Pegg, actor: Bill Hader

$ ./minimalist_find_common_ma.py "Star Trek Into Darkness (2013)" "12 Years a Slave (2013)"

common actors for:
Star Trek Into Darkness (2013)
12 Years a Slave (2013)
number of common actors: 1
common actors: actor: Benedict Cumberbatch
Note BTW, the code automatically works out whether you have given it 2 actors or 2 movies (though at the cost of a longer run-time).
28/8/2014: OK. May as well include the file_recall() here. Should be useful going forward.
The superposition version:
Doh! Already mentioned this above.
Anyway, here is the list version:
def file_recall(filename,op,label):

  pattern = op + " |" + label + "> => "
  n = len(pattern)

  with open(filename,'r') as f:
    for line in f:
      if line.startswith(pattern):
        line = line[n:].strip()               # strip the trailing newline, else the last ket keeps a stray ">"
        # NB how easy it is to parse a well-defined literal superposition with all coeffs of 1:
        #   return line[1:-1].split("> + |")
        # Much cleaner than the extract_literal_superposition() code!
        # Tweak: filter out "Awards" shows. Some TV shows remain though; the real fix is to
        # filter out (TV) and (V) when parsing the original text version of IMDB.
        return [x for x in line[1:-1].split("> + |") if "Awards" not in x]
  return []

6/9/2014 update: OK. Wrote some code to spit out imdb-votes and ratings.
eg:
imdb-votes |movie: The Matrix (1999)> => |votes: 896239>
imdb-votes-self |movie: The Matrix (1999)> => 896239|movie: The Matrix (1999)>
imdb-rating |movie: The Matrix (1999)> => |rating: 8.7>
imdb-rating-self |movie: The Matrix (1999)> => 8.7|movie: The Matrix (1999)>
Anyway, the plan is to do things like:
imdb-rating movies |actor: Fred>

average-movie-rating |actor: Fred> => arithmetic(count-sum imdb-rating-self movies |_self>,|/>,count imdb-rating-self movies |_self>)
Once we have that working, generate the full set:
average-movie-rating |actor: *> #=> arithmetic(count-sum imdb-rating-self movies |_self>,|/>,count imdb-rating-self movies |_self>)
map[average-movie-rating] "" |actor: _list>
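A toy sketch of that count-sum / count arithmetic in Python (made-up ratings; the real thing reads imdb-rating-self from the sw file):
imdb_rating = {                  # hypothetical ratings
  'movie: a': 7.5,
  'movie: b': 6.0,
  'movie: c': 8.2,
}
movies = ['movie: a','movie: b','movie: c','movie: unrated']   # movies |actor: Fred>

ratings = [imdb_rating[m] for m in movies if m in imdb_rating]  # drop movies with no rating data
average = sum(ratings)/len(ratings)                             # count-sum / count
print(round(average,2))   # 7.23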

7/9/2014: OK. I have made some progress towards this. The problem is the imdb data is too big to just do in the console, so I have to do it in a python script.
Anyway, a couple of examples:
$ ./find_average_movie_rating.py "Kevin (I) Bacon"
actor: Kevin (I) Bacon
number of movies: 74
ratings:
8.1    movie: A Little Vicious (1991)
8.1    movie: Skum Rocks! (2013)
8.0    movie: JFK (1991)
8.0    movie: Mystic River (2003)
7.9    movie: Saving Angelo (2007)
7.8    movie: Sundance Skippy (2010)
7.8    movie: X-Men: First Class (2011)
7.7    movie: Frost/Nixon (2008)
7.6    movie: A Few Good Men (1992)
7.6    movie: Animal House (1978)
7.6    movie: Apollo 13 (1995)
7.6    movie: Freedom Downtime (2001)
7.6    movie: Planes, Trains & Automobiles (1987)
7.5    movie: Crazy, Stupid, Love. (2011)
7.5    movie: Sleepers (1996)
7.4    movie: Beyond All Boundaries (2009)
7.3    movie: Going to Pieces: The Rise and Fall of the Slasher Film (2006)
7.3    movie: Murder in the First (1995)
7.3    movie: The Woodsman (2004)
7.2    movie: Diner (1982)
7.2    movie: Tremors (1990)
7.1    movie: Vanilla Ice Archive (2012)
7.0    movie: Balto (1995)
7.0    movie: My Dog Skip (2000)
7.0    movie: Stir of Echoes (1999)
7.0    movie: The Air I Breathe (2007)
6.9    movie: A Look Behind the Scenes: Super (2011)
6.9    movie: Digging to China (1997)
6.9    movie: Natural Disasters: Forces of Nature (2004)
6.8    movie: Death Sentence (2007)
6.8    movie: Eastwood Directs: The Untold Story (2013)
6.8    movie: Lemon Sky (1988)
6.8    movie: Rails & Ties (2007)
6.8    movie: Super (2010/I)
6.8    movie: We Married Margo (2000)
6.6    movie: My One and Only (2009)
6.6    movie: Starting Over (1979)
6.6    movie: Where the Truth Lies (2005)
6.5    movie: Flatliners (1990)
6.5    movie: Friday the 13th (1980)
6.5    movie: Only When I Laugh (1981)
6.5    movie: Wild Things (1998)
6.4    movie: Boffo! Tinseltown's Bombs and Blockbusters (2006)
6.4    movie: Footloose (1984)
6.3    movie: Jayne Mansfield's Car (2012)
6.3    movie: Telling Lies in America (1997)
6.3    movie: The Big Picture (1989)
6.3    movie: The River Wild (1994)
6.2    movie: Hero at Large (1980)
6.2    movie: Trapped (2002)
6.1    movie: New York Skyride (1994)
6.1    movie: White Water Summer (1987)
5.9    movie: Queens Logic (1991)
5.8    movie: Criminal Law (1988)
5.8    movie: Novocaine (2001)
5.8    movie: She's Having a Baby (1988)
5.7    movie: End of the Line (1987)
5.7    movie: Forty Deuce (1982)
5.7    movie: Hollow Man (2000)
5.6    movie: Cavedweller (2004)
5.6    movie: R.I.P.D. (2013)
5.5    movie: He Said, She Said (1991)
5.5    movie: Loverboy (2005)
5.5    movie: Quicksilver (1986)
5.4    movie: Beauty Shop (2005)
5.4    movie: Enormous Changes at the Last Minute (1983)
5.4    movie: Imagine New York (2003)
5.4    movie: Picture Perfect (1997)
5.4    movie: The Air Up There (1994)
5.3    movie: In the Cut (2003)
5.3    movie: These Vagabond Shoes (2009)
5.2    movie: Film Trix 2004 (2004)
5.1    movie: Elephant White (2011)
4.7    movie: Pyrates (1991)
average movie rating: 6.56

$ ./find_average_movie_rating.py "Brad Pitt"
actor: Brad Pitt
number of movies: 61
ratings:
8.9    movie: Fight Club (1999)
8.7    movie: Se7en (1995)
8.3    movie: Inglourious Basterds (2009)
8.3    movie: Snatch. (2000)
8.2    movie: 12 Years a Slave (2013)
8.1    movie: Exit Through the Gift Shop (2010)
8.1    movie: Twelve Monkeys (1995)
8.0    movie: True Romance (1993)
7.8    movie: Being John Malkovich (1999)
7.8    movie: Ocean's Eleven (2001)
7.8    movie: The Curious Case of Benjamin Button (2008)
7.6    movie: Interview with the Vampire: The Vampire Chronicles (1994)
7.6    movie: Moneyball (2011)
7.6    movie: The Assassination of Jesse James by the Coward Robert Ford (2007)
7.5    movie: Babel (2006)
7.5    movie: Legends of the Fall (1994)
7.5    movie: Sleepers (1996)
7.5    movie: The Big Uneasy (2010)
7.4    movie: Beyond All Boundaries (2009)
7.4    movie: Special Thanks to Roy London (2005)
7.4    movie: Thelma & Louise (1991)
7.4    movie: Touch of Evil (2011)
7.3    movie: A River Runs Through It (1992)
7.3    movie: Megamind (2010)
7.2    movie: Troy (2004)
7.1    movie: Confessions of a Dangerous Mind (2002)
7.1    movie: Meet Joe Black (1998)
7.1    movie: World War Z (2013)
7.0    movie: Burn After Reading (2008)
7.0    movie: Seven Years in Tibet (1997)
7.0    movie: Smash His Camera (2010)
7.0    movie: Spy Game (2001)
6.9    movie: Ocean's Thirteen (2007)
6.7    movie: Don't Tell My Booker!!! (2007)
6.7    movie: Kalifornia (1993)
6.7    movie: The Tree of Life (2011)
6.6    movie: Bad Boy Kummer (2010)
6.6    movie: Sinbad: Legend of the Seven Seas (2003)
6.5    movie: Mr. & Mrs. Smith (2005)
6.4    movie: Boffo! Tinseltown's Bombs and Blockbusters (2006)
6.4    movie: Ocean's Twelve (2004)
6.3    movie: Contact (1992)
6.3    movie: Less Than Zero (1987)
6.2    movie: Killing Them Softly (2012)
6.1    movie: Los Angeles (2005)
6.1    movie: The Mexican (2001)
6.0    movie: Happy Together (1989)
6.0    movie: No Man's Land (1987)
6.0    movie: The Devil's Own (1997)
5.9    movie: Happy Feet Two (2011)
5.8    movie: Johnny Suede (1991)
5.7    movie: Across the Tracks (1990)
5.4    movie: The Counselor (2013)
5.4    movie: The Dark Side of the Sun (1988)
5.2    movie: The Favor (1994)
4.8    movie: Full Frontal (2002)
4.7    movie: Brad Pitt Video Portrait (2006)
4.7    movie: Cool World (1992)
4.6    movie: Abby Singer (2003)
4.4    movie: Hunk (1987)
4.1    movie: Cutting Class (1989)
average movie rating: 6.77

$ ./find_average_movie_rating.py "Angelina Jolie"
actor: Angelina Jolie
number of movies: 47
ratings:
8.1    movie: Exit Through the Gift Shop (2010)
7.8    movie: Changeling (2008)
7.6    movie: Kung Fu Panda (2008)
7.3    movie: Girl, Interrupted (1999)
7.3    movie: Kung Fu Panda 2 (2011)
7.3    movie: Maleficent (2014)
7.3    movie: The Day After Peace (2008)
7.2    movie: Playing by Heart (1998)
7.2    movie: The International Criminal Court (2013)
7.1    movie: Jane's Journey (2010)
7.0    movie: Smash His Camera (2010)
6.7    movie: A Mighty Heart (2007)
6.7    movie: Don't Tell My Booker!!! (2007)
6.7    movie: The Good Shepherd (2006)
6.7    movie: Wanted (2008)
6.6    movie: The Bone Collector (1999)
6.5    movie: Mr. & Mrs. Smith (2005)
6.4    movie: Beyond Borders (2003)
6.4    movie: Gone in Sixty Seconds (2000)
6.4    movie: Salt (2010)
6.4    movie: Top Priority: The Terror Within (2012)
6.3    movie: Beowulf (2007)
6.2    movie: Hackers (1995)
6.2    movie: Valencia: The Movie/S (2013)
6.1    movie: A Place in Time (2007)
6.1    movie: Foxfire (1996)
6.1    movie: Sky Captain and the World of Tomorrow (2004)
6.1    movie: Taking Lives (2004)
6.1    movie: The Fever (2004)
6.0    movie: Original Sin (2001)
6.0    movie: Pushing Tin (1999)
6.0    movie: Shark Tale (2004)
6.0    movie: The Tourist (2010)
5.8    movie: Life or Something Like It (2002)
5.7    movie: Lara Croft: Tomb Raider (2001)
5.6    movie: Playing God (1997)
5.5    movie: Alexander (2004)
5.4    movie: Lara Croft Tomb Raider: The Cradle of Life (2003)
5.2    movie: Lookin' to Get Out (1982)
5.2    movie: Love Is All There Is (1996)
5.2    movie: Mojave Moon (1996)
5.0    movie: Alice & Viril (1993)
5.0    movie: Trading Women (2003)
4.9    movie: Angela & Viril (1993)
4.9    movie: Hell's Kitchen (1998)
4.3    movie: Without Evidence (1995)
2.8    movie: Sledge: The Untold Story (2005)
average movie rating: 6.18

$ ./find_average_movie_rating.py "Tom Cruise"
actor: Tom Cruise
number of movies: 50
ratings:
8.1    movie: A Tribute to J.J. Abrams (2013)
8.1    movie: Edge of Tomorrow (2014)
8.0    movie: Magnolia (1999)
8.0    movie: Rain Man (1988)
8.0    movie: Stanley Kubrick: A Life in Pictures (2001)
7.8    movie: Religulous (2008)
7.7    movie: Minority Report (2002)
7.7    movie: The Last Samurai (2003)
7.6    movie: A Few Good Men (1992)
7.6    movie: Collateral (2004)
7.6    movie: Interview with the Vampire: The Vampire Chronicles (1994)
7.5    movie: Mission: Impossible Ghost Protocol Special Feature - Soaring in Dubai (2011)
7.5    movie: Space Station 3D (2002)
7.4    movie: Mission: Impossible - Ghost Protocol (2011)
7.4    movie: The Queen (2006)
7.3    movie: Der Geist des Geldes (2007)
7.3    movie: Eyes Wide Shut (1999)
7.3    movie: Jerry Maguire (1996)
7.2    movie: Born on the Fourth of July (1989)
7.2    movie: The Outsiders (1983)
7.1    movie: Valkyrie (2008)
7.0    movie: Jack Reacher (2012)
7.0    movie: Mission: Impossible (1996)
7.0    movie: Oblivion (2013/I)
7.0    movie: The Color of Money (1986)
7.0    movie: Tropic Thunder (2008)
6.9    movie: Vanilla Sky (2001)
6.8    movie: Mission: Impossible III (2006)
6.8    movie: Risky Business (1983)
6.8    movie: Sex, Drugs & Religion (2010)
6.8    movie: The Firm (1993)
6.8    movie: Top Gun (1986)
6.7    movie: Don't Tell My Booker!!! (2007)
6.7    movie: Taps (1981)
6.5    movie: Far and Away (1992)
6.5    movie: War of the Worlds (2005)
6.4    movie: Boffo! Tinseltown's Bombs and Blockbusters (2006)
6.4    movie: Legend (1985)
6.3    movie: Knight and Day (2010)
6.2    movie: Austin Powers in Goldmember (2002)
6.2    movie: Lions for Lambs (2007)
6.0    movie: Mission: Impossible II (2000)
5.9    movie: All the Right Moves (1983)
5.9    movie: Rock of Ages (2012)
5.8    movie: Days of Thunder (1990)
5.7    movie: Cocktail (1988)
5.4    movie: August (2008)
4.8    movie: Losin' It (1983)
4.6    movie: Endless Love (1981)
4.1    movie: Junket Whore (1998)
average movie rating: 6.83

$ ./find_average_movie_rating.py "Alyssa Milano"
actor: Alyssa Milano
number of movies: 27
ratings:
7.3    movie: Life After Tomorrow (2006)
7.1    movie: The Blue Hour (2007)
6.9    movie: Where the Day Takes You (1991)
6.7    movie: 10 Minutes (2010)
6.6    movie: Commando (1985)
6.5    movie: Jimmy Zip (1996)
6.5    movie: Old Enough (1984)
6.2    movie: Fear (1996)
6.0    movie: Pathology (2008)
5.9    movie: Buying the Cow (2002)
5.9    movie: Hall Pass (2011)
5.8    movie: Little Sister (1992)
5.8    movie: My Girlfriend's Boyfriend (2010)
5.6    movie: Dickie Roberts: Former Child Star (2003)
5.6    movie: Glory Daze (1995)
5.6    movie: Kiss the Bride (2002)
5.6    movie: New Year's Eve (2011)
5.6    movie: Rockin' the Corps: An American Thank You (2005)
5.2    movie: Dinotopia: Quest for the Ruby Sunstone (2005)
5.1    movie: Hugo Pool (1997)
5.0    movie: Below Utopia (1997)
4.8    movie: Deadly Sins (1995)
4.3    movie: Embrace of the Vampire (1995)
4.3    movie: Poison Ivy II (1996)
4.2    movie: Speed Zone (1989)
3.9    movie: Conflict of Interest (1993)
3.5    movie: Double Dragon (1994)
average movie rating: 5.61

I finally implemented a superposition version of find-topic. Details later.
18/6/2014: Here is a simple movie recommendation algo:
favourite-movies |Fred> => 6 |movie: a> + 9 |movie: b> + 8 |movie: c> + 10 |movie: d> + 6.5 |movie: e>
-- where the coeffs represent how strongly Fred liked those movies.
-- Indeed, this is a case where negative coeffs would be useful. 
-- eg, in my case, -20 for musicals!
-- -20 |feature: musical>

-- a db of features for movies, with the respective strengths as coeffs.
features |movie: a> => 5 |feature: 9> + 14 |feature: 3> + 9 |feature: 1> + 2 |feature: 20>
features |movie: b> => 13 |feature: 24> + |feature: 27> + 6 |feature: 42> + 14 |feature: 23> + 6 |feature: 22>
features |movie: c> => 3 |feature: 4> + 7 |feature: 44> + 2 |feature: 28>
features |movie: d> => 11 |feature: 10> + 4 |feature: 43> + 4 |feature: 4> + 13 |feature: 23> + 12 |feature: 21> + 2 |feature: 1>
features |movie: e> => 1 |feature: 47> + 6 |feature: 11> + 13 |feature: 2> + 11 |feature: 17> + 7 |feature: 2>

features |Fred's favourite movie features> => features favourite-movies |Fred>
|Fred's suggested movies: 1> => 100 similar[features] |Fred's favourite movie features>
OK. If we load that up into the console we get:
|Fred's suggested movies: 1> => 48.279|movie: d> + 36.485|movie: b> + 18.392|movie: e> + 14.892|movie: a> + 10.127|movie: c>
OK. It sort of works, but the movie Fred rated 8/10 (|movie: c>) ended up with the lowest score, so we need to tweak it.
So, let's try again, this time with normalized features.
(Without this, |movie: c> is unfairly "punished", since its coeffs are small compared to those of the other movies.)
favourite-movies |Fred> => 6 |movie: a> + 9 |movie: b> + 8 |movie: c> + 10 |movie: d> + 6.5 |movie: e>

normed-features |movie: a> => normalize( 5 |feature: 9> + 14 |feature: 3> + 9 |feature: 1> + 2 |feature: 20>)
normed-features |movie: b> => normalize( 13 |feature: 24> + |feature: 27> + 6 |feature: 42> + 14 |feature: 23> + 6 |feature: 22>)
normed-features |movie: c> => normalize( 3 |feature: 4> + 7 |feature: 44> + 2 |feature: 28>)
normed-features |movie: d> => normalize( 11 |feature: 10> + 4 |feature: 43> + 4 |feature: 4> + 13 |feature: 23> + 12 |feature: 21> + 2 |feature: 1>)
normed-features |movie: e> => normalize( 1 |feature: 47> + 6 |feature: 11> + 13 |feature: 2> + 11 |feature: 17> + 7 |feature: 2> )

normed-features |normed Fred's favourite movie features> => normed-features favourite-movies |Fred>

|Fred's suggested movies: 2> => 100 similar[normed-features] |normed Fred's favourite movie features>
Then load it up in the console, and we get:
|Fred's suggested movies: 2> => 41.602|movie: d> + 29.939|movie: b> + 22.455|movie: c> + 16.456|movie: e> + 16.291|movie: a>
Which looks like we are on the right track at least, since it has the same ordering as his favourite movies:
-- sorted by coeff:
favourite-movies |Fred> => 10 |movie: d> + 9 |movie: b> + 8 |movie: c> + 6.5 |movie: e> + 6 |movie: a>
So the hard bit now is to build up a database mapping movies to a list of features, and their respective strengths.
Then for each user, we would need a list of favourite movies along with ratings.
BTW, here is simple-movie-recommendation-example.sw
19/6/2014: Now, I will try to explain the simple movie recommendation algo.
I'm not claiming this is the best algo out there to do that. Probably a long way from that.
But it is pretty cool how well it works with so little effort.

First, you need lots of data.
For each movie you need data on the features of that movie (comedy, romance, sci-fi, action, etc, and more specific stuff like whether it has Angelina Jolie in it, and so on).
And a strength for how strongly it has that property.
So eg:
normed-features |movie: a> => normalize( 5 |feature: 9> + 14 |feature: 3> + 9 |feature: 1> + 2 |feature: 20>)
normed-features |movie: b> => normalize( 13 |feature: 24> + |feature: 27> + 6 |feature: 42> + 14 |feature: 23> + 6 |feature: 22>)
normed-features |movie: c> => normalize( 3 |feature: 4> + 7 |feature: 44> + 2 |feature: 28>)
normed-features |movie: d> => normalize( 11 |feature: 10> + 4 |feature: 43> + 4 |feature: 4> + 13 |feature: 23> + 12 |feature: 21> + 2 |feature: 1>)
normed-features |movie: e> => normalize( 1 |feature: 47> + 6 |feature: 11> + 13 |feature: 2> + 11 |feature: 17> + 7 |feature: 2> )
normed-features |movie: f> => normalize( ...)
normed-features |movie: g> => normalize(...)
etc.
Then for each user, we get them to rate their favourite movies.
eg:
favourite-movies |Fred> => 6 |movie: a> + 9 |movie: b> + 8 |movie: c> + 10 |movie: d> + 6.5 |movie: e>
favourite-movies |Sam> => 9 |movie: a> + 5 |movie: b> + 6 |movie: c> + 10 |movie: g>
etc.
Then for each user we do a little processing:
normed-features |normed Fred's favourite movie features> => normed-features favourite-movies |Fred>
normed-features |normed Sam's favourite movie features> => normed-features favourite-movies |Sam>
etc.
Then we look up the data and try to find the best match:
|Fred's suggested movies> => 100 similar[normed-features] |normed Fred's favourite movie features>
|Sam's suggested movies> => 100 similar[normed-features] |normed Sam's favourite movie features>
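And since the whole pipeline is only a few operations, here is a toy end-to-end Python sketch (hypothetical feature numbers, and just two movies, so the "catalogue" doubles as the favourites list):
def normalize(v):
  s = sum(v.values())
  return {k: x/s for k,x in v.items()}

def simm(a,b):
  a,b = normalize(a),normalize(b)
  return sum(min(a.get(k,0),b.get(k,0)) for k in set(a) | set(b))

normed_features = {                       # movie -> feature strengths
  'movie: a': {'f9': 5, 'f3': 14, 'f1': 9},
  'movie: b': {'f24': 13, 'f27': 1, 'f42': 6},
}
favourite_movies = {'movie: a': 6, 'movie: b': 9}   # movie -> rating

# build the user's aggregate feature profile, weighted by his ratings:
profile = {}
for movie,rating in favourite_movies.items():
  for f,w in normalize(normed_features[movie]).items():
    profile[f] = profile.get(f,0) + rating*w

# then score every movie in the catalogue against the profile:
scores = {m: 100*simm(profile,fv) for m,fv in normed_features.items()}
print(sorted(scores.items(),key=lambda kv: -kv[1]))   # movie b (rated 9) scores higher than movie a (rated 6)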

A quick example of mapping evidence in a crime to suspects.
Roughly:
evidence-1 |crime: a> => normalize( 9 |suspect: 1> + 2 |suspect: 2> + 15 |suspect: 3>)
evidence-2 |crime: a> => normalize( 0 |suspect: 1> + 5 |suspect: 2> + 4 |suspect: 3> + 12 |suspect: 4>)
evidence-3 |crime: a> => normalize( 7 |suspect: 1> + 13 |suspect: 2> + 25 |suspect: 4>)
evidence-4 |crime: a> => normalize( 2 |suspect: 1> + 4 |suspect: 3> + 6 |suspect: 6>)

|suspect> => evidence-1 |crime: a> +  evidence-2 |crime: a> + evidence-3 |crime: a> + evidence-4 |crime: a>
Load this up into the console, and we have:
sa: load evidence-crime-vs-suspect.sw
sa: dump
----------------------------------------
|context> => |context: evidence of crime vs suspect>

evidence-1 |crime: a> => 0.346|suspect: 1> + 0.077|suspect: 2> + 0.577|suspect: 3>
evidence-2 |crime: a> => 0.000|suspect: 1> + 0.238|suspect: 2> + 0.190|suspect: 3> + 0.571|suspect: 4>
evidence-3 |crime: a> => 0.156|suspect: 1> + 0.289|suspect: 2> + 0.556|suspect: 4>
evidence-4 |crime: a> => 0.167|suspect: 1> + 0.333|suspect: 3> + 0.500|suspect: 6>

 |suspect> => 0.668|suspect: 1> + 0.604|suspect: 2> + 1.101|suspect: 3> + 1.127|suspect: 4> + 0.500|suspect: 6>
----------------------------------------
sa: coeff-sort "" |suspect>
1.127|suspect: 4> + 1.101|suspect: 3> + 0.668|suspect: 1> + 0.604|suspect: 2> + 0.500|suspect: 6>

20/6/2014: OK. Time to type up a little bit of recent work.
I mapped wikipedia pages to frequency lists, and then ran find-topic on them.
Oh, and I finally implemented a superposition version of find-topic (copied from the "map to topic" code). Previously it only applied to kets.
Recall BTW, that find-topic uses the normed-frequency-class equation applied to frequency lists.
Here is a previous example applied to male, female, and last names.
Anyway, some examples:
sa: load WP-word-frequencies.sw
sa: find-topic[words] |wikipedia>
18.539|WP: US presidents> + 18.539|WP: particle physics> + 16.479|WP: rivers> + 16.479|WP: physics> + 16.479|WP: country list> + 13.483|WP: Australia>

sa: find-topic[words] |adelaide>
74.576|WP: Adelaide> + 25.424|WP: Australia>

sa: find-topic[words] |sydney>
60.241|WP: Australia> + 39.759|WP: Adelaide>

sa: find-topic[words] |canberra>
100.000|WP: Australia>

sa: find-topic[words-2] |aami stadium>
100.000|WP: Adelaide>

sa: find-topic[words-2] |river torrens>
100.000|WP: Adelaide>


sa: find-topic[words] (|river> + |nile>)                          -- an example of a superposition version of find-topic.
76.811|WP: rivers> + 13.788|WP: Adelaide> + 9.401|WP: Australia>

sa: find-topic[words-2] |river nile>                              -- NB: ket version gives a better answer in this case.
100.000|WP: rivers>                                               -- If you know the exact phrase, use the ket version.
                                                                  -- If you don't, then use the superposition version
                                                                  -- which adds up the results from the pieces.

sa: find-topic[words-2] |adelaide university>                     -- here is an example of what I was just talking about.
|>                                                                -- there is no exact match for "adelaide university"

sa: find-topic[words] (|adelaide> + |university>)
66.236|WP: Adelaide> + 33.764|WP: Australia>                      -- at least this time, using sp version, we got something of a result. 

sa: find-topic[words-3] |university of adelaide>                  -- ahh... we found an exact match this time.
76.923|WP: Adelaide> + 23.077|WP: Australia>                      -- also means the WP page for Australia contains the phrase:
                                                                  -- "university of adelaide"

OK. Let's check that:
$ grep -i "university" text/WP-Australia.txt
Australia has 37 government-funded universities and two private universities, as well as a number of other specialist institutions that provide approved courses at the higher education level.[282] The University of Sydney is Australia's oldest university, having been founded in 1850, followed by the University of Melbourne three years later. Other notable universities include those of the Group of Eight leading tertiary institutions, including the University of Adelaide(which boasts an association with five Nobel Laureates), the Australian National University located in the national capital of Canberra, Monash University and the University of New South Wales.
sa: find-topic[words] |physics>
54.237|WP: physics> + 45.763|WP: particle physics>

sa: find-topic[words-2] |particle physics>
60.000|WP: particle physics> + 40.000|WP: physics>                -- so we have an exact phrase match of "particle physics"

sa: find-topic[words] (|particle> + |physics>)
51.605|WP: particle physics> + 48.395|WP: physics>                -- we have a match with a "softer" phrase match too, of course.

sa: find-topic[words] |electron>
62.791|WP: particle physics> + 37.209|WP: physics>


sa: find-topic[words-2] |bill clinton>
100.000|WP: US presidents>

sa: find-topic[words-2] |george bush>                             -- no match on the exact phrase.
|>                                                                -- probably because of the need to disambiguate between father and son.

sa: find-topic[words] (|george> + |bush>)
67.705|WP: US presidents> + 22.363|WP: Australia> + 9.932|WP: Adelaide> -- softer match still gives good results.

sa: find-topic[words-2] |richard nixon>
100.000|WP: US presidents>

sa: find-topic[words-2] |thomas jefferson>
100.000|WP: US presidents>

sa: find-topic[words] |reagan>
100.000|WP: US presidents>


sa: find-topic[words-2] |united states>                           -- heh. matched more than expected.
34.913|WP: rivers> + 24.938|WP: US presidents> + 13.965|WP: particle physics> + 13.092|WP: country list> + 13.092|WP: Australia>

sa: find-topic[words-2] |united kingdom>
56.000|WP: Australia> + 28.000|WP: country list> + 16.000|WP: US presidents>

sa: find-topic[words] |thailand>
66.667|WP: rivers> + 33.333|WP: country list>

sa: find-topic[words] |burma>
100.000|WP: country list>

sa: find-topic[words-2] |new zealand>                             -- I'm getting the impression the WP page on countries does not mention each individual country very often.
66.667|WP: Australia> + 16.667|WP: Adelaide> + 16.667|WP: country list> -- and hence the resulting low coeffs for that page.
                                                                  -- since smaller frequencies for a term, give smaller find-topic coeffs.
sa: find-topic[words] |japan>
53.598|WP: Australia> + 24.566|WP: particle physics> + 21.836|WP: country list>

sa: find-topic[words] |egypt>
66.667|WP: rivers> + 33.333|WP: country list>

sa: find-topic[words] |brazil>
85.714|WP: rivers> + 14.286|WP: country list>

-- now, an aside. The results for Egypt and Brazil are similar. ie, they are better known for their rivers than as countries.
-- let's check this in the console:
sa: |t1> => find-topic[words] |egypt>
sa: |t2> => find-topic[words] |brazil>
sa: 100 ket-simm(""|t1>, "" |t2>)
80.952|simm>
-- so yeah. Simm backs up that thought. 81% similarity in terms of their find-topic result.
OK. Now, let's make fuller use of the superposition version of find-topic:
find-topic[words] (|australia> + |austria> + |brazil> + |chile> + |denmark> + |holland> + |germany> + |france> + |japan> + |italay> + |greece>)
39.901|WP: country list> + 24.711|WP: rivers> + 19.970|WP: Australia> + 7.646|WP: Adelaide> + 4.324|WP: particle physics> + 3.448|WP: physics>

-- and now again without the "italay" typo:
find-topic[words] (|australia> + |austria> + |brazil> + |chile> + |denmark> + |holland> + |germany> + |france> + |japan> + |italy> + |greece>)
38.242|WP: country list> + 24.433|WP: rivers> + 19.765|WP: Australia> + 10.494|WP: Adelaide> + 3.931|WP: particle physics> + 3.135|WP: physics>
-- heh. So didn't change much.


sa: find-topic[words] (|adelaide> + |perth> + |sydney> + |melbourne> + |brisbane> + |hobart> + |darwin> + |canberra>)
68.536|WP: Australia> + 27.623|WP: Adelaide> + 3.841|WP: US presidents>
-- heh. This is a nice example of signal percolating upwards as you add more terms.
-- Let's make that more obvious.
First, save me some effort by saving this as WP-post-processing.sw:
|t1> => find-topic[words] (|adelaide>)                                 -- NB: there is no |context> line here.
|t2> => find-topic[words] (|adelaide> + |perth>)                       -- so results get merged into current context.
|t3> => find-topic[words] (|adelaide> + |perth> + |sydney>)
|t4> => find-topic[words] (|adelaide> + |perth> + |sydney> + |melbourne>)
|t5> => find-topic[words] (|adelaide> + |perth> + |sydney> + |melbourne> + |brisbane>)
|t6> => find-topic[words] (|adelaide> + |perth> + |sydney> + |melbourne> + |brisbane> + |hobart>)
|t7> => find-topic[words] (|adelaide> + |perth> + |sydney> + |melbourne> + |brisbane> + |hobart> + |darwin>)
|t8> => find-topic[words] (|adelaide> + |perth> + |sydney> + |melbourne> + |brisbane> + |hobart> + |darwin> + |canberra>)
|list> => |t1> + |t2> + |t3> + |t4> + |t5> + |t6> + |t7> + |t8>
-- then load it up in the console:
sa: load WP-post-processing.sw
sa: dump "" |list>
 |t1> => 74.576|WP: Adelaide> + 25.424|WP: Australia>
 |t2> => 62.712|WP: Australia> + 37.288|WP: Adelaide>
 |t3> => 61.888|WP: Australia> + 38.112|WP: Adelaide>
 |t4> => 61.476|WP: Australia> + 38.524|WP: Adelaide>
 |t5> => 60.720|WP: Australia> + 39.280|WP: Adelaide>
 |t6> => 58.048|WP: Australia> + 36.831|WP: Adelaide> + 5.121|WP: US presidents>
 |t7> => 64.042|WP: Australia> + 31.569|WP: Adelaide> + 4.389|WP: US presidents>
 |t8> => 68.536|WP: Australia> + 27.623|WP: Adelaide> + 3.841|WP: US presidents> 
-- so it sort of works (note the changes in the coeff of |WP: Australia>), but not quite as well as I hoped.

-- Let's try with US presidents:
Again, save some effort by saving this as WP-post-processing-2.sw: 
|s1> => find-topic[words-2] (|thomas jefferson>)
|s2> => find-topic[words-2] (|thomas jefferson> + |ronald regan>)
|s3> => find-topic[words-2] (|thomas jefferson> + |ronald regan> + |richard nixon>)
|s4> => find-topic[words-2] (|thomas jefferson> + |ronald regan> + |richard nixon> + |bill clinton>)
|s5> => find-topic[words-2] (|thomas jefferson> + |ronald regan> + |richard nixon> + |bill clinton> + |barack obama>)
|s6> => find-topic[words-2] (|thomas jefferson> + |ronald regan> + |richard nixon> + |bill clinton> + |barack obama> + |george washington>)
|s7> => find-topic[words-2] (|thomas jefferson> + |ronald regan> + |richard nixon> + |bill clinton> + |barack obama> + |george washington> + |james monroe>)
|s8> => find-topic[words-2] (|thomas jefferson> + |ronald regan> + |richard nixon> + |bill clinton> + |barack obama> + |george washington> + |james monroe> + |jimmy carter>)
|list> => |s1> + |s2> + |s3> + |s4> + |s5> + |s6> + |s7> + |s8>
-- then load it up in the console:
sa: load WP-post-processing-2.sw
sa: dump "" |list>
 |s1> => 100.000|WP: US presidents>
 |s2> => 100.000|WP: US presidents>
 |s3> => 100.000|WP: US presidents>
 |s4> => 100.000|WP: US presidents>
 |s5> => 100.000|WP: US presidents>
 |s6> => 100.000|WP: US presidents>
 |s7> => 100.000|WP: US presidents>
 |s8> => 100.000|WP: US presidents>
-- hrmm... boring, but correct, result.

-- Let's try again. How about first names this time; something a little harder, and more interesting, I hope. Save this as WP-post-processing-3.sw:
|s1> => find-topic[words] (|thomas>)
|s2> => find-topic[words] (|thomas> + |ronald>)
|s3> => find-topic[words] (|thomas> + |ronald> + |richard>)
|s4> => find-topic[words] (|thomas> + |ronald> + |richard> + |bill>)
|s5> => find-topic[words] (|thomas> + |ronald> + |richard> + |bill> + |barack>)
|s6> => find-topic[words] (|thomas> + |ronald> + |richard> + |bill> + |barack> + |george>)
|s7> => find-topic[words] (|thomas> + |ronald> + |richard> + |bill> + |barack> + |george> + |james>)
|s8> => find-topic[words] (|thomas> + |ronald> + |richard> + |bill> + |barack> + |george> + |james> + |jimmy>)
|list> => |s1> + |s2> + |s3> + |s4> + |s5> + |s6> + |s7> + |s8>

sa: load WP-post-processing-3.sw
sa: dump "" |list>
 |s1> => 63.953|WP: US presidents> + 23.256|WP: Australia> + 12.791|WP: Adelaide>
 |s2> => 81.977|WP: US presidents> + 11.628|WP: Australia> + 6.395|WP: Adelaide>
 |s3> => 82.856|WP: US presidents> + 12.880|WP: Australia> + 4.264|WP: Adelaide>
 |s4> => 87.142|WP: US presidents> + 9.660|WP: Australia> + 3.198|WP: Adelaide>
 |s5> => 89.714|WP: US presidents> + 7.728|WP: Australia> + 2.558|WP: Adelaide>
 |s6> => 85.108|WP: US presidents> + 9.450|WP: Australia> + 5.443|WP: Adelaide>
 |s7> => 79.819|WP: US presidents> + 12.097|WP: Australia> + 6.863|WP: Adelaide> + 1.221|WP: rivers>
 |s8> => 76.786|WP: US presidents> + 11.561|WP: Adelaide> + 10.585|WP: Australia> + 1.069|WP: rivers>
-- so it works pretty well. Though having a larger collection of wikipedia pages might weaken the effect.

24/6/2014 update: OK. I implemented a split function, to save typing.
An example in the console:
sa: split |thomas ronald richard bill barack george james jimmy>
|thomas> + |ronald> + |richard> + |bill> + |barack> + |george> + |james> + |jimmy>

sa: find-topic[words] split |thomas ronald richard bill barack george james jimmy>
76.786|WP: US presidents> + 11.561|WP: Adelaide> + 10.585|WP: Australia> + 1.069|WP: rivers>
which as expected gives the same result as |s8> above.
BTW, maybe eventually it would be useful to have a more general split, where (cf Python) you get to choose the split char and so on.
Currently it just uses Python's str.split():
def split_ket(one):                                # one is a ket; split its label on whitespace
  result = superposition()
  result.data = [ket(w) for w in one.the_label().split()]
  return result
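And a quick sketch of what the more general version might look like (same ket/superposition classes as above, with a Python-style optional separator; just a sketch, not tested code):
def general_split_ket(one, sep=None):
  # sep=None splits on whitespace, just like str.split(); otherwise split on the given string
  result = superposition()
  result.data = [ket(w) for w in one.the_label().split(sep)]
  return result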

Eventually I would like to run this code on all of wikipedia, but I don't have the computing power, ATM.
Now a couple of points. First, here is the (partial) history as seen from the console:
sa: history 1000
  files
  load WP-word-frequencies.sw
  find-topic[words] |south australia>
  find-topic[words-2] |south australia>
  find-topic[words] |adelaide>
  find-topic[words] |sydney>
  find-topic[words] |wikipedia>
  find-topic[words] |canberra>
  find-topic[words] (|river> + |nile>)
  find-topic[words-2] |river nile>
  find-topic[words-2] |adelaide university>
  find-topic[words] (|adelaide> + |university>)
  find-topic[words-3] |university of adelaide>
  find-topic[words] |physics>
  find-topic[words-2] |particle physics>
  find-topic[words] (|particle> + |physics>)
  find-topic[words] |electron>
  find-topic[words-2] |bill clinton>
  find-topic[words] |george bush>
  find-topic[words-2] |george bush>
  find-topic[words] (|george> + |bush>)
  find-topic[words-2] |richard nixon>
  find-topic[words-2] |thomas jefferson>
  find-topic[words] |reagan>
  find-topic[words-2] |aami stadium>
  find-topic[words-2] |river torrens>
  find-topic[words-2] |rundle mall>
  find-topic[words-2] |united states>
  find-topic[words-2] |united kingdom>
  find-topic[words] |thailand>
  find-topic[words] |burma>
  find-topic[words] |new zealand>
  find-topic[words-2] |new zealand>
  find-topic[words] |japan>
  find-topic[words] |egypt>
  find-topic[words] |brazil>
  |t1> => find-topic[words] |egypt>
  |t2> => find-topic[words] |brazil>
  ket-simm(""|t1>, "" |t2>)
  history 100
  100 ket-simm(""|t1>, "" |t2>)
  matrix[words]
  history 1000
Second, here are the WP word-frequency lists as matrices:
single word frequencies
2-gram word frequencies
3-gram word frequencies
21/8/2014 update: Above I have been specifying the n-gram size: "words" for 1-grams, "words-2" for 2-grams, and "words-3" for 3-grams.
Well, we can fix that by using this general rule:
find |*> #=> find-topic[words] |_self> + find-topic[words-2] |_self> + find-topic[words-3] |_self>
Anyway, something close to that.
It works because only 1 of the three find-topics will match, the other two will return the identity superposition |>, and hence have no effect on the result.
BTW, haven't tested it in the console, but I think it is correct.
BTW, originally I was planning on merging words, words-2 and words-3 into one merged frequency list, but I think that would give the wrong answer some of the time, because the current word might be a different n-gram size than the ket with the largest frequency.

OK. Did some quick testing, and it half works. From above we had:
sa: find-topic[words] |physics>
54.237|WP: physics> + 45.763|WP: particle physics>

sa: find-topic[words-2] |particle physics>
60.000|WP: particle physics> + 40.000|WP: physics>                

sa: find-topic[words] (|particle> + |physics>)
51.605|WP: particle physics> + 48.395|WP: physics>  
Here we have:
sa: find |physics>
54.237|WP: physics> + 45.763|WP: particle physics>

sa: find |particle physics>
60.000|WP: particle physics> + 40.000|WP: physics>

sa: find (|particle> + |physics>)
103.210|WP: particle physics> + 96.790|WP: physics>
So find applied to a superposition is broken. Not sure why, yet.
Doh! I tried this fix, but it didn't work!
normed-find |*> #=> normalize[100] (find-topic[words] |_self> + find-topic[words-2] |_self> + find-topic[words-3] |_self>)
So the bug is in using |*> rules on a superposition. I'll try and think of a fix later.
17/7/2014 update: OK. Now let's have a brief look at the compression ratio of mapping WP pages to frequency lists.
count returns the number of kets in a superposition, and count-sum adds up the coeffs of the kets in the superposition (ie, in this case the number of words in the original document).
So we just need 100 count / count-sum. Like this:
-- First try is simply:
sa: compress-ratio |*> #=> arithmetic(count words |_self>, |/>,count-sum words |_self>)
-- test an example:
sa: compress-ratio |WP: Adelaide>
|number: 0.25677603423680456>

-- tweak it:
sa: compress-ratio-100 |*> #=> 100 to-number arithmetic(count words |_self>, |/>,count-sum words |_self>)
sa: compress-ratio-100 |WP: Adelaide>
25.678| >

-- now, let's map compress-ratio-100 to all our WP pages, and show the result:
sa: map[compress-ratio-100,cr-100] relevant-kets[words]       -- NB: relevant-kets is useful. Means we don't have to manually specify the list of interest.
sa: matrix[cr-100]
[   ] = [  25.68  29.43  30.15  35.40  28.81  35.99  33.33  ] [ WP: Adelaide         ]
                                                              [ WP: Australia        ]
                                                              [ WP: country list     ]
                                                              [ WP: particle physics ]
                                                              [ WP: physics          ]
                                                              [ WP: rivers           ]
                                                              [ WP: US presidents    ]

-- now the words-2 data:
sa: w2-compress-ratio-100 |*> #=> 100 to-number arithmetic(count words-2 |_self>, |/>,count-sum words-2 |_self>)
sa: map[w2-compress-ratio-100,cr-100-2] relevant-kets[words-2]
sa: matrix[cr-100-2]
[   ] = [  75.00  77.49  65.21  80.15  77.02  82.07  71.26  ] [ WP: Adelaide         ]
                                                              [ WP: Australia        ]
                                                              [ WP: country list     ]
                                                              [ WP: particle physics ]
                                                              [ WP: physics          ]
                                                              [ WP: rivers           ]
                                                              [ WP: US presidents    ]

-- now the words-3 data:
sa: w3-compress-ratio-100 |*> #=> 100 to-number arithmetic(count words-3 |_self>, |/>,count-sum words-3 |_self>)
sa: map[w3-compress-ratio-100,cr-100-3] relevant-kets[words-3]
sa: matrix[cr-100-3]
[   ] = [  92.96  94.59  81.56  94.18  94.66  93.78  87.30  ] [ WP: Adelaide         ]
                                                              [ WP: Australia        ]
                                                              [ WP: country list     ]
                                                              [ WP: particle physics ]
                                                              [ WP: physics          ]
                                                              [ WP: rivers           ]
                                                              [ WP: US presidents    ]
In summary: as the n-grams get bigger, the compression ratio approaches no compression (an expected result, since longer n-grams are increasingly likely to be unique).
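For reference, the same calculation as a stand-alone Python sketch (using collections.Counter instead of the project's superpositions; the function name is just for illustration):
from collections import Counter

def ngram_compression_ratio(text, n=1):
  # 100 * count / count-sum over the n-gram frequency list of a text
  words = text.lower().split()
  ngrams = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
  freq = Counter(ngrams)
  return 100 * len(freq) / sum(freq.values())   # distinct n-grams / total n-grams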
30/6/2014: Put here instead of the bottom because I want the project description to stay at the bottom of the page.
Anyway, implemented a couple of things in the last day or two.
One is a pretty print of sw data, the other is converting a context to a frequency list.
The idea for the second one is to map all our sw files to frequency lists; then when we want to look up George, say, we use our find-topic/nfc code to decide which sw files are the most relevant.
But for now, some pretty print examples (and compare/contrast them with their sw equivalents):
sa: load early-us-presidents.sw
sa: display
  context: early US Presidents

  early US Presidents: _list
  supported-ops: op:
               : Washington, Adams, Jefferson, Madison, Monroe, Q Adams

  Washington
     supported-ops: op: president-number, op: president-era, op: party, op: full-name
  president-number: number: 1
     president-era: year: 1789, year: 1790, year: 1791, year: 1792, year: 1793, year: 1794, year: 1795, year: 1796, year: 1797
             party: party: Independent
         full-name: person: George Washington

  person: George Washington
  supported-ops: op:
               : US President: George Washington

  Adams
     supported-ops: op: president-number, op: president-era, op: party, op: full-name
  president-number: number: 2
     president-era: year: 1797, year: 1798, year: 1799, year: 1800, year: 1801
             party: party: Federalist
         full-name: person: John Adams

  person: John Adams
  supported-ops: op:
               : US President: John Adams

  Jefferson
     supported-ops: op: president-number, op: president-era, op: party, op: full-name
  president-number: number: 3
     president-era: year: 1801, year: 1802, year: 1803, year: 1804, year: 1805, year: 1806, year: 1807, year: 1808, year: 1809
             party: party: Democratic-Republican
         full-name: person: Thomas Jefferson

  person: Thomas Jefferson
  supported-ops: op:
               : US President: Thomas Jefferson

  Madison
     supported-ops: op: president-number, op: president-era, op: party, op: full-name
  president-number: number: 4
     president-era: year: 1809, year: 1810, year: 1811, year: 1812, year: 1813, year: 1814, year: 1815, year: 1816, year: 1817
             party: party: Democratic-Republican
         full-name: person: James Madison

  person: James Madison
  supported-ops: op:
               : US President: James Madison

  Monroe
     supported-ops: op: president-number, op: president-era, op: party, op: full-name
  president-number: number: 5
     president-era: year: 1817, year: 1818, year: 1819, year: 1820, year: 1821, year: 1822, year: 1823, year: 1824, year: 1825
             party: party: Democratic-Republican
         full-name: person: James Monroe

  person: James Monroe
  supported-ops: op:
               : US President: James Monroe

  Q Adams
     supported-ops: op: president-number, op: president-era, op: party, op: full-name
  president-number: number: 6
     president-era: year: 1825, year: 1826, year: 1827, year: 1828, year: 1829
             party: party: Democratic-Republican
         full-name: person: John Quincy Adams

  person: John Quincy Adams
  supported-ops: op:
               : US President: John Quincy Adams

  party: Democratic-Republican
  supported-ops: op: founded, op: dissolved
        founded: year: 1791
      dissolved: year: 1825
And another example:
sa: load bots.sw
sa: display
  context: bot profile

  bot: Bella
           supported-ops: op: name, op: mother, op: father, op: birth-sign, op: number-siblings, op: wine-preference, op: favourite-fruit, op: favourite-music, op: favourite-play, op: hair-colour, op: eye-colour, op: where-live, op: favourite-holiday-spot, op: make-of-car, op: religion, op: personality-type, op: current-emotion, op: bed-time, op: age
                    name: Bella
                  mother: Mia
                  father: William
              birth-sign: birth-sign: Cancer
         number-siblings: number: 1
         wine-preference: wine: Merlot
         favourite-fruit: fruit: pineapples
         favourite-music: music: genre: punk
          favourite-play: play: Endgame
             hair-colour: hair-colour: gray
              eye-colour: eye-colour: hazel
              where-live: location: Sydney
  favourite-holiday-spot: location: Paris
             make-of-car: car: Porsche
                religion: religion: Christianity
        personality-type: personality-type: the guardian
         current-emotion: emotion: fear
                bed-time: time: 8pm
                     age: age: 31

  bot: Emma
           supported-ops: op: name, op: mother, op: father, op: birth-sign, op: number-siblings, op: wine-preference, op: favourite-fruit, op: favourite-music, op: favourite-play, op: hair-colour, op: eye-colour, op: where-live, op: favourite-holiday-spot, op: make-of-car, op: religion, op: personality-type, op: current-emotion, op: bed-time, op: age
                    name: Emma
                  mother: Madison
                  father: Nathan
              birth-sign: birth-sign: Capricorn
         number-siblings: number: 4
         wine-preference: wine: Pinot Noir
         favourite-fruit: fruit: oranges
         favourite-music: music: genre: hip hop
          favourite-play: play: No Exit
             hair-colour: hair-colour: red
              eye-colour: eye-colour: gray
              where-live: location: New York
  favourite-holiday-spot: location: Taj Mahal
             make-of-car: car: BMW
                religion: religion: Taoism
        personality-type: personality-type: the visionary
         current-emotion: emotion: kindness
                bed-time: time: 2am
                     age: age: 29

  bot: Madison
           supported-ops: op: name, op: mother, op: father, op: birth-sign, op: number-siblings, op: wine-preference, op: favourite-fruit, op: favourite-music, op: favourite-play, op: hair-colour, op: eye-colour, op: where-live, op: favourite-holiday-spot, op: make-of-car, op: religion, op: personality-type, op: current-emotion, op: bed-time, op: hungry, op: age, op: friends
                    name: Madison
                  mother: Mia
                  father: Ian
              birth-sign: birth-sign: Cancer
         number-siblings: number: 6
         wine-preference: wine: Pinot Noir
         favourite-fruit: fruit: pineapples
         favourite-music: music: genre: blues
          favourite-play: play: Death of a Salesman
             hair-colour: hair-colour: red
              eye-colour: eye-colour: amber
              where-live: location: Vancouver
  favourite-holiday-spot: location: Uluru
             make-of-car: car: Bugatti
                religion: religion: Islam
        personality-type: personality-type: the performer
         current-emotion: emotion: indignation
                bed-time: time: 10:30pm
                  hungry: starving
                     age: age: 23
                 friends: bot: Emma, bot: Bella
And another example:
sa: load george.sw
sa: display
  context: George

  context: George
  supported-ops: op: source
         source: sw-url: http://semantic-db.org/george.sw

  word: george
  supported-ops: op: spell, op:
          spell: 2.00 letter: g, 2.00 letter: e, letter: o, letter: r
               : person: George

  person: George
       supported-ops: op: age, op: dob, op: hair-colour, op: eye-colour, op: gender, op: height, op: wife, op: occupation, op: friends, op: mother, op: father, op: sisters, op: brothers, op: siblings, op: parents, op: family, op: family-and-friends, op: email, op: education, op: can-swim
                 age: age: 29
                 dob: date: 1984-05-23
         hair-colour: hair-colour: brown
          eye-colour: eye-colour: blue
              gender: gender: male
              height: height: cm: 176
                wife: person: Beth
          occupation: occupation: car salesman
             friends: person: Fred, person: Jane, person: Liz, person: Andrew
              mother: person: Sarah
              father: person: David
             sisters: person: Emily
            brothers: person: Frank, person: Tim, person: Sam
            siblings: person: Frank, person: Tim, person: Sam, person: Emily
             parents: person: Sarah, person: David
              family: person: Sarah, person: David, person: Frank, person: Tim, person: Sam, person: Emily
  family-and-friends: person: Sarah, person: David, person: Frank, person: Tim, person: Sam, person: Emily, person: Fred, person: Jane, person: Liz, person: Andrew
               email: email: george.douglas@gmail.com
           education: education: high-school
            can-swim: 0.70 yes

  person: David Douglas
  supported-ops: op: is-dead
        is-dead: yes
And another example:
sa: load breakfast-menu.sw
sa: display
  context: breakfast menu

  menu: breakfast
  supported-ops: op:
               : food: Belgian Waffles, food: Strawberry Belgian Waffles, food: Berry-Berry Belgian Waffles, food: French Toast, food: Homestyle Breakfast

  food: Belgian Waffles
  supported-ops: op: name, op: price, op: description, op: calories
           name: text: "Belgian Waffles"
          price: price: 5.95
    description: text: "Two of our famous Belgian Waffles with plenty of real maple syrup"
       calories: calories: 650

  food: Strawberry Belgian Waffles
  supported-ops: op: name, op: price, op: description, op: calories
           name: text: "Strawberry Belgian Waffles"
          price: price: 7.95
    description: text: "Light Belgian waffles covered with strawberries and whipped cream"
       calories: calories: 900

  food: Berry-Berry Belgian Waffles
  supported-ops: op: name, op: price, op: description, op: calories
           name: text: "Berry-Berry Belgian Waffles"
          price: price: 8.95
    description: text: "Light Belgian waffles covered with an assortment of fresh berries and whipped cream"
       calories: calories: 900

  food: French Toast
  supported-ops: op: name, op: price, op: description, op: calories
           name: text: "French Toast"
          price: price: 4.50
    description: text: "Thick slices made from our homemade sourdough bread"
       calories: calories: 600

  food: Homestyle Breakfast
  supported-ops: op: name, op: price, op: description, op: calories
           name: text: "Homestyle Breakfast"
          price: price: 6.95
    description: text: "Two eggs, bacon or sausage, toast, and our ever-popular hash browns"
       calories: calories: 950

  word: waffles
  supported-ops: op:
               : food: waffles

  word: belgian
  supported-ops: op:
               : country: Belgium

  word: strawberries
  supported-ops: op:
               : food: strawberries, fruit: strawberries

  word: berries
  supported-ops: op:
               : food: berries, fruit: berries

  word: french
  supported-ops: op:
               : country: France

  word: toast
  supported-ops: op:
               : food: toast

  word: breakfast
  supported-ops: op:
               : meal: breakfast

  word: egg
  supported-ops: op:
               : food: egg

  word: eggs
  supported-ops: op:
               : food: egg

  word: bacon
  supported-ops: op:
               : food: bacon

  word: sausage
  supported-ops: op:
               : food: sausage

  word: two
  supported-ops: op:
               : number: 2

  word: cream
  supported-ops: op:
               : food: cream
And another example:
sa: load binary-tree.sw
sa: display
  context: binary tree

  x
  supported-ops: op: text, op: left, op: right
           text: start node
           left: 0
          right: 1

  0
  supported-ops: op: text, op: left, op: right
           text: first child node
           left: 00
          right: 10

  1
  supported-ops: op: text, op: left, op: right
           text: second child node
           left: 01
          right: 11

  00
  supported-ops: op: text, op: left, op: right
           text: third child node
           left: 000
          right: 100

  10
  supported-ops: op: text, op: left, op: right
           text: fourth child node
           left: 010
          right: 110

  01
  supported-ops: op: text, op: left, op: right
           text: fifth child node
           left: 001
          right: 101

  11
  supported-ops: op: text, op: left, op: right
           text: sixth child node
           left: 011
          right: 111
And of course, you don't have to display the entire context. You can choose your ket/sp of interest.
sa: display |bot: Madison>
or
sa: display (|bot: Madison> + |bot: Emma>)
BTW, the code for all this is (in the code file):
  def display_ket(self,one):     # one is a ket
    label = one.the_label() if type(one) == ket else one
    head = "  " + label + "\n"
    op_list = self.rule_list[label]
    if len(op_list) == 0:
      return head
    max_len = max(len(op) for op in op_list)   # pad the op names so the ": " separators line up
    sep = ": "
    frame = "\n".join("  " + op.rjust(max_len) + sep + self.recall(op,label).readable_display() for op in op_list)
    return head + frame + "\n"

  def display_sp(self,sp):       # sp is a ket or a superposition
    if type(sp) == ket:
      return self.display_ket(sp)
    if type(sp) == superposition:
      return "\n".join(self.display_ket(x) for x in sp.data)

  def display_all(self):         # pretty print the entire context
    head = "  context: " + self.name + "\n\n"
    return head + "\n".join(self.display_ket(x) for x in self.known_kets)

30/6/2014: Now, some context to frequency list examples:
BTW, the idea is to map all our sw example files to this type of frequency list.
Then when you have an object and want to work out the best sw file to look at, use the find-topic/nfc code.
sa: load early-us-presidents.sw
sa: freq
7.000|op: > + 6.000|op: president-number> + 6.000|op: president-era> + 6.000|op: party> + 6.000|op: full-name> + 6.000|party: Democratic-Republican> + 5.000|Washington> + 5.000|Adams> + 5.000|Jefferson> + 5.000|Madison> + 5.000|Monroe> + 5.000|Q Adams> + 3.000|year: 1825> + 2.000|year: 1791> + 2.000|year: 1797> + 2.000|person: George Washington> + 2.000|year: 1801> + 2.000|person: John Adams> + 2.000|year: 1809> + 2.000|person: Thomas Jefferson> + 2.000|year: 1817> + 2.000|person: James Madison> + 2.000|person: James Monroe> + 2.000|person: John Quincy Adams> + |early US Presidents: _list> + |number: 1> + |year: 1789> + |year: 1790> + |year: 1792> + |year: 1793> + |year: 1794> + |year: 1795> + |year: 1796> + |party: Independent> + |US President: George Washington> + |number: 2> + |year: 1798> + |year: 1799> + |year: 1800> + |party: Federalist> + |US President: John Adams> + |number: 3> + |year: 1802> + |year: 1803> + |year: 1804> + |year: 1805> + |year: 1806> + |year: 1807> + |year: 1808> + |US President: Thomas Jefferson> + |number: 4> + |year: 1810> + |year: 1811> + |year: 1812> + |year: 1813> + |year: 1814> + |year: 1815> + |year: 1816> + |US President: James Madison> + |number: 5> + |year: 1818> + |year: 1819> + |year: 1820> + |year: 1821> + |year: 1822> + |year: 1823> + |year: 1824> + |US President: James Monroe> + |number: 6> + |year: 1826> + |year: 1827> + |year: 1828> + |year: 1829> + |US President: John Quincy Adams> + |op: founded> + |op: dissolved>

sa: load bots.sw
sa: freq
21.000|bot: Madison> + 20.000|bot: Bella> + 20.000|bot: Emma> + 3.000|op: name> + 3.000|op: mother> + 3.000|op: father> + 3.000|op: birth-sign> + 3.000|op: number-siblings> + 3.000|op: wine-preference> + 3.000|op: favourite-fruit> + 3.000|op: favourite-music> + 3.000|op: favourite-play> + 3.000|op: hair-colour> + 3.000|op: eye-colour> + 3.000|op: where-live> + 3.000|op: favourite-holiday-spot> + 3.000|op: make-of-car> + 3.000|op: religion> + 3.000|op: personality-type> + 3.000|op: current-emotion> + 3.000|op: bed-time> + 3.000|op: age> + 2.000|Mia> + 2.000|birth-sign: Cancer> + 2.000|fruit: pineapples> + 2.000|Madison> + 2.000|wine: Pinot Noir> + 2.000|hair-colour: red> + |Bella> + |William> + |number: 1> + |wine: Merlot> + |music: genre: punk> + |play: Endgame> + |hair-colour: gray> + |eye-colour: hazel> + |location: Sydney> + |location: Paris> + |car: Porsche> + |religion: Christianity> + |personality-type: the guardian> + |emotion: fear> + |time: 8pm> + |age: 31> + |Emma> + |Nathan> + |birth-sign: Capricorn> + |number: 4> + |fruit: oranges> + |music: genre: hip hop> + |play: No Exit> + |eye-colour: gray> + |location: New York> + |location: Taj Mahal> + |car: BMW> + |religion: Taoism> + |personality-type: the visionary> + |emotion: kindness> + |time: 2am> + |age: 29> + |op: hungry> + |op: friends> + |Ian> + |number: 6> + |music: genre: blues> + |play: Death of a Salesman> + |eye-colour: amber> + |location: Vancouver> + |location: Uluru> + |car: Bugatti> + |religion: Islam> + |personality-type: the performer> + |emotion: indignation> + |time: 10:30pm> + |starving> + |age: 23>

sa: load george.sw
sa: freq
21.000|person: George> + 4.000|person: Sarah> + 4.000|person: David> + 4.000|person: Emily> + 4.000|person: Frank> + 4.000|person: Tim> + 4.000|person: Sam> + 2.000|letter: g> + 2.000|letter: e> + 2.000|word: george> + 2.000|person: Fred> + 2.000|person: Jane> + 2.000|person: Liz> + 2.000|person: Andrew> + 1.700|yes> + |op: source> + |sw-url: http://semantic-db.org/george.sw> + |context: George> + |op: spell> + |op: > + |letter: o> + |letter: r> + |op: age> + |op: dob> + |op: hair-colour> + |op: eye-colour> + |op: gender> + |op: height> + |op: wife> + |op: occupation> + |op: friends> + |op: mother> + |op: father> + |op: sisters> + |op: brothers> + |op: siblings> + |op: parents> + |op: family> + |op: family-and-friends> + |op: email> + |op: education> + |op: can-swim> + |age: 29> + |date: 1984-05-23> + |hair-colour: brown> + |eye-colour: blue> + |gender: male> + |height: cm: 176> + |person: Beth> + |occupation: car salesman> + |email: george.douglas@gmail.com> + |education: high-school> + |op: is-dead> + |person: David Douglas>

sa: load breakfast-menu.sw
sa: freq
14.000|op: > + 5.000|food: Belgian Waffles> + 5.000|food: Strawberry Belgian Waffles> + 5.000|food: Berry-Berry Belgian Waffles> + 5.000|food: French Toast> + 5.000|food: Homestyle Breakfast> + 5.000|op: name> + 5.000|op: price> + 5.000|op: description> + 5.000|op: calories> + 2.000|calories: 900> + 2.000|food: egg> + |menu: breakfast> + |text: "Belgian Waffles"> + |price: 5.95> + |text: "Two of our famous Belgian Waffles with plenty of real maple syrup"> + |calories: 650> + |text: "Strawberry Belgian Waffles"> + |price: 7.95> + |text: "Light Belgian waffles covered with strawberries and whipped cream"> + |text: "Berry-Berry Belgian Waffles"> + |price: 8.95> + |text: "Light Belgian waffles covered with an assortment of fresh berries and whipped cream"> + |text: "French Toast"> + |price: 4.50> + |text: "Thick slices made from our homemade sourdough bread"> + |calories: 600> + |text: "Homestyle Breakfast"> + |price: 6.95> + |text: "Two eggs, bacon or sausage, toast, and our ever-popular hash browns"> + |calories: 950> + |food: waffles> + |word: waffles> + |country: Belgium> + |word: belgian> + |food: strawberries> + |fruit: strawberries> + |word: strawberries> + |food: berries> + |fruit: berries> + |word: berries> + |country: France> + |word: french> + |food: toast> + |word: toast> + |meal: breakfast> + |word: breakfast> + |word: egg> + |word: eggs> + |food: bacon> + |word: bacon> + |food: sausage> + |word: sausage> + |number: 2> + |word: two> + |food: cream> + |word: cream>

sa: load binary-tree.sw
sa: freq
7.000|op: text> + 7.000|op: left> + 7.000|op: right> + 4.000|0> + 4.000|1> + 4.000|00> + 4.000|10> + 4.000|01> + 4.000|11> + 3.000|x> + |start node> + |first child node> + |second child node> + |third child node> + |000> + |100> + |fourth child node> + |010> + |110> + |fifth child node> + |001> + |101> + |sixth child node> + |011> + |111>
And the code for all this is (in the code file):
  def to_freq_list(self):
    result = superposition()
    for x in self.known_kets:
      op_list = self.rule_list[x]
      count_x = len(op_list) - 1                               # we subtract 1 because we don't want to count the supported-ops term.
      for op in op_list:
        rule = self.recall(op,x)
        if type(rule) == ket or type(rule) == superposition:   # we currently want to ignore stored_rules.
          result += rule.apply_sigmoid(clean)                  # we don't care about the coeffs (hence the clean), just if ket is present or not.
      result += ket(x,count_x)
    return result.coeff_sort()

1/7/2014: OK. I now have the code to map sw files to frequency lists.
Pretty sure I don't have enough computing resources to use it though (at least on all my sw files).
The other problem is it runs into the terrible big-O for frequency lists bug/annoyance. Currently trying to work out the best hack to get around that.
Anyway, the plan is to do things like:
sa: find-topic[kets] |person: Thomas Jefferson>
sa: find-topic[kets] |person: George>
sa: find-topic[kets] |bot: Emma>
And then it will reply with the best sw file. At least that is the plan.

WOOT!!! OK. I didn't have the resources to run the full thing, so I limited it to files smaller than 10K.
Works great! I have a result. Heh. I was half expecting I made a mistake and it wouldn't work.
-- decided not to put the result in the standard sw file directory. Else the code would "eat its own tail", by running itself on its own result.
-- so we need to change to the appropriate directory:
sa: cd sw-frequency-list

-- load the file:
sa: load sw-files-to-frequency-lists.sw

-- which file is best for Thomas Jefferson?
sa: find-topic[kets] |person: Thomas Jefferson>
50.000|sw file: breaky-presidents.sw> + 50.000|sw file: early-us-presidents.sw>

-- which file is best for George?
sa: find-topic[kets] |person: George>
60.000|sw file: george.sw> + 40.000|sw file: recall-general-rules-example.sw>

-- let's look at their frequency lists (using dump):
sa: dump find-topic[kets] |person: George>
kets |sw file: george.sw> => 21.000|person: George> + 4.000|person: Sarah> + 4.000|person: David> + 4.000|person: Emily> + 4.000|person: Frank> + 4.000|person: Tim> + 4.000|person: Sam> + 2.000|word: george> + 2.000|person: Fred> + 2.000|person: Jane> + 2.000|person: Liz> + 2.000|person: Andrew> + 2.000|yes> + |op: source> + |sw-url: http://semantic-db.org/george.sw> + |context: George> + |op: spell> + |op: > + |letter: g> + |letter: e> + |letter: o> + |letter: r> + |op: age> + |op: dob> + |op: hair-colour> + |op: eye-colour> + |op: gender> + |op: height> + |op: wife> + |op: occupation> + |op: friends> + |op: mother> + |op: father> + |op: sisters> + |op: brothers> + |op: siblings> + |op: parents> + |op: family> + |op: family-and-friends> + |op: email> + |op: education> + |op: can-swim> + |age: 29> + |date: 1984-05-23> + |hair-colour: brown> + |eye-colour: blue> + |gender: male> + |height: cm: 176> + |person: Beth> + |occupation: car salesman> + |email: george.douglas@gmail.com> + |education: high-school> + |op: is-dead> + |person: David Douglas>
kets |sw file: recall-general-rules-example.sw> => 4.000|person: *> + 2.000|op: bro> + 2.000|op: sis> + 2.000|person: George> + 2.000|person: Zack> + |person: Fred> + |person: Harry> + |person: Mary> + |op: my-id> + |*> + |op: sibs> + |op: is-human> + |op: brothers> + |op: sisters> + |bro 3> + |bro 4> + |bro 5> + |sis 1>

-- which file is best for the Emma bot?
sa: find-topic[kets] |bot: Emma>
50.000|sw file: bot-emma.sw> + 50.000|sw file: bots.sw>

-- let's look at these two files frequency lists (this time using display):
sa: display find-topic[kets] |bot: Emma>
  sw file: bot-emma.sw
  supported-ops: op: kets
           kets: 18.00 bot: Emma, op: name, op: mother, op: father, op: birth-sign, op: number-siblings, op: wine-preference, op: favourite-fruit, op: favourite-music, op: favourite-play, op: hair-colour, op: eye-colour, op: where-live, op: favourite-holiday-spot, op: make-of-car, op: religion, op: personality-type, op: current-emotion, op: bed-time, Emma, Madison, Nathan, birth-sign: Capricorn, number: 4, wine: Pinot Noir, fruit: oranges, music: genre: hip hop, play: No Exit, hair-colour: red, eye-colour: gray, location: New York, location: Taj Mahal, car: BMW, religion: Taoism, personality-type: the visionary, emotion: kindness, time: 2am

  sw file: bots.sw
  supported-ops: op: kets
           kets: 21.00 bot: Madison, 20.00 bot: Bella, 20.00 bot: Emma, 3.00 op: name, 3.00 op: mother, 3.00 op: father, 3.00 op: birth-sign, 3.00 op: number-siblings, 3.00 op: wine-preference, 3.00 op: favourite-fruit, 3.00 op: favourite-music, 3.00 op: favourite-play, 3.00 op: hair-colour, 3.00 op: eye-colour, 3.00 op: where-live, 3.00 op: favourite-holiday-spot, 3.00 op: make-of-car, 3.00 op: religion, 3.00 op: personality-type, 3.00 op: current-emotion, 3.00 op: bed-time, 3.00 op: age, 2.00 Mia, 2.00 birth-sign: Cancer, 2.00 fruit: pineapples, 2.00 Madison, 2.00 wine: Pinot Noir, 2.00 hair-colour: red, Bella, William, number: 1, wine: Merlot, music: genre: punk, play: Endgame, hair-colour: gray, eye-colour: hazel, location: Sydney, location: Paris, car: Porsche, religion: Christianity, personality-type: the guardian, emotion: fear, time: 8pm, age: 31, Emma, Nathan, birth-sign: Capricorn, number: 4, fruit: oranges, music: genre: hip hop, play: No Exit, eye-colour: gray, location: New York, location: Taj Mahal, car: BMW, religion: Taoism, personality-type: the visionary, emotion: kindness, time: 2am, age: 29, op: hungry, op: friends, Ian, number: 6, music: genre: blues, play: Death of a Salesman, eye-colour: amber, location: Vancouver, location: Uluru, car: Bugatti, religion: Islam, personality-type: the performer, emotion: indignation, time: 10:30pm, starving, age: 23

-- look for files that make use of the "friends" operator:
sa: find-topic[kets] |op: friends>
23.622|sw file: fred-sam-friends.sw> + 23.622|sw file: hello-friends.sw> + 23.622|sw file: matrix-as-network.sw> + 11.811|sw file: random-greetings.sw> + 7.874|sw file: friends.sw> + 4.724|sw file: bots.sw> + 4.724|sw file: george.sw>

-- look for files that make use of the "fib" operator:
sa: find-topic[kets] |op: fib>
25.000|sw file: active-fib-play.sw> + 25.000|sw file: fib-play.sw> + 25.000|sw file: next-fib-play.sw> + 25.000|sw file: small-fib.sw>

-- look for files about frogs:
sa: find-topic[kets] |animal: frog>
100.000|sw file: frog.sw>

-- look for files about Fred:
sa: find-topic[kets] |Fred>
37.500|sw file: fred-sam-friends.sw> + 37.500|sw file: simple-movie-recommendation-example.sw> + 25.000|sw file: hello-friends.sw>
Very cool. Works nicely.
And perhaps suggests you don't need a page-rank algo to find the best matching document for a query.
Hrmm... maybe we could use this to find relevant code files, instead of grep?
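A rough sketch of that idea in plain Python (Counters instead of superpositions, and simple relative-frequency scoring rather than the find-topic/nfc algorithm; the function names are made up):
import os
from collections import Counter

def build_frequency_lists(directory):
  # map each file in the directory to a word frequency list
  freq = {}
  for name in os.listdir(directory):
    path = os.path.join(directory, name)
    if os.path.isfile(path):
      with open(path, errors="ignore") as f:
        freq[name] = Counter(f.read().split())
  return freq

def find_files(freq, token):
  # score each file by the relative frequency of the token, then normalize to 100
  scores = {name: c[token] / sum(c.values()) for name, c in freq.items() if c[token] > 0}
  total = sum(scores.values())
  return sorted(((100 * s / total, name) for name, s in scores.items()), reverse=True)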
17/7/2014: OK. I have been giving some thought to how to apply my ideas to image recognition.
Arrogantly enough, I think I have some of the needed pieces.
My current idea is maybe this sequence of processing:
-- align and rescale the image, if needed.
translate the image (in animals the direction of the eye does this step)
rotate the image
scale image to bigger or smaller.
-- then process it. For a start:
1) unsmooth. ie: f[k] => - f[k-1]/2 + f[k] - f[k+1]/2 (applied once)
2) drop-below[t]
3) Gaussian smooth. ie: f[k] => f[k-1]/4 + f[k]/2 + f[k+1]/4 (applied maybe 300 times?)
-- then some more steps I haven't yet worked out. 

where:
(1) highlights edges.
(2) filters out slowly changing gradients
(3) blurs the edges, so matching is less strict on exact alignment of pixels
Anyway, that is the gist of the idea. Need to write/run code to see how well it works, and what we need to do next.
BTW, the unsmooth and smooth given above are the 1D versions. For an image we would need the 2D versions (I have a couple of ideas how those would look).
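Anyway, here is a minimal numpy sketch of the 1D pipeline (assuming edge padding at the boundaries, which is one choice among several):
import numpy as np

def unsmooth(f):
  # (1) f[k] => -f[k-1]/2 + f[k] - f[k+1]/2, highlights edges
  g = np.pad(f, 1, mode="edge")
  return -g[:-2] / 2 + g[1:-1] - g[2:] / 2

def drop_below(f, t):
  # (2) filter out slowly changing gradients, ie small values post-unsmooth
  return np.where(np.abs(f) >= t, f, 0.0)

def smooth(f, passes=300):
  # (3) f[k] => f[k-1]/4 + f[k]/2 + f[k+1]/4, applied many times to blur the edges
  for _ in range(passes):
    g = np.pad(f, 1, mode="edge")
    f = g[:-2] / 4 + g[1:-1] / 2 + g[2:] / 4
  return f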
4/8/2014: I have a new idea for working towards image recognition.
The idea is to use the categorize code.
First partition an image into sets using the categorize code on the values of pixels. The t parameter adjusts how rapidly a gradient can change and be considered in the same set.
Then on each of these subsets of the image, run categorize code again, this time based on location of pixels.
So for example, all the text will be in the first partition, since it shares the one text colour.
Then the location categorize will single out individual letters, or, if you loosen t a bit, individual words.
Anyway, something like that.
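A toy 1D version of the first step, just to show the thresholding idea (the real categorize code works on superpositions and a similarity measure, so this is only a sketch):
def categorize(values, t):
  # group sorted values into the same set while neighbours differ by at most t
  groups = []
  for v in sorted(values):
    if groups and v - groups[-1][-1] <= t:
      groups[-1].append(v)
    else:
      groups.append([v])
  return groups

# eg, pixel values: the text pixels share a colour, so land in the one partition
print(categorize([10, 12, 11, 200, 205, 90], t=5))   # [[10, 11, 12], [90], [200, 205]]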
30/7/2014: I have an idea for how the brain detects humour.
The general idea is that during the telling of a joke a concept/idea is reinforced, getting stronger and stronger.
Then the punch line rapidly swaps you to another interpretation of what has just been said, and the original concept drops to zero.
(the faster and stronger the swap, the better the joke. eg compare a simple pun with a witty joke)
So, it might look like this (with respect to time):
concept 1: 0, 1, 1, 2, 5, 10, 20, 50, 60, 70, 80, 85, 90, 100, 100, 100,   0,   0,   0,   0,   0,  0,  0
concept 2: 0, 0, 0, 0, 0,  0,  0,  0,  0,  0,  0,  0,  0,   0,   0,   0, 100, 100, 100, 100, 100, 80, 50     -- the joke eventually fades
So, the brain has machinery that looks out for humorous events.
Indeed, it is much stronger in children (they laugh much harder and more often than adults), so I think it likely humour is involved in learning from the incongruities in life.

Now something more speculative: maybe there is something to the whole "hahahahaha".
Maybe that is indicative of some kind of oscillation due to the joke?
Maybe the brain is swapping back and forth between concept 1 and 2, kind of reliving the joke each oscillation?
Where each "ha" corresponds to each (small) reliving of the joke?
21/8/2014 update: Briefly, the wow effect, swearing, and euphemisms:
The wow effect. When you say wow, it usually means a ket/concept has suddenly occurred (yeah, kind of a trivial observation):
some concept: 0, 0, 0, 0, 0, 0, 500, 500, 500, 450, 400, 400, 350, 200, 200, 100, 50, 0, 0
Swearing is somewhat similar to the wow effect.
Euphemisms on the other hand are an alternate way of representing a superposition, but with lower coefficients. Same content, just cognitively less jarring.
And clichés have higher frequency coeffs than non-clichés.
"Something is up/what is up?" corresponds to a coeff noticeably increasing
"hopes are fading" just means the coeff of the |hope> ket is decreasing
30/7/2014: Made good progress in tidying up one component of the parser.
The parse_rule_line(C,s) function is now vastly cleaner, and more powerful.
What motivated the clean up was I wanted to implement a new feature. I wanted indirect learn rules.
Previously, you had to learn with a direct ket. eg:
age |Fred> => |age: 23>
Fine, but how about indirect? The motivating example being:
sa: |you> => |Fred>                      -- you currently means Fred
sa: age "" |you> => |age: 23>            -- learn "your" age
sa: |you> => |Sam>                       -- you currently means Sam
sa: age "" |you> => |age: 21>            -- learn "your" age

sa: dump
----------------------------------------
|context> => |context: sw console>

 |you> => |Sam>
age |Fred> => |age: 23>
age |Sam> => |age: 21>
----------------------------------------
The idea being to try and replicate what happens in a conversation, where you have placeholders for the people you are talking about.
You, he, she, they, and so on.

Now, an interesting side-effect is that in the process we ratcheted up the power of BKO.
Now we can do things like:
sa: load fred-sam-friends.sw
sa: age friends (|Fred> + |Sam>) => |age: 31>  -- ie, all of Fred's and Sam's friends are 31 years old.
sa: dump                                       -- so let's learn this all in one go.
----------------------------------------
|context> => |context: friends>

friends |Fred> => |Jack> + |Harry> + |Ed> + |Mary> + |Rob> + |Patrick> + |Emma> + |Charlie>
friends |Sam> => |Charlie> + |George> + |Emma> + |Jack> + |Rober> + |Frank> + |Julie>
age |Jack> => |age: 31>
age |Harry> => |age: 31>
age |Ed> => |age: 31>
age |Mary> => |age: 31>
age |Rob> => |age: 31>
age |Patrick> => |age: 31>
age |Emma> => |age: 31>
age |Charlie> => |age: 31>
age |George> => |age: 31>
age |Rober> => |age: 31>
age |Frank> => |age: 31>
age |Julie> => |age: 31>
----------------------------------------
Another common usage is:
op "" |list> => |bah>   -- all elements in |list> have the same definition of op.
(at least as the general case; the specific cases can then be over-written)
Now, another example. The rule on the right hand side can be specific to that user.
Words fail me, so hopefully an example will help:
sa: |list> => |Maz> + |Liz> + |Sarah>    -- just define some list.
sa: op-self "" |list> => 19 |_self>      -- NB: the |_self> here. 
sa: dump                                 -- the parse_rule_line() swaps in the right meaning for each case.
 |list> => |Maz> + |Liz> + |Sarah>
op-self |Maz> => 19.000|Maz>             -- NB: |_self> is now |Maz>
op-self |Liz> => 19.000|Liz>             -- |_self> is now |Liz>
op-self |Sarah> => 19.000|Sarah>         -- |_self> is now |Sarah>
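A toy sketch of what is going on (plain dicts, not the actual parse_rule_line() code): the left hand side is evaluated to a superposition, then one learn rule is created per ket, with |_self> swapped in each time:
rules = {}                                       # (op, ket) -> rule

def indirect_learn(op, kets, rhs):
  for k in kets:                                 # one learn rule per ket
    rules[(op, k)] = rhs.replace("|_self>", k)   # swap in the right meaning of |_self>

indirect_learn("op-self", ["|Maz>", "|Liz>", "|Sarah>"], "19 |_self>")
print(rules[("op-self", "|Maz>")])               # 19 |Maz>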

30/7/2014: OK. I have had a version of train of thought for a long time now, even going back to this.
Anyway, finally implemented a version we can use in the console:
sa: load early-us-presidents.sw        -- load a data set. The bigger the better the result.
sa: train-of-thought[30] |Jefferson>
context: sw console
one: |Jefferson>
n: 30
|X>: |Jefferson>

|year: 1805>                          
recall not found
0.000|>                                -- it's bugging out because we ran into a dead end.
recall not found                       -- we don't know anything at all about |year: 1805>, so train of thought stops.
0.000|>
recall not found                       -- we also don't know anything about |>, so train of thought is dead.
0.000|>
recall not found

-- quick check:
sa: dump |year: 1805>
-- nope. nothing.

sa: create inverse                     -- this is the fix. And should prevent most dead ends, by pointing back to where we came from.
sa: dump |year: 1805>
inverse-president-era |year: 1805> => |Jefferson>  -- NB: if we run into |year: 1805> we can at least now go back to |Jefferson>

-- so now let's try again: 
sa: train-of-thought[30] |Jefferson>   -- 30 steps in our train. 

context: sw console
one: |Jefferson>
n: 30
|X>: |Jefferson>

|party: Democratic-Republican>
|Q Adams>
|party: Democratic-Republican>
|Jefferson>
|early US Presidents: _list>
|Q Adams>
|party: Democratic-Republican>
|year: 1791>
|party: Democratic-Republican>
|Madison>
|year: 1814>
|Madison>
|person: James Madison>
|US President: James Madison>
|person: James Madison>
|US President: James Madison>
|person: James Madison>
|Madison>
|early US Presidents: _list>
|Q Adams>
|number: 6>
|Q Adams>
|person: John Quincy Adams>
|Q Adams>
|number: 6>
|Q Adams>
|party: Democratic-Republican>
|year: 1825>
|party: Democratic-Republican>
|Madison>
6.000|party: Democratic-Republican> + 6.000|Q Adams> + |Jefferson> + 2.000|early US Presidents: _list> + |year: 1791> + 4.000|Madison> + |year: 1814> + 3.000|person: James Madison> + 2.000|US President: James Madison> + 2.000|number: 6> + |person: John Quincy Adams> + |year: 1825>
Yeah. The results here aren't super great, but this is only because the data set is nowhere near big enough to work well.
BTW, here is the code:
# where n is an int.
def console_train_of_thought(one,context,n):
  try:
    n = int(n)
  except:
    return ket("",0)

  print("context:",context.name)
  print("one:",one)
  print("n:",n)
  X = one.pick_elt()
  print("|X>:",X)
  print()
  result = superposition()

  for k in range(n):
    op = X.apply_op(context,"supported-ops").pick_elt()  #   |op> => pick-elt supported-ops |X>
    X = X.apply_op(context,op).pick_elt()                #   |X> => pick-elt apply(|op>,|X>)
    result.data.append(X)                   
    print(X.display())
  return result                             # return a record of the train-of-thought
So, what is the idea behind this?
Well, start with a seed superposition
Use pick-elt to randomly choose an element from that superposition
Then in a loop: 
  look up that ket's supported-ops (ie, operators that are relevant for that ket)
  randomly pick one of those operators
  apply that op to your ket
  randomly choose a new ket from that resulting ket/sp
repeat loop
BTW, here is another example of pick-elt:
----------------------------------------
|context> => |context: schrodingers cat>

is-alive |cat> => 0.500|yes> + 0.500|no>
alive? |*> #=> normalize pick-elt is-alive |_self>
----------------------------------------

sa: alive? |cat>
|yes>

sa: .                 -- dot in the console means repeat last computation
|no>                  -- heh. saves typing. In this case "alive? |cat>"

sa: .
|yes>

sa: .
|no>

sa: .
|no>
And some trivia: I seem to recall the original motivation for the supported-ops operator was so we could write a train-of-thought function.
10/8/2014: OK. Today a quick comparison of our different representations:
$ cat sw-examples/deli-closing-times.sw
|context> => |context: deli closing time>

|weekday: _list> => |day: monday> + |day: tuesday> + |day: wednesday> + |day: thursday> + |day: friday>
|weekend: _list> => |day: saturday> + |day: sunday>

deli-closing-time "" |weekday: _list> => |time: 6pm>
deli-closing-time "" |weekend: _list> => |time: 4:30pm>
deli-closing-time |public holiday> => |closed>

$ ./the_semantic_db_console.py
Welcome!

sa: load deli-closing-times.sw
sa: dump
----------------------------------------
|context> => |context: deli closing time>

 |weekday: _list> => |day: monday> + |day: tuesday> + |day: wednesday> + |day: thursday> + |day: friday>

 |weekend: _list> => |day: saturday> + |day: sunday>

deli-closing-time |day: monday> => |time: 6pm>

deli-closing-time |day: tuesday> => |time: 6pm>

deli-closing-time |day: wednesday> => |time: 6pm>

deli-closing-time |day: thursday> => |time: 6pm>

deli-closing-time |day: friday> => |time: 6pm>

deli-closing-time |day: saturday> => |time: 4:30pm>

deli-closing-time |day: sunday> => |time: 4:30pm>

deli-closing-time |public holiday> => |closed>
----------------------------------------

sa: display
  context: deli closing time

  weekday: _list
  supported-ops: op:
               : day: monday, day: tuesday, day: wednesday, day: thursday, day: friday

  weekend: _list
  supported-ops: op:
               : day: saturday, day: sunday

  day: monday
      supported-ops: op: deli-closing-time
  deli-closing-time: time: 6pm

  day: tuesday
      supported-ops: op: deli-closing-time
  deli-closing-time: time: 6pm

  day: wednesday
      supported-ops: op: deli-closing-time
  deli-closing-time: time: 6pm

  day: thursday
      supported-ops: op: deli-closing-time
  deli-closing-time: time: 6pm

  day: friday
      supported-ops: op: deli-closing-time
  deli-closing-time: time: 6pm

  day: saturday
      supported-ops: op: deli-closing-time
  deli-closing-time: time: 4:30pm

  day: sunday
      supported-ops: op: deli-closing-time
  deli-closing-time: time: 4:30pm

  public holiday
      supported-ops: op: deli-closing-time
  deli-closing-time: closed


sa: matrix[deli-closing-time]
[ closed       ] = [  0     0     0     0     0     0     0     1.00  ] [ day: friday    ]
[ time: 4:30pm ]   [  0     0     1.00  1.00  0     0     0     0     ] [ day: monday    ]
[ time: 6pm    ]   [  1.00  1.00  0     0     1.00  1.00  1.00  0     ] [ day: saturday  ]
                                                                        [ day: sunday    ]
                                                                        [ day: thursday  ]
                                                                        [ day: tuesday   ]
                                                                        [ day: wednesday ]
                                                                        [ public holiday ]

17/8/2014: OK. I don't think I have made use of op-self notation yet. So here is a small example.
In this case, first find Robin Williams' best movies, and then a general version.
First, we need data for the imdb-rating-self operator:
-- learn that "movie: x" has an imdb rating of 8
imdb-rating-self |movie: x> => 8 |movie: x>         -- NB: since this is an op-self operator, the ket on the left and right are the same.
-- learn that "movie: y" has an imdb rating of 5.5  -- op-self only changes the coeff of the ket, not the label.
imdb-rating-self |movie: y> => 5.5 |movie: y>
-- then we need to do the same for all movies we have knowledge about.
Next, we need to know the movies of an actor:
movies |actor: v> => |movie: a> + |movie: b> + |movie: c> + ....

-- eg, Robin Williams (fill in real data later!)
movies |actor: Robin Williams> => |movie: alpha> + |movie: beta> + ....
Now, we have enough to find his best movies (in this case, anything with an imdb rating 7 or above):
best-movies |actor: Robin Williams> => coeff-sort drop-below[7] imdb-rating-self movies |actor: Robin Williams>

-- or, more generally:
best-movies |actor: *> #=> coeff-sort drop-below[7] imdb-rating-self movies |_self>
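And as a toy illustration of the pipeline, with plain dicts standing in for superpositions (and made-up ratings):
def drop_below(sp, t):
  return {k: v for k, v in sp.items() if v >= t}

def coeff_sort(sp):
  return dict(sorted(sp.items(), key=lambda kv: kv[1], reverse=True))

# made-up imdb-rating-self data for some actor's movies:
rated_movies = {"movie: alpha": 5.5, "movie: beta": 8.0, "movie: gamma": 7.3}
print(coeff_sort(drop_below(rated_movies, 7)))   # {'movie: beta': 8.0, 'movie: gamma': 7.3}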

17/8/2014: Some pseudo-code to find words that share a similar context.
Roughly like this:
for x in word_list:
  for y in word_list:
    shared-context |words: x, y> => drop-below[t2] (drop-below[t1] find-topic[op] |x> + drop-below[t1] find-topic[op] |y>)
For example, I am currently reading a book about Charles Babbage and the Difference Engine.
If the pseudo-code is correct, then it should find that association.
11/11/2014: Let's redo the plurals example:
(and how easy it is to do, suggests to me I am on the right track.)
-- before we have defined anything:
sa: plural |word: cat>
|>
-- in English |> means "I don't know anything about that".

-- define a general rule:
sa: plural |word: *> #=> merge-labels(|_self> + |s>)

-- test it:
sa: plural |word: cat>
|word: cats>

sa: plural |word: dog>
|word: dogs>

-- ok. But what about the irregular forms?
sa: plural |word: mouse>
|word: mouses>

sa: plural |word: foot>
|word: foots>

-- ok. we have a general rule, now just define a specific rule:
-- learn mouse specific rule
sa: plural |word: mouse> => |word: mice>

-- learn foot specific rule
sa: plural |word: foot> => |word: feet>

-- now, try again:
sa: plural |word: mouse>
|word: mice>

sa: plural |word: foot>
|word: feet>

And, let's check what this looks like in matrix form:
sa: matrix[plural]
[ word: *s   ] = [  1.00  0     0     ] [ word: *     ]
[ word: feet ]   [  0     1.00  0     ] [ word: foot  ]
[ word: mice ]   [  0     0     1.00  ] [ word: mouse ]
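The general-rule-plus-specific-exceptions pattern is easy to mimic in plain Python too (a toy equivalent, not how the console implements it):
specific = {"word: mouse": "word: mice", "word: foot": "word: feet"}

def plural(k):
  return specific.get(k, k + "s")   # fall through to the |word: *> rule: merge-labels(|_self> + |s>)

print(plural("word: cat"))          # word: cats
print(plural("word: mouse"))        # word: mice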

20/6/2014: Anyway, time for me to try and explain the basics of my model!
The founding idea is that for every concept there is a ket, and we give it some agreed upon label.
So, |hungry>, |tired>, |emotion: happy>, |plant: tree: pine>, |insect: bee> and so on.
(where ": " is used to separate categories from sub-categories. So a tree is a sub-category of a plant. A pine is a sub-category of a tree. A bee is a sub-category of an insect, and so on.)
(physically, if we define low-order as closer to input from the outside world, and high-order as more abstract, then parent categories are lower order than child categories. So a bee is a little higher order than an insect. A pine tree is higher order than a plant. Indeed, the general-to-specific code is meant to implement this idea.)

Then the degree that concept is felt, is the coefficient. So:
0 |hungry> means not at all hungry.
0.3|hungry> means slightly hungry.
20 |hungry> means very hungry.

Then concepts that are thought of simultaneously are represented by superpositions (name, and notation borrowed from Quantum Mechanics)
So, hungry, tired, but happy is represented by:
|hungry> + |tired> + |emotion: happy>
And, a little hungry, very tired, and very happy is represented by:
0.2|hungry> + 10|tired> + 10|emotion: happy> -- the coeffs don't have to be super exact

Concepts that are thought of in time sequence are represented using "sequences", though that is not (yet?) implemented in my code.
(though potentially they could be indirectly represented by carefully chosen superpositions)
eg: swinging between happy, sad, elated, tired would be:
|emotion: happy> . |emotion: sad> . |emotion: elated> . |tired>
where dots separate elements in a time sequence, in contrast with superpositions that use the plus sign.
Also, it can be either a time sequence of kets, or a time sequence of superpositions.
eg: |a> . |b> + |c> + |d> . |e> . |f> . |g> + |h> . |i> . |j>
Perhaps more clearly written as: |a> . (|b> + |c> + |d>) . |e> . |f> . (|g> + |h>) . |i> . |j>
And of course they can have coefficients:
3|u> . 19|v> . (2|m> + 3|n>) . 13 |x> . |y> . |z>
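As an aside, in plain Python you might represent these something like this (dicts and lists, not the project's classes):
# a little hungry, very tired, and very happy, as a label -> coeff dict:
sp = {"hungry": 0.2, "tired": 10, "emotion: happy": 10}

# the time sequence |a> . (|b> + |c> + |d>) . |e> as a list of such dicts:
seq = [{"a": 1}, {"b": 1, "c": 1, "d": 1}, {"e": 1}]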

Next, we use (some of) the mathematical properties of bra-kets from QM.
1) <x||y> == 0 if x != y.  (NB: we deviate from QM a little here, since this is not always true in QM. eg: <p|x> = exp(ipx))
2) <x||y> == 1 if x == y.
3) <!x||y> == 1 if x != y. (NB: the ! acts as a not. cf, the -v switch for grep) 
4) <!x||y> == 0 if x == y.
5) <x: *||y: z> == 0 if x != y.
6) <x: *||y: z> == 1 if x == y, for any z. 
7) applying bra's is linear. <x|(|a> + |b> + |c>) == <x||a> + <x||b> + <x||c>
8) if a coeff is not given, then it is 1. eg, <x| == <x|1 and 1|x> == |x>
9) bra's and ket's commute with the coefficients. eg, <x|7 == 7 <x| and 13|x> == |x>13  
10) in contrast to QM, in BKO operators are right associative only.
<a|(op|b>) is valid and is identical to <a|op|b>
(<a|op)|b> is invalid, and undefined.
11) again, in contrast to QM, <a|op|b> != <b|op|a>^* (a consequence of (10) really)
12) applying projections is linear. |x><x|(|a> + |b> + |c>) == |x><x||a> + |x><x||b> + |x><x||c>
13) kets in superpositions commute. |a> + |b> == |b> + |a>
14) kets in sequences do not commute. |a> . |b> != |b> . |a>
Though maybe in the sequence version of simm, this would be useful:
|a> . |b> = c |b> . c |a>, where usually c is < 1. (yeah, it "bugs out" if you swap it back again, but in practice should be fine)
another example: 
  |c> . |a> . |b> = c |a> . c |c> . |b>
                  = c |a> . c |b> . c^2 |c>
15) operators (in general) do not commute. <b|op2 op1|a> != <b|op1 op2|a>
16) if a coeff in a superposition is zero, we can drop it from the superposition without changing the meaning of that superposition. 
17) we can arbitrarily add kets to a superposition if they have coeff zero without changing the meaning of that superposition.
18) |> is the identity element for superpositions. sp + |> == |> + sp == sp.
19) the + sign in superpositions is literal. ie, kets add.
|a> + |a> + |a> = 3|a>
|a> + |b> + |c> + 6|b> = |a> + 7|b> + |c>
20) <x|op-sequence|y> is always a scalar/float
21) |x><x|op-sequence|y> is always a ket or a superposition 
Now, some examples:
Applying bra's to superpositions:
<hungry|(0.2|hungry> + 10|tired> + 10|emotion: happy>) 
  == <hungry|0.2|hungry> + <hungry|10|tired> + <hungry|10|emotion: happy> 
  == 0.2 + 0 + 0 
  == 0.2
<tired|(0.2|hungry> + 10|tired> + 10|emotion: happy>) 
  == <tired|0.2|hungry> + <tired|10|tired> + <tired|10|emotion: happy> 
  == 0 + 10 + 0 
  == 10
Applying a category-bra to a superposition (in this case the category of people):
<person: *|(2|fish> + 9|animal: cat> + 5|person: Fred> + 7 |animal: dog> + 2|person: Sam> + 13|building: house>) 
  == <person: *|2|fish> + <person: *|9|animal: cat> + <person: *|5|person: Fred> + <person: *|7 |animal: dog> + <person: *|2|person: Sam> + <person: *|13|building: house>
  == 0 + 0 + <person: *|5|person: Fred> + 0 + <person: *|2|person: Sam> + 0   
  == 5 + 2 
  == 7
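
Using bra_ket from the sketch above, applying a bra to a superposition (rules (7) and (20)) is just a weighted sum, eg:

def apply_bra(bra, sp):
    # rule (7): linear; the result is a scalar (rule (20))
    return sum(coeff * bra_ket(bra, label) for label, coeff in sp.items())

state = {"hungry": 0.2, "tired": 10, "emotion: happy": 10}
print(apply_bra("hungry", state))   # 0.2
print(apply_bra("tired", state))    # 10
zoo = {"fish": 2, "animal: cat": 9, "person: Fred": 5,
       "animal: dog": 7, "person: Sam": 2, "building: house": 13}
print(apply_bra("person: *", zoo))  # 7
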
And some examples of projections:
|_self><fish|(2|fish> + 9|animal: cat> + 5|person: Fred> + 7 |animal: dog> + 2|person: Sam> + 13|building: house>) 
  == 2|fish>

|_self><animal: *|(2|fish> + 9|animal: cat> + 5|person: Fred> + 7 |animal: dog> + 2|person: Sam> + 13|building: house>) 
  == 9|animal: cat> + 7|animal: dog>

|_self><person: *|(2|fish> + 9|animal: cat> + 5|person: Fred> + 7 |animal: dog> + 2|person: Sam> + 13|building: house>) 
  == 5|person: Fred> + 2|person: Sam>

|_self><building: *|(2|fish> + 9|animal: cat> + 5|person: Fred> + 7 |animal: dog> + 2|person: Sam> + 13|building: house>) 
  == 13|building: house>

|_self><person: Fred|(2|fish> + 9|animal: cat> + 5|person: Fred> + 7 |animal: dog> + 2|person: Sam> + 13|building: house>) 
  == 5|person: Fred>
  
And a couple using the negation feature:
|_self><!animal: *|(2|fish> + 9|animal: cat> + 5|person: Fred> + 7 |animal: dog> + 2|person: Sam> + 13|building: house>)
  == 2|fish> + 0 + 5|person: Fred> + 0 + 2|person: Sam> + 13|building: house>

|_self><!person: *|(2|fish> + 9|animal: cat> + 5|person: Fred> + 7 |animal: dog> + 2|person: Sam> + 13|building: house>) 
  == 2|fish> + 9|animal: cat> + 0 + 7 |animal: dog> + 0 + 13|building: house>
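
A matching sketch of projections (rules (12) and (21)), reusing bra_ket and zoo from above: keep only the kets the bra matches, coefficients intact (zero-coeff kets are dropped, as rule (16) allows), and negation comes along for free:

def project(bra, sp):
    # rule (12): linear; the result is a superposition (rule (21))
    return {label: coeff for label, coeff in sp.items() if bra_ket(bra, label)}

print(project("animal: *", zoo))    # {'animal: cat': 9, 'animal: dog': 7}
print(project("!animal: *", zoo))   # everything except the animals
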
Physically in the brain (in this model):
1) brain-space is the 3D lattice representing the physical location of neurons in a brain
  (4D lattice if we count time, though the exact time-step depends on the integration window at each neuron)
2) each ket corresponds to a neuron.
3) applying a bra corresponds to measuring the value of that neuron (since it is spiking, averaged over some time period, presumably).
4) some operators step/propagate through brain-space
  eg: op |x> => |a> + |b>
5) some operators "measure" a value (this is common in QM)
  eg: op-self |x> => n |x>, where n is a scalar/float.
  where the convention in this case is to label the operator op-self.
Some notes:
a) we can map a superposition to the lattice representation by adding in kets with coeff 0 for all kets that are in the lattice but not mentioned in the superposition.
b) a couple of examples of (4) are:
population |location: Adelaide> => |population: 1200000>
age |person: Fred> => |age: 26>
c) examples of (5) (and representing the same information) are:
population-self |location: Adelaide> => 1200000 |location: Adelaide>
age-self |person: Fred> => 26 |person: Fred>
Depending on what you are doing, you choose the form you want.
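
To make the difference between (b) and (c) concrete, here is a hypothetical helper (the names and dict representation are mine, not from the code) that converts an op style result into the op-self style:

# eg, convert: population |location: Adelaide> => |population: 1200000>
# into:        population-self |location: Adelaide> => 1200000 |location: Adelaide>

def to_self_form(label, op_result, category):
    # op_result is a superposition like {"population: 1200000": 1}
    for ket_label in op_result:
        if ket_label.startswith(category + ": "):
            value = float(ket_label[len(category) + 2:])
            return {label: value}
    return {}

print(to_self_form("location: Adelaide", {"population: 1200000": 1}, "population"))
# {'location: Adelaide': 1200000.0}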

The big claim is that we can represent all relatively static information using superpositions, and non-static using sequences.
So if we can develop machinery to manipulate superpositions, we are well on the way to solving AI.
AI becomes the task of finding an appropriate set of functions mapping superpositions to superpositions.
So we build up a big collection of functions (currently in python, and mostly in this file), and then add new ones as new needs arise.

The next piece is what I call "molecules of knowledge", or learn rules.
eg:
friends |Fred> => |Sam> + |Harry>
age |Fred> => |age: 21>
parents |Harry> => |Liz> + |Richard>
in python, these correspond to:
context.learn("friends","Fred",ket("Sam") + ket("Harry"))
context.learn("age","Fred","age: 21")
context.learn("parents","Harry",ket("Liz") + ket("Richard"))
Now, if we load them up in the console:
sa: friends |Fred> => |Sam> + |Harry>
sa: age |Fred> => |age: 21>
sa: parents |Harry> => |Liz> + |Richard>
sa: dump
----------------------------------------
|context> => |context: sw console>

friends |Fred> => |Sam> + |Harry>
age |Fred> => |age: 21>

parents |Harry> => |Liz> + |Richard>
----------------------------------------
eg:
-- age of Fred:
sa: age |Fred>              -- query the age of Fred
|age: 21>

sa: age |Liz> => |age: 42>  -- learn the age of Harry's mum
sa: age |Liz>               -- query the age of Harry's mum
|age: 42>
In python the queries are:
context.recall("age",ket("Fred"))
context.recall("age",ket("Liz"))
-- or:
ket("Fred").apply_op(context,"age")
ket("Liz").apply_op(context,"age")
BTW, "molecules of knowledge" or learn rules, I guess, could be considerd labelled pointers mapping a ket to a superposition.
So, in the case of Fred just above, one pointer has the label "friends", and the other has "age".
Though they can also be considered labelled matrices mapping kets to vectors.

BTW, the null label is ""
eg if we do: |tmp> => |some> + |result>
then |tmp> has one pointer, labelled "".
eg, in the console:
sa: |tmp> => |some> + |result>
sa: "" |tmp>                    -- NB: |tmp> and "" |tmp> are distinct objects.
|some> + |result>

Now, a summary of files of interest:
Rules of the game, a minimalist description of various ket/sp functions.
An early, very rough, attempt at finding the permutations we need to parse for the BKO language (Bra-Ket-Operator).
A rough, incomplete, attempt at more formally defining a grammar for BKO.
Some python to check the progress on my parser (the parser is still broken and incomplete, BTW!)
A shell script wrapper for the check-the-parser python.
The code with the ket + sp + context + context-list classes, and a couple of other odd bits.
A collection of ket/sp to ket/sp functions. The idea is to add more of these with time, as you build new features into your AI.
The processor for the BKO language, plus a bunch of hash tables, where we "wire in" new functions. (unfortunately it is a bit of a black art which table new functions belong in, but the verbose output in the console when using related functions can give hints)
The console, where we interact with the semantic agent. Note that currently the output is very verbose to help with debugging. (eventually we need a GUI version of the console)
A collection of sw example files. Some interesting, some are junk just used for testing!
Code that converts the sw directory into a html version of the "files" command in the console.
Bunch of stuff.
Mapping simple BKO into a simple neural network model.
BTW, the code quality is terrible, and needs a complete rewrite!
For example, the back-end of the superposition class should probably be rewritten from a list of kets to an ordered dictionary of kets (since the current version has painful big-O in some cases. eg creating frequency lists, and hence the dict_to_sp() code).
And we need some code to handle iteration through superpositions (currently I use: for x in sp.data:)
The first email where I propose the BKO scheme.
A record of some playing in the console.

So, the claims for this project are:
1) knowledge representation using the BKO scheme, as used in the sw file format
2) reasoning with that knowledge using the BKO language (possibly called Feynman), as used in the semantic agent console
3) some mathematical foundations for symbolic AI

A brief summary of the semantic db project:
1) a general scheme to represent knowledge (using some notation borrowed, and tweaked, from quantum mechanics)
2) a programming language to reason with that knowledge (the BKO language).
3) a console where you can use the BKO language.

So, the main idea is to converge all knowledge into (what I call) superpositions.
And then write relatively general functions that can work on superpositions, mapping superpositions to superpositions.

There are lots of pieces in the full project, but some key pieces are:
1) similar -- compares superpositions and returns a result in [0,1] depending on how similar (this is a basic version of pattern recognition; a sketch follows this list)
2) find-topic -- looks up frequency lists and returns degree of membership
3) categorize-code -- with respect to some metric it puts items into categories
(based on my maths idea of bridging sets)
4) the maths idea of "function matrices"
5) active-buffer -- first (very early) step in the direction of a computer understanding what it is reading.
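
Here is the sketch of (1): one simple similarity measure on the dict representation, the sum of pairwise minima over the larger total, which returns a value in [0,1]. The project's actual simm may differ in detail, but has this flavour:

def simm(sp1, sp2):
    # 1 for identical superpositions, 0 for disjoint ones
    # (assumes non-negative coefficients)
    overlap = sum(min(sp1.get(k, 0), sp2.get(k, 0)) for k in set(sp1) | set(sp2))
    norm = max(sum(sp1.values()), sum(sp2.values()))
    return overlap / norm if norm else 0

print(simm({"a": 1, "b": 1}, {"a": 1, "b": 1}))   # 1.0
print(simm({"a": 1}, {"b": 1}))                   # 0.0
print(simm({"a": 1, "b": 1}, {"a": 1}))           # 0.5
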
created: 29/11/2013
updated: 16/11/2014
by Garry Morrison
email: garry -at- semantic-db.org