The Semantic DB Project write-up.

Knowledge Representation and a domain-specific language to reason with that knowledge

AKA

Towards a Semantic Web

Let's start with a series of examples of the BKO (bra-ket-op) scheme:

a flow chart from Tim Berners-Lee's original WWW proposal:

Now in BKO form:

www-proposal.sw
|context> => |context: www proposal>

describes |document: www proposal> => |"Hypertext"> + |A Proposal "Mesh">
refers-to |document: www proposal> => |Comms ACM>

describes |Comms ACM> => |"Hypertext"> 

includes |"Hypertext"> => |Linked information> + |Hypermedia>

for-example |Linked information> => |Hyper Card> + |ENQUIRE> + |A Proposal "Mesh">

describes |A Proposal "Mesh"> => |CERN>
unifies |A Proposal "Mesh"> => |ENQUIRE> + |VAX/NOTES> + |uucp News> + |CERNDOC>

examples |Computer conferencing> => |IBM GroupTalk> + |uucp News> + |VAX/NOTES> + |A Proposal "Mesh">

for-example |Hierarchical systems> => |CERN> + |CERNDOC> + |VAX/NOTES> + |uucp News> + |IBM GroupTalk>

includes |CERNDOC> => |document: www proposal>

wrote |person: Tim Berners-Lee> => |document: www proposal>

a representation of the methanol molecule:

in BKO form:

methanol.sw
|context> => |context: methanol>

molecular-pieces |molecule: methanol> => |methanol: 1> + |methanol: 2> + |methanol: 3> + |methanol: 4> + |methanol: 5> + |methanol: 6>

atom-type |methanol: 1> => |atom: H>
bonds-to |methanol: 1> => |methanol: 4>

atom-type |methanol: 2> => |atom: H>
bonds-to |methanol: 2> => |methanol: 4>

atom-type |methanol: 3> => |atom: H>
bonds-to |methanol: 3> => |methanol: 4>

atom-type |methanol: 4> => |atom: C>
bonds-to |methanol: 4> => |methanol: 1> + |methanol: 2> + |methanol: 3> + |methanol: 5>

atom-type |methanol: 5> => |atom: O>
bonds-to |methanol: 5> => |methanol: 4> + |methanol: 6>

atom-type |methanol: 6> => |atom: H>
bonds-to |methanol: 6> => |methanol: 5>

a simple network:

in BKO:

simple-network.sw
O |a1> => |a2>
O |a2> => |a3>
O |a3> => |a4>
O |a4> => |a5>
O |a5> => |a6>
O |a6> => |a7>
O |a7> => |a8>
O |a8> => |a9>
O |a9> => |a10>
O |a10> => |a1> + |b1>

O |b1> => |b2>
O |b2> => |b3>
O |b3> => |b4>
O |b4> => |b5>
O |b5> => |b6>
O |b6> => |b7>
O |b7> => |b1>

as a matrix:

sa: matrix[O]
[ a1  ] = [  0     0     0     0     0     0     0     0     0     1.00  0     0     0     0     0     0     0     ] [ a1  ]
[ a2  ]   [  1.00  0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     ] [ a2  ]
[ a3  ]   [  0     1.00  0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     ] [ a3  ]
[ a4  ]   [  0     0     1.00  0     0     0     0     0     0     0     0     0     0     0     0     0     0     ] [ a4  ]
[ a5  ]   [  0     0     0     1.00  0     0     0     0     0     0     0     0     0     0     0     0     0     ] [ a5  ]
[ a6  ]   [  0     0     0     0     1.00  0     0     0     0     0     0     0     0     0     0     0     0     ] [ a6  ]
[ a7  ]   [  0     0     0     0     0     1.00  0     0     0     0     0     0     0     0     0     0     0     ] [ a7  ]
[ a8  ]   [  0     0     0     0     0     0     1.00  0     0     0     0     0     0     0     0     0     0     ] [ a8  ]
[ a9  ]   [  0     0     0     0     0     0     0     1.00  0     0     0     0     0     0     0     0     0     ] [ a9  ]
[ a10 ]   [  0     0     0     0     0     0     0     0     1.00  0     0     0     0     0     0     0     0     ] [ a10 ]
[ b1  ]   [  0     0     0     0     0     0     0     0     0     1.00  0     0     0     0     0     0     1.00  ] [ b1  ]
[ b2  ]   [  0     0     0     0     0     0     0     0     0     0     1.00  0     0     0     0     0     0     ] [ b2  ]
[ b3  ]   [  0     0     0     0     0     0     0     0     0     0     0     1.00  0     0     0     0     0     ] [ b3  ]
[ b4  ]   [  0     0     0     0     0     0     0     0     0     0     0     0     1.00  0     0     0     0     ] [ b4  ]
[ b5  ]   [  0     0     0     0     0     0     0     0     0     0     0     0     0     1.00  0     0     0     ] [ b5  ]
[ b6  ]   [  0     0     0     0     0     0     0     0     0     0     0     0     0     0     1.00  0     0     ] [ b6  ]
[ b7  ]   [  0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     1.00  0     ] [ b7  ]
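
This matrix is just the adjacency matrix of the O rules: entry (r,c) is the coefficient of |r> in O |c>. As a rough plain-Python sketch of how such a matrix can be built (illustrative only, not the project's matrix[] code; the rules are modelled here as a simple dict):

# each entry maps a ket to the superposition O returns for it, eg O |a1> => |a2>:
O = {"a1": ["a2"], "a2": ["a3"], "a10": ["a1", "b1"]}   # ...and so on for the rest of the rules

def adjacency_matrix(op):
  kets = sorted(set(op) | {k for sp in op.values() for k in sp})
  index = {x: i for i, x in enumerate(kets)}
  M = [[0] * len(kets) for _ in kets]
  for col, sp in op.items():
    for row in sp:
      M[index[row]][index[col]] = 1   # op applied to |col> contains |row>
  return kets, M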

a binary tree:

in BKO:

binary-tree-with-child.sw
text |x> => |start node>
left |x> => |0>
right |x> => |1>
child |x> => |0> + |1>

text |0> => |first child node>
left |0> => |00>
right |0> => |10>
child |0> => |00> + |10>

text |1> => |second child node>
left |1> => |01>
right |1> => |11>
child |1> => |01> + |11>

text |00> => |third child node>
left |00> => |000>
right |00> => |100>
child |00> => |000> + |100>

text |10> => |fourth child node>
left |10> => |010>
right |10> => |110>
child |10> => |010> + |110>

text |01> => |fifth child node>
left |01> => |001>
right |01> => |101>
child |01> => |001> + |101>

text |11> => |sixth child node>
left |11> => |011>
right |11> => |111>
child |11> => |011> + |111>

as a matrix:

sa: matrix[child]
[ 0   ] = [  0  0     0     0     0     0     0     1.00  ] [ *  ]
[ 00  ]   [  0  1.00  0     0     0     0     0     0     ] [ 0  ]
[ 000 ]   [  0  0     1.00  0     0     0     0     0     ] [ 00 ]
[ 001 ]   [  0  0     0     1.00  0     0     0     0     ] [ 01 ]
[ 01  ]   [  0  0     0     0     1.00  0     0     0     ] [ 1  ]
[ 010 ]   [  0  0     0     0     0     1.00  0     0     ] [ 10 ]
[ 011 ]   [  0  0     0     0     0     0     1.00  0     ] [ 11 ]
[ 1   ]   [  0  0     0     0     0     0     0     1.00  ] [ x  ]
[ 10  ]   [  0  1.00  0     0     0     0     0     0     ]
[ 100 ]   [  0  0     1.00  0     0     0     0     0     ]
[ 101 ]   [  0  0     0     1.00  0     0     0     0     ]
[ 11  ]   [  0  0     0     0     1.00  0     0     0     ]
[ 110 ]   [  0  0     0     0     0     1.00  0     0     ]
[ 111 ]   [  0  0     0     0     0     0     1.00  0     ]

learn a grid:

c = new_context("grid play")

def ket_elt(j,i):
  return ket("grid: " + str(j) + " " + str(i))

# Makes use of the fact that context.learn() ignores rules that are the empty ket |>.
def ket_elt_bd(j,i,I,J):
# finite universe model:
#  if i <= 0 or j <= 0 or i > I or j > J:
#    return ket("",0)
# torus model:
  i = (i - 1)%I + 1
  j = (j - 1)%J + 1
  return ket("grid: " + str(j) + " " + str(i))


def create_grid(c,I,J):
  c.learn("dim-1","grid",str(I))
  c.learn("dim-2","grid",str(J))

  for j in range(1,J+1):
    for i in range(1,I+1):
      elt = ket_elt(j,i)
      c.add_learn("elements","grid",elt)
      c.learn("N",elt,ket_elt_bd(j-1,i,I,J))
      c.learn("NE",elt,ket_elt_bd(j-1,i+1,I,J))
      c.learn("E",elt,ket_elt_bd(j,i+1,I,J))
      c.learn("SE",elt,ket_elt_bd(j+1,i+1,I,J))
      c.learn("S",elt,ket_elt_bd(j+1,i,I,J))
      c.learn("SW",elt,ket_elt_bd(j+1,i-1,I,J))
      c.learn("W",elt,ket_elt_bd(j,i-1,I,J))
      c.learn("NW",elt,ket_elt_bd(j-1,i-1,I,J))

a couple of example grid locations:

supported-ops |grid: 4 39> => |op: N> + |op: NE> + |op: E> + |op: SE> + |op: S> + |op: SW> + |op: W> + |op: NW>
N |grid: 4 39> => |grid: 3 39>
NE |grid: 4 39> => |grid: 3 40>
E |grid: 4 39> => |grid: 4 40>
SE |grid: 4 39> => |grid: 5 40>
S |grid: 4 39> => |grid: 5 39>
SW |grid: 4 39> => |grid: 5 38>
W |grid: 4 39> => |grid: 4 38>
NW |grid: 4 39> => |grid: 3 38>

supported-ops |grid: 4 40> => |op: N> + |op: NE> + |op: E> + |op: SE> + |op: S> + |op: SW> + |op: W> + |op: NW>
N |grid: 4 40> => |grid: 3 40>
NE |grid: 4 40> => |grid: 3 41>
E |grid: 4 40> => |grid: 4 41>
SE |grid: 4 40> => |grid: 5 41>
S |grid: 4 40> => |grid: 5 40>
SW |grid: 4 40> => |grid: 5 39>
W |grid: 4 40> => |grid: 4 39>
NW |grid: 4 40> => |grid: 3 39>

Fibonacci in BKO:

fibonacci.sw
|context> => |context: Fibonacci>

fib |0> => |0>
fib |1> => |1>

n-1 |*> #=> arithmetic(|_self>,|->,|1>)
n-2 |*> #=> arithmetic(|_self>,|->,|2>)
fib |*> #=> arithmetic( fib n-1 |_self>, |+>, fib n-2 |_self>)
fib-ratio |*> #=> arithmetic( fib |_self> , |/>, fib n-1 |_self> )
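
For comparison, here is a rough plain-Python analogue of those rules (illustrative, not project code): the two learned rules become base cases, and the general fib |*> rule becomes the recursive case. The lru_cache is my addition so the sketch runs quickly; the BKO #=> rules recompute on every use.

from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
  if n in (0, 1):                   # fib |0> => |0> and fib |1> => |1>
    return n
  return fib(n - 1) + fib(n - 2)    # fib |*>, via the n-1 and n-2 operators

def fib_ratio(n):                   # fib-ratio |*>
  return fib(n) / fib(n - 1)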

Fizz-Buzz example:

The task:
For numbers 1 through 100,

- if the number is divisible by 3 print Fizz;
- if the number is divisible by 5 print Buzz;
- if the number is divisible by 3 and 5 (15) print FizzBuzz;
- else, print the number.

in BKO:

fizz-buzz.sw
|context> => |context: Fizz-Buzz v1>
|list> => range(|1>,|100>)
fizz-buzz-0 |*> #=> |_self>
fizz-buzz-1 |*> #=> if(arithmetic(|_self>,|%>,|3>) == |0>,|Fizz>,|>)
fizz-buzz-2 |*> #=> if(arithmetic(|_self>,|%>,|5>) == |0>,|Buzz>,|>)
fizz-buzz-3 |*> #=> if(arithmetic(|_self>,|%>,|15>) == |0>,|FizzBuzz>,|>)

map[fizz-buzz-0,fizz-buzz] "" |list>
map[fizz-buzz-1,fizz-buzz] "" |list>
map[fizz-buzz-2,fizz-buzz] "" |list>
map[fizz-buzz-3,fizz-buzz] "" |list>


|context> => |context: Fizz-Buzz v2>
|list> => range(|1>,|100>)
is-zero |*> => |False>
is-zero |0> => |True>
fizz-buzz-0 |*> #=> |_self>
fizz-buzz-1 |*> #=> if(is-zero arithmetic(|_self>,|%>,|3>),|Fizz>,|>)
fizz-buzz-2 |*> #=> if(is-zero arithmetic(|_self>,|%>,|5>),|Buzz>,|>)
fizz-buzz-3 |*> #=> if(is-zero arithmetic(|_self>,|%>,|15>),|FizzBuzz>,|>)

map[fizz-buzz-0,fizz-buzz] "" |list>
map[fizz-buzz-1,fizz-buzz] "" |list>
map[fizz-buzz-2,fizz-buzz] "" |list>
map[fizz-buzz-3,fizz-buzz] "" |list>


|context> => |context: Fizz-Buzz v3>
-- define our list
|list> => range(|1>,|100>)

-- define is-zero function
is-zero |*> => |False>
is-zero |0> => |True>

-- define is-mod functions
is-mod-3 |*> #=> is-zero arithmetic(|_self>,|%>,|3>)
is-mod-5 |*> #=> is-zero arithmetic(|_self>,|%>,|5>)
is-mod-15 |*> #=> is-zero arithmetic(|_self>,|%>,|15>)

-- apply them
map[is-mod-3] "" |list>
map[is-mod-5] "" |list>
map[is-mod-15] "" |list>

-- define fizz-buzz functions
fizz-buzz-0 |*> #=> |_self>
fizz-buzz-1 |*> #=> if(is-mod-3 |_self>,|Fizz>,|>)
fizz-buzz-2 |*> #=> if(is-mod-5 |_self>,|Buzz>,|>)
fizz-buzz-3 |*> #=> if(is-mod-15 |_self>,|FizzBuzz>,|>)

-- apply them
map[fizz-buzz-0,fizz-buzz] "" |list>
map[fizz-buzz-1,fizz-buzz] "" |list>
map[fizz-buzz-2,fizz-buzz] "" |list>
map[fizz-buzz-3,fizz-buzz] "" |list>

A sample of the result:

is-mod-3 |1> => |False>
is-mod-5 |1> => |False>
is-mod-15 |1> => |False>
fizz-buzz |1> => |1>

is-mod-3 |2> => |False>
is-mod-5 |2> => |False>
is-mod-15 |2> => |False>
fizz-buzz |2> => |2>

is-mod-3 |3> => |True>
is-mod-5 |3> => |False>
is-mod-15 |3> => |False>
fizz-buzz |3> => |Fizz>

is-mod-3 |4> => |False>
is-mod-5 |4> => |False>
is-mod-15 |4> => |False>
fizz-buzz |4> => |4>

is-mod-3 |5> => |False>
is-mod-5 |5> => |True>
is-mod-15 |5> => |False>
fizz-buzz |5> => |Buzz>

is-mod-3 |6> => |True>
is-mod-5 |6> => |False>
is-mod-15 |6> => |False>
fizz-buzz |6> => |Fizz>

is-mod-3 |7> => |False>
is-mod-5 |7> => |False>
is-mod-15 |7> => |False>
fizz-buzz |7> => |7>

is-mod-3 |8> => |False>
is-mod-5 |8> => |False>
is-mod-15 |8> => |False>
fizz-buzz |8> => |8>

is-mod-3 |9> => |True>
is-mod-5 |9> => |False>
is-mod-15 |9> => |False>
fizz-buzz |9> => |Fizz>

is-mod-3 |10> => |False>
is-mod-5 |10> => |True>
is-mod-15 |10> => |False>
fizz-buzz |10> => |Buzz>

is-mod-3 |11> => |False>
is-mod-5 |11> => |False>
is-mod-15 |11> => |False>
fizz-buzz |11> => |11>

is-mod-3 |12> => |True>
is-mod-5 |12> => |False>
is-mod-15 |12> => |False>
fizz-buzz |12> => |Fizz>

is-mod-3 |13> => |False>
is-mod-5 |13> => |False>
is-mod-15 |13> => |False>
fizz-buzz |13> => |13>

is-mod-3 |14> => |False>
is-mod-5 |14> => |False>
is-mod-15 |14> => |False>
fizz-buzz |14> => |14>

is-mod-3 |15> => |True>
is-mod-5 |15> => |True>
is-mod-15 |15> => |True>
fizz-buzz |15> => |FizzBuzz>

is-mod-3 |16> => |False>
is-mod-5 |16> => |False>
is-mod-15 |16> => |False>
fizz-buzz |16> => |16>
...

a fictional George in BKO:

george.sw
|context> => |context: George>
source |context: George> => |sw-url: http://semantic-db.org/sw-examples/new-george.sw>

-- George is just some fictional character

|person: George> => |word: george>
age |person: George> => |age: 29>
dob |person: George> => |date: 1984-05-23>
hair-colour |person: George> => |hair-colour: brown>
eye-colour |person: George> => |eye-colour: blue>
gender |person: George> => |gender: male>
height |person: George> => |height: cm: 176>
wife |person: George> => |person: Beth>
occupation |person: George> => |occupation: car salesman>
friends |person: George> => |person: Fred> + |person: Jane> + |person: Liz> + |person: Andrew>
mother |person: George> => |person: Sarah>
father |person: George> => |person: David>
sisters |person: George> => |person: Emily>
brothers |person: George> => |person: Frank> + |person: Tim> + |person: Sam>

-- some general rules that apply to all people:
siblings |person: *> #=> brothers |_self> + sisters |_self>
children |person: *> #=> sons |_self> + daughters |_self>
parents |person: *> #=> mother |_self> + father |_self>
uncles |person: *> #=> brothers parents |_self>
aunts |person: *> #=> sisters parents |_self>
aunts-and-uncles |person: *> #=> siblings parents |_self>
cousins |person: *> #=> children siblings parents |_self>
grand-fathers |person: *> #=> father parents |_self>
grand-mothers |person: *> #=> mother parents |_self>
grand-parents |person: *> #=> parents parents |_self>
grand-children |person: *> #=> children children |_self>
great-grand-parents |person: *> #=> parents parents parents |_self>
great-grand-children |person: *> #=> children children children |_self>
immediate-family |person: *> #=> siblings |_self> + parents |_self> + children |_self>
friends-and-family |person: *> #=> friends |_self> + family |_self>

email |person: George> => |email: george.douglas@gmail.com>
education |person: George> => |education: high-school>
can-swim-self |person: George> => 0.7 |person: George>   -- an OK swimmer, but not a great one.
can-swim |person: George> => 0.7 |yes>

-- George's father is dead:
is-dead-self |person: David> => |person: David>
is-dead |person: David> => |yes>
-- and so on.
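
Those general rules are just operator composition, applied linearly. A minimal plain-Python sketch of the idea (illustrative only; people stand in for kets, lists for superpositions):

# each op maps a person to a list of people:
ops = {
  "mother":   {"George": ["Sarah"]},
  "father":   {"George": ["David"]},
  "brothers": {"George": ["Frank", "Tim", "Sam"]},
  "sisters":  {"George": ["Emily"]},
}

def apply_op(op, people):
  # ops are linear: op (|x> + |y>) == op |x> + op |y>
  return [q for p in people for q in ops.get(op, {}).get(p, [])]

def apply_op_sequence(seq, person):
  # operator sequences apply right to left, eg uncles == "brothers parents"
  people = [person]
  for op in reversed(seq.split()):
    people = apply_op(op, people)
  return people

print(apply_op_sequence("brothers", "George"))   # ['Frank', 'Tim', 'Sam']
# with parents defined for Sarah and David too, "children siblings parents" would give cousins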

simple finite sets:

A = {a1,a2,a3,ab,ac,abc}
B = {b1,b2,b3,ab,bc,abc}
C = {c1,c2,c3,ac,bc,abc}

in BKO:

simple-finite-sets.sw
-- NB: coeffs for set elements are almost always in {0,1}
|A> => |a1> + |a2> + |a3> + |ab> + |ac> + |abc>     
|B> => |b1> + |b2> + |b3> + |ab> + |bc> + |abc>
|C> => |c1> + |c2> + |c3> + |ac> + |bc> + |abc>
now take a look at set intersection in the console:
sa: intersection(""|A>, ""|B>)
|ab> + |abc>

sa: intersection(""|A>, ""|C>)
|ac> + |abc>

sa: intersection(""|B>, ""|C>)
|bc> + |abc>

sa: intersection(""|A>,""|B>,""|C>)
|abc>
Now observe a correspondence between set intersection and addition of superpositions with coeffs in {0,1}:
-- add our superpositions:
-- NB: the coeffs correspond to the number of sets an element is in, cf. the Venn diagram above.
sa: ""|A> + ""|B> + ""|C>
|a1> + |a2> + |a3> + 2.000|ab> + 2.000|ac> + 3.000|abc> + |b1> + |b2> + |b3> + 2.000|bc> + |c1> + |c2> + |c3>

-- this is the same as union:
sa: clean(""|A> + ""|B> + ""|C>)
|a1> + |a2> + |a3> + |ab> + |ac> + |abc> + |b1> + |b2> + |b3> + |bc> + |c1> + |c2> + |c3>

-- this is the same as intersection:
sa: clean drop-below[3] (""|A> + ""|B> + ""|C>)
|abc>

-- this is a "soft" intersection:
sa: clean drop-below[2] (""|A> + ""|B> + ""|C>)
|ab> + |ac> + |abc> + |bc>
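
The same correspondence is easy to check in plain Python, treating a superposition as a ket -> coeff mapping (a Counter here; illustrative only):

from collections import Counter

A = Counter(["a1","a2","a3","ab","ac","abc"])    # coeff 1 for each element
B = Counter(["b1","b2","b3","ab","bc","abc"])
C = Counter(["c1","c2","c3","ac","bc","abc"])

s = A + B + C                                    # coeff == number of sets an element is in
print(sorted(s))                                 # clean: the union
print(sorted(k for k, v in s.items() if v >= 3)) # clean drop-below[3]: the intersection, ['abc']
print(sorted(k for k, v in s.items() if v >= 2)) # clean drop-below[2]: the "soft" intersection
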
BTW, we also have set-builder (at least in theory, it is not yet implemented).
It looks something like this:
|answer> => yet-another-op another-op |x> in op |object> such that <y|op-sequence|x> >= 0.7

Movies say this is what the matrix looks like:

Perhaps this is what the matrix really looks like:


First claim: all finite, relatively static information can be represented in the BKO scheme

BKO is what I call an "active representation" -- once in this format, a lot of things are relatively easy.
One of the driving forces behind BKO is to collapse all knowledge into one unified representation.
So any machinery we develop for some branch of knowledge often extends easily to other branches.

At the core of the project are these two functions:

context.learn(a,b,c)
context.recall(a,b)

eg:
-- learn a rule
sa: a |b> => |c>

-- recall a rule
sa: a |b>
|c>
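
A minimal sketch of what those two functions amount to, with a plain-Python dict standing in for the project's context class (illustrative only):

class Context:
  def __init__(self, name):
    self.name = name
    self.rules = {}                            # (op, ket) -> superposition

  def learn(self, op, ket, sp):                # sa: a |b> => |c>
    self.rules[(op, ket)] = sp

  def recall(self, op, ket):                   # sa: a |b>
    return self.rules.get((op, ket), "|>")     # the empty ket |> if nothing is known

c = Context("sw example")
c.learn("a", "b", "|c>")
print(c.recall("a", "b"))                      # |c>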

Here are the math rules for the BKO scheme:

1) <x||y> == 0 if x != y.  
2) <x||y> == 1 if x == y.
3) <!x||y> == 1 if x != y. (NB: the ! acts as a not, cf. the -v switch for grep)
4) <!x||y> == 0 if x == y.
5) <x: *||y: z> == 0 if x != y.
6) <x: *||y: z> == 1 if x == y, for any z. 
7) applying bras is linear. <x|(|a> + |b> + |c>) == <x||a> + <x||b> + <x||c>
8) if a coeff is not given, then it is 1. eg, <x| == <x|1 and 1|x> == |x>
9) bras and kets commute with the coefficients. eg, <x|7 == 7 <x| and 13|x> == |x>13
10) in contrast to QM, in BKO operators are right associative only.
<a|(op|b>) is valid and is identical to <a|op|b>
(<a|op)|b> is invalid, and undefined.
11) again, in contrast to QM, <a|op|b> != <b|op|a>^* (a consequence of (10) really)
12) applying projections is linear. |x><x|(|a> + |b> + |c>) == |x><x||a> + |x><x||b> + |x><x||c>
13) kets in superpositions commute. |a> + |b> == |b> + |a>
14) kets in sequences do not commute. |a> . |b> != |b> . |a>
Though maybe in the sequence version of simm, this would be useful:
|a> . |b> = c |b> . c |a>, where usually c is < 1. (yeah, it "bugs out" if you swap it back again, but in practice should be fine)
another example: 
  |c> . |a> . |b> = c |a> . c |c> . |b>
                  = c |a> . c |b> . c^2 |c>
15) operators (in general) do not commute. <b|op2 op1|a> != <b|op1 op2|a>
16) if a coeff in a superposition is zero, we can drop it from the superposition without changing the meaning of that superposition. 
17) we can arbitrarily add kets to a superposition if they have coeff zero without changing the meaning of that superposition.
18) |> is the identity element for superpositions. sp + |> == |> + sp == sp.
19) the + sign in superpositions is literal. ie, kets add.
|a> + |a> + |a> = 3|a>
|a> + |b> + |c> + 6|b> = |a> + 7|b> + |c>
20) <x|op-sequence|y> is always a scalar/float
21) |x><x|op-sequence|y> is always a ket or a superposition 
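
Rules (1), (2), (7) and (19) are already enough to evaluate simple bra-superposition products. A tiny plain-Python check (illustrative):

def bra_ket(x, y):
  return 1 if x == y else 0                    # rules (1) and (2)

def bra_sp(x, sp):
  # rule (7): applying bras is linear; sp is a ket -> coeff dict
  return sum(coeff * bra_ket(x, ket) for ket, coeff in sp.items())

print(bra_sp("b", {"a": 1, "b": 7, "c": 1}))   # <b|(|a> + 7|b> + |c>) == 7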

OK. So we have a knowledge representation scheme, now what?

Answer: we have a domain-specific language to reason with it (again, just called BKO, or occasionally the Feynman language).
Some of the more interesting components are:

simm: the similarity metric

I'll skip the (simple) derivation, and just give the list version:
-- define a type of multiplication:
a*b = \Sum_k abs(a_k * b_k)     -- discrete version, where the * inside abs() is ordinary multiplication.

-- define our metric:
simm(w,f,g) = (w*f + w*g - w*[f - g]) / (2*max(w*f, w*g))

NB: if w is not given, ie, simm(f,g), then assume w_k = 1 for all k.
Here is the python for this:
# w,f,g are lists of ints or floats.
def list_simm(w,f,g):
  the_len = min(len(f),len(g))
  w = w + [1] * (the_len - len(w))    # pad missing weights with 1 (ie, w_k = 1), per the NB above
  f = f[:the_len]
  g = g[:the_len]

  wf = sum(abs(w[k]*f[k]) for k in range(the_len))
  wg = sum(abs(w[k]*g[k]) for k in range(the_len))
  wfg = sum(abs(w[k]*f[k] - w[k]*g[k]) for k in range(the_len))

  if wf == 0 and wg == 0:
    result = 0
  else:
    result = (wf + wg - wfg)/(2*max(wf,wg))
  return result
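
A quick sanity check (these numbers line up with the f1 vs g example further down):

print(list_simm([], [0, 5, 13], [3, 5, 13]))   # 0.8571..., ie 85.714%
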
Now some examples:

graphs!

...

Now a couple of observations.
1) simm(w,f,g) gives the best results when f and g are "rescaled" so that w*f == w*g (excluding the case where they are length 1 vectors).
2) if f_k and g_k are >= 0 for all terms, then the equation can be compressed, using a + b - abs(a - b) = 2*min(a,b):

simm(w,f,g) = (\Sum_k w_k * min(f_k, g_k)) / max(w*f, w*g)
And then these motivate the superposition/BKO versions of simm:
def weighted_simm(w,A,B):
  A = multiply(w,A)
  B = multiply(w,B)
  return intersection(A.normalize(),B.normalize()).count_sum()
  
def simm(A,B):
  return intersection(A.normalize(),B.normalize()).count_sum()
where:
A.normalize(), B.normalize() implement the idea of rescaling so that w*f == w*g, and also max(w*f,w*g) = 1
intersection(...) is equivalent to the min(f,g) term.

A BKO example:

simple-simm-test.sw
-- define the target pattern
|g> => 3|a> + 5|b> + 13|c>

-- define some similar patterns
|f1> => 0|a> + 5|b> + 13|c>
|f2> => 3|a> + 0|b> + 13|c>
|f3> => 3|a> + 5|b> + 0|c>

-- find how similar they actually are
|r> => 100 ket-simm(""|g>,""|g>)
|r1> => 100 ket-simm(""|f1>,""|g>)
|r2> => 100 ket-simm(""|f2>,""|g>)
|r3> => 100 ket-simm(""|f3>,""|g>)
the results:
 |g> => 3.000|a> + 5.000|b> + 13.000|c>
 |f1> => 0.000|a> + 5.000|b> + 13.000|c>
 |f2> => 3.000|a> + 0.000|b> + 13.000|c>
 |f3> => 3.000|a> + 5.000|b> + 0.000|c>

 |r> => 100.000|simm>
 |r1> => 85.714|simm>
 |r2> => 76.190|simm>
 |r3> => 38.095|simm>

Using simm to find similarity of superpositions:

Let's just give some examples:
sa: load simple-shopping-basket.sw
sa: dump
----------------------------------------
|context> => |context: shopping basket>

 |f> => 3.000|apple> + 5.000|oranges> + |milk> + |bread> + |coffee> + |steak>
basket |f> => 3.000|apple> + 5.000|oranges> + |milk> + |bread> + |coffee> + |steak>

basket |user 1> => |milk> + |bread> + |tea> + |bananas> + |carrots> + |chocolate>

basket |user 2> => 4.000|apple> + |milk> + |coffee> + |steak>

basket |user 3> => |chocolate> + |vegemite> + |olive oil> + |pizza> + |cheese>

basket |user 4> => |vegemite> + |cheese> + |bread> + |salami>
----------------------------------------

-- which user has the same shopping basket as f?
sa: 100 similar[basket] |f>
50.000|user 2> + 16.667|user 1> + 8.333|user 4>


sa: load disease-symptoms-example.sw
sa: dump
----------------------------------------
|context> => |context: disease>

disease-symptoms |disease: 1> => |symptom: a> + |symptom: b> + |symptom: c>

disease-symptoms |disease: 2> => |symptom: d> + |symptom: e>

disease-symptoms |disease: 3> => |symptom: f> + |symptom: g> + |symptom: h> + |symptom: i>

disease-symptoms |disease: 4> => |symptom: j> + |symptom: k> + |symptom: l>

disease-symptoms |patient: 1> => |symptom: b> + |symptom: h> + |symptom: f> + |symptom: k>
----------------------------------------

-- which disease is most similar to patient 1's symptoms?
sa: 100 similar[disease-symptoms] |patient: 1>
50.000|disease: 3> + 25.000|disease: 1> + 25.000|disease: 4>

A bigger similarity example -- simple pattern recognition of characters:

The data:
$ cat H-I-pat-rec.sw
|context> => |context: H I pat rec>

#   #
#   #
#   #
#####
#   #
#   #
#   #

pixels |letter: H> => |pixel: 1: 1> + |pixel: 1: 5>
pixels |letter: H> +=> |pixel: 2: 1> + |pixel: 2: 5>
pixels |letter: H> +=> |pixel: 3: 1> + |pixel: 3: 5>
pixels |letter: H> +=> |pixel: 4: 1> + |pixel: 4: 2> + |pixel: 4: 3> + |pixel: 4: 4> + |pixel: 4: 5>
pixels |letter: H> +=> |pixel: 5: 1> + |pixel: 5: 5>
pixels |letter: H> +=> |pixel: 6: 1> + |pixel: 6: 5>
pixels |letter: H> +=> |pixel: 7: 1> + |pixel: 7: 5>
dim-1 |letter: H> => |dimension: 5>
dim-2 |letter: H> => |dimension: 7>


    #
#   #
#   #
### #
#
#   #
#   #

pixels |noisy: H> => |pixel: 1: 5>
pixels |noisy: H> +=> |pixel: 2: 1> + |pixel: 2: 5>
pixels |noisy: H> +=> |pixel: 3: 1> + |pixel: 3: 5>
pixels |noisy: H> +=> |pixel: 4: 1> + |pixel: 4: 2> + |pixel: 4: 3> + |pixel: 4: 5>
pixels |noisy: H> +=> |pixel: 5: 1>
pixels |noisy: H> +=> |pixel: 6: 1> + |pixel: 6: 5>
pixels |noisy: H> +=> |pixel: 7: 1> + |pixel: 7: 5>
dim-1 |noisy: H> => |dimension: 5>
dim-2 |noisy: H> => |dimension: 7>


#   #
#
# ###
#####
##  #
#   #
### #

pixels |noisy: H2> => |pixel: 1: 1> + |pixel: 1: 5>
pixels |noisy: H2> +=> |pixel: 2: 1>
pixels |noisy: H2> +=> |pixel: 3: 1> + |pixel: 3: 3> + |pixel: 3: 4> + |pixel: 3: 5>
pixels |noisy: H2> +=> |pixel: 4: 1> + |pixel: 4: 2> + |pixel: 4: 3> + |pixel: 4: 4> + |pixel: 4: 5>
pixels |noisy: H2> +=> |pixel: 5: 1> + |pixel: 5: 2> + |pixel: 5: 5>
pixels |noisy: H2> +=> |pixel: 6: 1> + |pixel: 6: 5>
pixels |noisy: H2> +=> |pixel: 7: 1> + |pixel: 7: 2> + |pixel: 7: 3> + |pixel: 7: 5>
dim-1 |noisy: H2> => |dimension: 5>
dim-2 |noisy: H2> => |dimension: 7>



#####
  #
  #
  #
  #
  #
#####

pixels |letter: I> => |pixel: 1: 1> + |pixel: 1: 2> + |pixel: 1: 3> + |pixel: 1: 4> + |pixel: 1: 5>
pixels |letter: I> +=> |pixel: 2: 3>
pixels |letter: I> +=> |pixel: 3: 3>
pixels |letter: I> +=> |pixel: 4: 3>
pixels |letter: I> +=> |pixel: 5: 3>
pixels |letter: I> +=> |pixel: 6: 3>
pixels |letter: I> +=> |pixel: 7: 1> + |pixel: 7: 2> + |pixel: 7: 3> + |pixel: 7: 4> + |pixel: 7: 5>
dim-1 |letter: I> => |dimension: 5>
dim-2 |letter: I> => |dimension: 7>



####
  #


  #
  #
# ###

pixels |noisy: I> => |pixel: 1: 1> + |pixel: 1: 2> + |pixel: 1: 3> + |pixel: 1: 4>
pixels |noisy: I> +=> |pixel: 2: 3>
pixels |noisy: I> +=> |>
pixels |noisy: I> +=> |>
pixels |noisy: I> +=> |pixel: 5: 3>
pixels |noisy: I> +=> |pixel: 6: 3>
pixels |noisy: I> +=> |pixel: 7: 1> + |pixel: 7: 3> + |pixel: 7: 4> + |pixel: 7: 5>
dim-1 |noisy: I> => |dimension: 5>
dim-2 |noisy: I> => |dimension: 7>


##  #
 ###
  #
  #
  ###
####
#####

pixels |noisy: I2> => |pixel: 1: 1> + |pixel: 1: 2> + |pixel: 1: 5>
pixels |noisy: I2> +=> |pixel: 2: 2> + |pixel: 2: 3> + |pixel: 2: 4>
pixels |noisy: I2> +=> |pixel: 3: 3>
pixels |noisy: I2> +=> |pixel: 4: 3>
pixels |noisy: I2> +=> |pixel: 5: 3> + |pixel: 5: 4> + |pixel: 5: 5>
pixels |noisy: I2> +=> |pixel: 6: 1> + |pixel: 6: 2> + |pixel: 6: 3> + |pixel: 6: 4>
pixels |noisy: I2> +=> |pixel: 7: 1> + |pixel: 7: 2> + |pixel: 7: 3> + |pixel: 7: 4> + |pixel: 7: 5>
dim-1 |noisy: I2> => |dimension: 5>
dim-2 |noisy: I2> => |dimension: 7>


-- OK. I wrote some code to automate this somewhat.
-- See: play_with_pat_rec.py
-- Now I only have to do the ascii art, and the code spits out the rules.
-- string = ("######\n"
--           "#    #\n"
--           "#    #\n"
--           "#    #\n"
--           "#    #\n"
--           "#    #\n"
--           "######")
-- create_pixel_rules("letter: O",string)


######
#    #
#    #
#    #
#    #
#    #
######
pixels |letter: O> +=> |pixel: 1: 1> + |pixel: 1: 2> + |pixel: 1: 3> + |pixel: 1: 4> + |pixel: 1: 5> + |pixel: 1: 6>
pixels |letter: O> +=> |pixel: 2: 1> + |pixel: 2: 6>
pixels |letter: O> +=> |pixel: 3: 1> + |pixel: 3: 6>
pixels |letter: O> +=> |pixel: 4: 1> + |pixel: 4: 6>
pixels |letter: O> +=> |pixel: 5: 1> + |pixel: 5: 6>
pixels |letter: O> +=> |pixel: 6: 1> + |pixel: 6: 6>
pixels |letter: O> +=> |pixel: 7: 1> + |pixel: 7: 2> + |pixel: 7: 3> + |pixel: 7: 4> + |pixel: 7: 5> + |pixel: 7: 6>
dim-1 |letter: O> => |dimension: 6>
dim-2 |letter: O> => |dimension: 7>
Now, in the console:
sa: load H-I-pat-rec.sw
sa: 100 similar[pixels] |letter: H>
82.353|noisy: H> + 76.190|noisy: H2> + 40.909|letter: O> + 35.000|noisy: I2> + 29.412|letter: I> + 17.647|noisy: I>

sa: 100 similar[pixels] |letter: I>
73.333|noisy: I> + 65.000|noisy: I2> + 45.455|letter: O> + 38.095|noisy: H2> + 29.412|letter: H> + 26.667|noisy: H>
Now, let's build the similarity matrix:
sa: |list> => |letter: H> + |noisy: H> + |noisy: H2> + |letter: I> + |noisy: I> + |noisy: I2> + |letter: O>
sa: find-simm |*> #=> 100 (|_self> + similar[pixels] |_self>)
sa: map[find-simm,similarity] "" |list>
sa: matrix[similarity]
[ letter: H ] = [  100.00  29.41   40.91   82.35   76.19   17.65   35.00   ] [ letter: H ]
[ letter: I ]   [  29.41   100.00  45.45   26.67   38.10   73.33   65.00   ] [ letter: I ]
[ letter: O ]   [  40.91   45.45   100.00  36.36   50.00   36.36   40.91   ] [ letter: O ]
[ noisy: H  ]   [  82.35   26.67   36.36   100.00  61.90   14.29   25.00   ] [ noisy: H  ]
[ noisy: H2 ]   [  76.19   38.10   50.00   61.90   100.00  19.05   47.62   ] [ noisy: H2 ]
[ noisy: I  ]   [  17.65   73.33   36.36   14.29   19.05   100.00  45.00   ] [ noisy: I  ]
[ noisy: I2 ]   [  35.00   65.00   40.91   25.00   47.62   45.00   100.00  ] [ noisy: I2 ]

another big similarity example -- pattern recognition of webpages:

So, we need some mapping from webpage html to superpositions.
The one I decided on was to split/fragment each document at certain characters (for html I chose < and >), take the hash of each fragment, keep only the last 4 chars of that hash (say), then add up the resulting kets. The advantage of this is that localized changes in a document do not affect fragments elsewhere in that document.
The code here, the sw file here and the make plot code here.
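
A minimal sketch of that fragmenting/hashing idea (illustrative only; sha1 and the exact splitting are my assumptions, not necessarily what the linked code does):

import re
import hashlib

def fragment_hash_superposition(html, chars="<>", keep=4):
  sp = {}
  for frag in re.split("[" + re.escape(chars) + "]", html):
    frag = frag.strip()
    if not frag:
      continue
    h = hashlib.sha1(frag.encode("utf-8")).hexdigest()[-keep:]   # keep the last few chars of the hash
    sp[h] = sp.get(h, 0) + 1                                     # then add the kets up
  return sp
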
And now the results, first as graphs, then as similarity matrices (each webpage was downloaded on three consecutive days):

abc.net.au:

adelaidenow.com.au:

slashdot.org:

smh.com.au:

youtube.com:


Now, let's build the similarity matrix:
First, extract the list of kets from fragment-webpages-64k.sw
$ grep "supported" sw-examples/fragment-webpages-64k.sw  | sort | sed 's/=>.*$//g' | sed 's/supported-ops / + /g' | tr -d '\n'
 + |abc-1-64k>  + |abc-2-64k>  + |abc-3-64k>  + |adelaidenow-1-64k>  + |adelaidenow-2-64k>  + |adelaidenow-3-64k>  + |slashdot-1-64k>  + |slashdot-2-64k>  + |slashdot-3-64k>  + |smh-1-64k>  + |smh-2-64k>  + |smh-3-64k>  + |youtube-1-64k>  + |youtube-2-64k>  + |youtube-3-64k>
Now, in the console:
sa: load fragment-webpages-64k.sw
sa: |list> => |abc-1-64k>  + |abc-2-64k>  + |abc-3-64k>  + |adelaidenow-1-64k>  + |adelaidenow-2-64k>  + |adelaidenow-3-64k>  + |slashdot-1-64k>  + |slashdot-2-64k>  + |slashdot-3-64k>  + |smh-1-64k>  + |smh-2-64k>  + |smh-3-64k>  + |youtube-1-64k>  + |youtube-2-64k>  + |youtube-3-64k> 
sa: simm-0 |*> #=> 100 (|_self> + similar[hash-64k] |_self>)
sa: map[simm-0,similarity-0] "" |list>
sa: matrix[similarity-0]
[ abc-1-64k         ] = [  100.00  91.52   91.13   29.84   30.17   29.96   27.64   27.57   27.48   38.36   38.93   38.67   21.00   21.04   20.98   ] [ abc-1-64k         ]
[ abc-2-64k         ]   [  91.52   100.00  91.90   29.83   30.19   29.97   27.62   27.66   27.63   38.43   39.01   38.75   21.11   21.17   21.07   ] [ abc-2-64k         ]
[ abc-3-64k         ]   [  91.13   91.90   100.00  30.05   30.38   30.19   27.74   27.77   27.63   38.41   39.05   38.77   21.12   21.18   21.12   ] [ abc-3-64k         ]
[ adelaidenow-1-64k ]   [  29.84   29.83   30.05   100.00  78.88   78.11   27.26   27.00   27.16   30.55   30.31   30.36   26.91   26.96   27.06   ] [ adelaidenow-1-64k ]
[ adelaidenow-2-64k ]   [  30.17   30.19   30.38   78.88   100.00  83.22   27.37   27.28   27.34   30.46   30.31   30.36   26.97   26.97   27.00   ] [ adelaidenow-2-64k ]
[ adelaidenow-3-64k ]   [  29.96   29.97   30.19   78.11   83.22   100.00  27.17   27.05   27.11   30.28   30.16   30.16   26.96   26.84   26.98   ] [ adelaidenow-3-64k ]
[ slashdot-1-64k    ]   [  27.64   27.62   27.74   27.26   27.37   27.17   100.00  77.37   77.25   29.76   29.67   29.75   22.16   22.14   22.17   ] [ slashdot-1-64k    ]
[ slashdot-2-64k    ]   [  27.57   27.66   27.77   27.00   27.28   27.05   77.37   100.00  78.52   29.55   29.53   29.60   21.71   21.64   21.73   ] [ slashdot-2-64k    ]
[ slashdot-3-64k    ]   [  27.48   27.63   27.63   27.16   27.34   27.11   77.25   78.52   100.00  29.82   29.70   29.87   21.98   21.99   22.00   ] [ slashdot-3-64k    ]
[ smh-1-64k         ]   [  38.36   38.43   38.41   30.55   30.46   30.28   29.76   29.55   29.82   100.00  84.04   85.28   22.45   22.47   22.58   ] [ smh-1-64k         ]
[ smh-2-64k         ]   [  38.93   39.01   39.05   30.31   30.31   30.16   29.67   29.53   29.70   84.04   100.00  85.80   22.18   22.17   22.23   ] [ smh-2-64k         ]
[ smh-3-64k         ]   [  38.67   38.75   38.77   30.36   30.36   30.16   29.75   29.60   29.87   85.28   85.80   100.00  22.22   22.27   22.24   ] [ smh-3-64k         ]
[ youtube-1-64k     ]   [  21.00   21.11   21.12   26.91   26.97   26.96   22.16   21.71   21.98   22.45   22.18   22.22   100.00  90.16   90.12   ] [ youtube-1-64k     ]
[ youtube-2-64k     ]   [  21.04   21.17   21.18   26.96   26.97   26.84   22.14   21.64   21.99   22.47   22.17   22.27   90.16   100.00  90.67   ] [ youtube-2-64k     ]
[ youtube-3-64k     ]   [  20.98   21.07   21.12   27.06   27.00   26.98   22.17   21.73   22.00   22.58   22.23   22.24   90.12   90.67   100.00  ] [ youtube-3-64k     ]

-- now the drop-6-hash similarity matrix: 
sa: simm-6 |*> #=> 100 (|_self> + similar[drop-6-hash] |_self>)
sa: map[simm-6,similarity-6] "" |list>
sa: matrix[similarity-6]
[ abc-1-64k         ] = [  100.00  98.83   99.01   46.64   47.12   46.82   43.93   43.94   44.16   55.96   57.22   56.79   28.24   28.21   28.23   ] [ abc-1-64k         ]
[ abc-2-64k         ]   [  98.83   100.00  99.10   46.72   47.20   46.90   44.67   44.68   44.90   55.97   57.23   56.81   28.25   28.22   28.25   ] [ abc-2-64k         ]
[ abc-3-64k         ]   [  99.01   99.10   100.00  46.78   47.26   46.95   44.60   44.60   44.82   55.94   57.21   56.78   28.23   28.19   28.22   ] [ abc-3-64k         ]
[ adelaidenow-1-64k ]   [  46.64   46.72   46.78   100.00  96.61   96.63   41.59   41.25   41.37   42.60   42.16   42.28   33.46   33.43   33.46   ] [ adelaidenow-1-64k ]
[ adelaidenow-2-64k ]   [  47.12   47.20   47.26   96.61   100.00  97.04   42.01   41.67   41.79   42.79   42.24   42.46   33.49   33.46   33.54   ] [ adelaidenow-2-64k ]
[ adelaidenow-3-64k ]   [  46.82   46.90   46.95   96.63   97.04   100.00  41.66   41.32   41.44   42.56   42.12   42.23   33.52   33.49   33.52   ] [ adelaidenow-3-64k ]
[ slashdot-1-64k    ]   [  43.93   44.67   44.60   41.59   42.01   41.66   100.00  97.49   97.25   44.48   44.24   44.43   34.84   34.83   34.83   ] [ slashdot-1-64k    ]
[ slashdot-2-64k    ]   [  43.94   44.68   44.60   41.25   41.67   41.32   97.49   100.00  99.18   44.44   44.20   44.39   34.45   34.45   34.45   ] [ slashdot-2-64k    ]
[ slashdot-3-64k    ]   [  44.16   44.90   44.82   41.37   41.79   41.44   97.25   99.18   100.00  44.61   44.38   44.57   34.71   34.70   34.70   ] [ slashdot-3-64k    ]
[ smh-1-64k         ]   [  55.96   55.97   55.94   42.60   42.79   42.56   44.48   44.44   44.61   100.00  95.96   97.42   28.85   28.83   28.81   ] [ smh-1-64k         ]
[ smh-2-64k         ]   [  57.22   57.23   57.21   42.16   42.24   42.12   44.24   44.20   44.38   95.96   100.00  97.31   28.48   28.47   28.44   ] [ smh-2-64k         ]
[ smh-3-64k         ]   [  56.79   56.81   56.78   42.28   42.46   42.23   44.43   44.39   44.57   97.42   97.31   100.00  28.58   28.57   28.54   ] [ smh-3-64k         ]
[ youtube-1-64k     ]   [  28.24   28.25   28.23   33.46   33.49   33.52   34.84   34.45   34.71   28.85   28.48   28.58   100.00  97.61   97.97   ] [ youtube-1-64k     ]
[ youtube-2-64k     ]   [  28.21   28.22   28.19   33.43   33.46   33.49   34.83   34.45   34.70   28.83   28.47   28.57   97.61   100.00  97.16   ] [ youtube-2-64k     ]
[ youtube-3-64k     ]   [  28.23   28.25   28.22   33.46   33.54   33.52   34.83   34.45   34.70   28.81   28.44   28.54   97.97   97.16   100.00  ] [ youtube-3-64k     ]

pattern recognition claim:

we can make a lot of progress in pattern recognition if we can find mappings from objects to well-behaved, deterministic, distinctive superpositions.

where:
well-behaved means similar objects return similar superpositions (this is the hard bit to achieve, but hopefully not impossible);
deterministic means feeding in the same object returns essentially the same superposition -- there is some leeway, in that it doesn't have to be 100% identical on each run, but it should be close;
distinctive means different object types have easily distinguishable superpositions (again, this is on the hard side).

Bridging sets:

Let's define them:
First, we say x is near y if metric[x,y] <= t for a metric of your choice, and some threshold t.

Then a linear bridging set is a set {x0,x1,x2,x3,...,xn} such that:
1) x_k is near x_{k+1}, for all k in {0,1,...,n-1}
2) x_0 is not near x_n

A general bridging set is a set {x0,x1,x2,x3,...,xn} such that:
1) for every j in {0,1,...,n}, x_j is near an x_k for some k != j in {0,1,...,n}. -- ie, every element in the set is near some other element in the set
2) there may exist j,k pairs such that x_j is not near x_k

Some bridging sets:


where A is a linear bridging set, and B and C are general bridging sets.

OK. So what is the point of mentioning bridging sets? Well, it motivates the categorize code.
Categorize maps a set of elements into disjoint bridging sets.
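A naive plain-Python sketch of that idea (illustrative only; the project's categorize presumably uses simm, with the given threshold, as the near test):

def categorize(items, near):
  # greedily merge items into disjoint bridging sets:
  categories = []
  for x in items:
    touching = [cat for cat in categories if any(near(x, y) for y in cat)]
    for cat in touching:
      categories.remove(cat)
    merged = [x]
    for cat in touching:
      merged += cat
    categories.append(merged)
  return categories

# eg: categorize(kets, lambda x, y: simm(pattern(x), pattern(y)) >= 0.6)
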
Here is the result for the H-I-O data:
sa: load H-I-pat-rec.sw
sa: categorize[pixels,0.6,result]
sa: dump |result>
category-0 |result> => |letter: H> + |noisy: H> + |noisy: H2>
category-1 |result> => |letter: I> + |noisy: I> + |noisy: I2>
category-2 |result> => |letter: O>
Here is the result for webpage fragments:
sa: load fragment-webpages-64k.sw
sa: categorize[hash-64k,0.7,result]
sa: dump |result>
category-0 |result> => |youtube-3-64k> + |youtube-2-64k> + |youtube-1-64k>
category-1 |result> => |smh-3-64k> + |smh-2-64k> + |smh-1-64k>
category-2 |result> => |abc-1-64k> + |abc-2-64k> + |abc-3-64k>
category-3 |result> => |slashdot-3-64k> + |slashdot-1-64k> + |slashdot-2-64k>
category-4 |result> => |adelaidenow-3-64k> + |adelaidenow-2-64k> + |adelaidenow-1-64k>

active-buffer:

So, what is an active buffer?
It keeps a buffer of the most recent input, and tries to match patterns in that buffer against known objects.
Hopefully some examples will help explain:
-- load some data:
sa: load internet-acronyms.sw

-- read a short sentence:
sa: read |text: WTF is going on OMg thx RTFM!>
|word: wtf> + |word: is> + |word: going> + |word: on> + |word: omg> + |word: thx> + |word: rtfm>

-- apply active buffer to this:
sa: active-buffer[7,0] read |text: WTF is going on OMg thx RTFM!>
2.593|phrase: What The Fuck> + 4.826|phrase: Oh My God> + 4.043|phrase: Thanks> + 2.593|phrase: Read the Fine Manual> + 2.593|phrase: Read the Fucking Manual>
Breakfast menu example (an sw form of this xml version):
-- load some data:
sa: load next-breakfast-menu.sw 
sa: description |food: Homestyle Breakfast>
|text: "Two eggs, bacon or sausage, toast, and our ever-popular hash browns">

-- read the description for Homestyle Breakfast
sa: read description |food: Homestyle Breakfast>
|word: two> + |word: eggs> + |word: bacon> + |word: or> + |word: sausage> + |word: toast> + |word: and> + |word: our> + |word: ever-popular> + |word: hash> + |word: browns>

-- apply active-buffer to this:
sa: active-buffer[7,1] read description |food: Homestyle Breakfast>
|number: 2> + |food: eggs> + |food: bacon> + |food: sausage> + |food: toast> + |food: hash browns>
Fred/Mary/pregnancy example:
-- first load up some knowledge:
sa: |person: Fred Smith> => |word: fred> + |word: freddie> + |word: smith> + |word: smithie>  -- various names and nick-names
sa: |person: Mary> => |word: mazza>                                                           -- just a nick-name
sa: |greeting: Hey!> => |word: hey>
sa: |question: what is> => |word: what's>
sa: |direction: up> => |word: up>
sa: |phrase: having a baby> => read |text: having a baby>
sa: |phrase: in the family way> => read |text: in the family way>
sa: |phrase: up the duff> => read |text: up the duff>
sa: |phrase: with child> => read |text: with child>
sa: |concept: pregnancy> => |phrase: having a baby> + |phrase: in the family way> + |phrase: up the duff> + |phrase: with child>

-- save a copy:
sa: save active-buffer-play.sw

-- now start playing with it:
sa: active-buffer[7,0] read |text: Hey Freddie what's up?>  
2.083|greeting: Hey!> + 1.500|person: Fred Smith> + 2.917|question: what is> + 2.083|direction: up> + 1.250|phrase: up the duff>
-- up the duff is in there because of the word "up"

-- now test phrase matching a concept, in this case phrases that mean pregnant.
sa: active-buffer[7,0] read |text: Hey Mazza, you with child, up the duff, in the family way, having a baby?>
2.593|greeting: Hey!> + 4.186|person: Mary> + 11.586|phrase: with child> + 6.857|direction: up> + 23.414|phrase: up the duff> + 25.000|phrase: in the family way> + 9.224|phrase: having a baby>

-- one more layer of active-buffer:
sa: active-buffer[7,0] active-buffer[7,0] read |text: Hey Mazza, you with child, up the duff, in the family way, having a baby?>
11.069|concept: pregnancy>
Recognize a face, even if you only see part of it:
-- define the features of a face:
sa: |body part: face> => 2|eyes> + |nose> + 2|ears> + 2|lips> + |hair>

-- even if we only have a couple of pieces of the face, we can still recognize it as a face
sa: active-buffer[7,0] (2|eyes> + |nose>)
0.750|body part: face>

-- again, hair, nose and lips is enough to say we have a face:
sa: active-buffer[7,0] (|hair> + |nose> + |lips>)
1.625|body part: face>

-- this time we see all parts of the face:
sa: active-buffer[7,0] (2|eyes> + |nose> + 2|ears> + 2|lips> + |hair>)
7.125|body part: face>

NFC: the normed frequency class equation:

First, we start with the frequency class equation as defined on this wikipedia page.
In python this looks like:
def frequency_class(e,X):
  X = X.drop()              # filter out elements <= 0
  smallest = X.find_min_coeff()
  largest = X.find_max_coeff()
  f = X.find_value(e)

  if largest <= 0:
    return 1
  if f <= 0: 
    return math.floor(0.5 - math.log(smallest/largest,2)) + 1
  return math.floor(0.5 - math.log(f/largest,2))
Now, normalize this to [0,1], so that now:
0 means e is not in the frequency list X
1 means e has the max coeff of elements in X
(0,1) means e is somewhere in between. 
Essentially, NFC is a fuzzy set membership function.
Here is the python:
def normed_frequency_class(e,X):
  X = X.drop()                                 # drop elements with coeff <= 0
  smallest = X.find_min_coeff()                # return the min coeff in X as float
  largest = X.find_max_coeff()                 # return the max coeff in X as float
  f = X.find_value(e)                          # return the value of ket e in superposition X as float

  if largest <= 0 or f <= 0:                   # otherwise the math.log() blows up!
    return 0

  fc_max = math.floor(0.5 - math.log(smallest/largest,2)) + 1  # NB: the + 1 is important, else the smallest element in X gets reported as not in set.
  return 1 - math.floor(0.5 - math.log(f/largest,2))/fc_max
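
A quick worked example with made-up numbers: take X = 32|the> + 8|of> + |boat>, so smallest = 1, largest = 32, and fc_max = floor(0.5 + 5) + 1 = 6:

import math
fc_max = math.floor(0.5 - math.log(1/32, 2)) + 1           # == 6
print(1 - math.floor(0.5 - math.log(32/32, 2))/fc_max)     # e = the: 1.0, the max-coeff element
print(1 - math.floor(0.5 - math.log(8/32, 2))/fc_max)      # e = of: 0.666..., somewhere in between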

diagrams ...

Using NFC to find topic:

Again, let's give some examples:
sa: load names.sw
sa: find-topic[names] |fred>
63.294|male name> + 28.418|last name> + 8.288|female name>

sa: find-topic[names] |emma>
90.323|female name> + 9.677|last name>

sa: find-topic[names] |george>
45.902|male name> + 40.073|last name> + 14.026|female name>

sa: find-topic[names] |simon>
63.871|last name> + 36.129|male name>

-- now apply this to a handful of wikipedia pages we have converted to word frequency lists:
-- (if we had the computational power it would be nice to try this on all of the English wikipedia)
sa: load WP-word-frequencies.sw
-- "wikipedia" is in NFC 1 for all our frequency lists:
sa: find-topic[words] |wikipedia>
18.539|WP: US presidents> + 18.539|WP: particle physics> + 16.479|WP: rivers> + 16.479|WP: physics> + 16.479|WP: country list> + 13.483|WP: Australia>

sa: find-topic[words] |adelaide>
74.576|WP: Adelaide> + 25.424|WP: Australia>

sa: find-topic[words] |sydney>
60.241|WP: Australia> + 39.759|WP: Adelaide>

sa: find-topic[words] |canberra>
100.000|WP: Australia>

sa: find-topic[words-2] |aami stadium>
100.000|WP: Adelaide>

sa: find-topic[words-2] |river torrens>
100.000|WP: Adelaide>

sa: find-topic[words-2] |river nile> 
100.000|WP: rivers>                  

sa: find-topic[words-3] |university of adelaide>                
76.923|WP: Adelaide> + 23.077|WP: Australia>      

sa: find-topic[words] |physics>
54.237|WP: physics> + 45.763|WP: particle physics>

sa: find-topic[words-2] |particle physics>
60.000|WP: particle physics> + 40.000|WP: physics>            

sa: find-topic[words] |electron>
62.791|WP: particle physics> + 37.209|WP: physics>

sa: find-topic[words-2] |bill clinton>
100.000|WP: US presidents>

sa: find-topic[words-2] |george bush>                             -- no match on the exact phrase.
|>                                                                -- probably because of the need to disambiguate between father and son.

sa: find-topic[words] (|george> + |bush>)
67.705|WP: US presidents> + 22.363|WP: Australia> + 9.932|WP: Adelaide>

sa: find-topic[words-2] |richard nixon>
100.000|WP: US presidents>

sa: find-topic[words-2] |thomas jefferson>
100.000|WP: US presidents>

sa: find-topic[words] |reagan>
100.000|WP: US presidents>

sa: find-topic[words] (|australia> + |austria> + |brazil> + |chile> + |denmark> + |holland> + |germany> + |france> + |japan> + |italy> + |greece>)
38.242|WP: country list> + 24.433|WP: rivers> + 19.765|WP: Australia> + 10.494|WP: Adelaide> + 3.931|WP: particle physics> + 3.135|WP: physics>
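
Judging from these results (each answer sums to 100), find-topic presumably computes the NFC of the given ket against each frequency list, then normalizes the non-zero scores to percentages. A sketch of that guess:

def find_topic(e, frequency_lists):
  # frequency_lists: name -> frequency list (superposition)
  scores = {name: normed_frequency_class(e, X) for name, X in frequency_lists.items()}
  total = sum(scores.values())
  if total == 0:
    return {}                                  # the |> result, cf the "george bush" case above
  return {name: 100 * s / total for name, s in scores.items() if s > 0}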

sigmoids:

What are sigmoids? They are a collection of functions that only change the coeffs in superpositions.
Here are a couple of examples:

|f> => absolute-noise[3] 0 range(|x: 0>,|x: 255>)

binary-filter "" |f>

threshold-filter[1] "" |f>

threshold-filter[2] "" |f>

threshold-filter[2.5] "" |f>
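
A rough plain-Python sketch of the idea (the exact thresholds in the project's sigmoids are assumptions here):

def binary_filter(x):
  return 1 if x > 0 else 0             # squash every positive coeff to 1

def threshold_filter(x, t):
  return x if x >= t else 0            # zero out coeffs below the threshold t

def apply_sigmoid(sp, sigmoid, *args):
  # sigmoids only change the coeffs, never the ket labels:
  return {ket: sigmoid(coeff, *args) for ket, coeff in sp.items()}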


six degrees of separation:

-- first, set the context:
sa: context fred friends

-- let's load up some data:
sa: friends |Fred> => |Sam> + |Harry> + |Liz> + |Rob> + |Emma> + |Mazza>
sa: friends |Sam> => |Eric> + |Smithie> + |Patrick> + |Tom>
sa: friends |Harry> => |Jane> + |Sarah> + |Jean> + |Alicia>

-- let's have a quick play with this data:
-- who are Fred's friends?
sa: friends |Fred>
|Sam> + |Harry> + |Liz> + |Rob> + |Emma> + |Mazza>

-- how many friends does Fred have?
sa: count friends |Fred>
|number: 6>

-- who are Fred's friends of friends? 
sa: friends friends |Fred>
|Eric> + |Smithie> + |Patrick> + |Tom> + |Jane> + |Sarah> + |Jean> + |Alicia>

-- how many are there:
sa: count friends^2 |Fred>
|number: 8>

-- six degrees of separation looks like this (if we had the data):
friends^6 |Fred>

-- or, alternatively, one of these:
sa: exp[friends,6] |Fred>
sa: exp-max[friends] |Fred>

Kevin Bacon numbers:


Is as simple as this:
kevin-bacon-0 |result> => |actor: Kevin (I) Bacon>
kevin-bacon-1 |result> => actors movies |actor: Kevin (I) Bacon>
kevin-bacon-2 |result> => actors movies actors movies |actor: Kevin (I) Bacon>
Of course, this assumes a sufficiently powerful computer.
kevin-bacon-1 took 1 minute on Amazon's r3.large EC2.
kevin-bacon-2 took 12 days (though this is due to the current addition of superpositions being O(n^2), which I eventually plan to fix using ordered dictionaries)
The results are here (NB: I didn't filter out TV shows well enough from the raw imdb data, so the results aren't perfect)
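
On the O(n^2) point: if superpositions are stored as plain lists, every inserted ket needs a linear scan. The planned dictionary approach would look something like this sketch (illustrative):

def add_superpositions(sps):
  out = {}
  for sp in sps:                        # dict lookups make each insertion O(1),
    for ket, coeff in sp.items():       # so summing n kets is O(n) overall
      out[ket] = out.get(ket, 0) + coeff
  return out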

The general case is:

kevin-bacon-k |result> => [actors movies]^k |actor: Kevin (I) Bacon>
Some related maths:
d(X,Y,op) = <Y|op|X>
measures/counts the number of pathways from |X> to |Y> via op (fix! Only sometimes true. eg in the Kevin Bacon case)

Some wild? speculation

Perhaps something like this:

where big circles are neurons, little circles are synapses, and lines are axons and dendrites.

My vision of the future:


Appendix:

the console help file:

sa: h

  q, quit, exit                quit the agent.
  h, help                      print this message
  context                      print list of contexts
  context string               set current context to string
  reset                        reset back to completely empty console
                               Warning! you will lose all unsaved work!
  dump                         print current context
  dump exact                   print current context in exact mode
  dump multi                   print context list
  dump self                    print what we know about the default ket/sp
  dump ket/sp                  print what we know about the given ket/sp
  display                      (relatively) readable display of current context
  display ket/sp               (relatively) readable display about what we know for the ket/sp
  freq                         convert current context to frequency list
  mfreq                        convert context list to frequency list
  load file.sw                 load file.sw
  save file.sw                 save current context to file.sw
  save multi file.sw           save context list to file.sw
  files                        show the available .sw files
  cd                           change and create if necessary the .sw directory
  ls, dir, dirs                show the available directories
  create inverse               create inverse for current context
  create multi inverse         create inverse for all contexts in context list
  x = foo: bah                 set x (the default ket) to |foo: bah>
  id                           display the default ket/superposition
  s, store                     set x to the result of the last computation
  .                            repeat last computation
  i                            interactive history
  history                      show last 30 commands
  history n                    show last n commands
  save history                 save console history to file
  -- comment                   ignore, this is just a comment line.
  if none of the above         process_input_line(C,line,x)

brief description of implemented functions in BKO/Feynman:

Unfinished ... For now, see here (also unfinished)
# Some hash tables mapping ops to the python equivalent.
# Left hand side is BKO language, right is python.

# functions built into ket/superposition classes.
built_in_table = {
  "display"        : "display",
  "transpose"      : "transpose",
# "select-elt"     : "select_elt",
  "pick-elt"       : "pick_elt",
# "find-index"     : "find_index",
# "find-value"     : "find_value",
  "normalize"      : "normalize",
  "rescale"        : "rescale",
# "rescale"        : "rescale",
# "sigmoid"        : "apply_sigmoid",
# "function"       : "apply_fn",
# "similar"        : "similar",
# "collapse-fn"    : "apply_fn_collapse",
  "collapse"       : "collapse",
  "count"          : "number_count",
  "count-sum"      : "number_count_sum",
  "sum"            : "number_count_sum",
  "product"        : "number_product",
  "drop"           : "drop",
# "drop-below"     : "drop_below",
# "drop-above"     : "drop_above",
# "select-range"   : "select_range",
# "delete-elt"     : "delete_elt",
  "reverse"        : "reverse",
  "shuffle"        : "shuffle",
  "coeff-sort"     : "coeff_sort",
  "ket-sort"       : "ket_sort",
  "max-elt"        : "find_max_elt",
  "min-elt"        : "find_min_elt",
  "max"            : "find_max",
  "min"            : "find_min",

# new:
  "discrimination" : "discrimination",
  "discrim"        : "discrimination",

# special:
    "type"         : "type",                    # implemented for debugging purposes.

# new 7/9/2014:
#  "long"           : "long_display",
}                                                                                      

# table of sigmoids:
sigmoid_table = {
  "clean"             : "clean",
# "threshold-filter"  : "threshold_filter",   # we can't handle paramters with our ops yet.
  "binary-filter"     : "binary_filter",
  "not-binary-filter" : "not_binary_filter",
  "pos"               : "pos",
  "NOT"               : "NOT",
  "xor-filter"        : "xor_filter",
# "mult"              : "mult",               # yeah, the sigmoid version works, but moved to compound table 
                                              # ".multiply({0})" Decided it was common enough that it needed to be built in.
  "invert"            : "invert",
}                                   

# some ket -> ket functions:
fn_table = {
  "value"            : "apply_value",
  "extract-category" : "extract_category",
  "extract-value"    : "extract_value",
  "to-number"        : "category_number_to_number",
  "shout"            : "shout",
#  "discrim"           : "discrimination",  # Broken. discrim (3|a> + 9|b>) returns 12| >. Doh! It should be 9 - 3, not 9 + 3.
  "F"                : "to_Fahrenheit",     # should these be to-F, to-C, to-K?
  "C"                : "to_Celsius",
  "K"                : "to_Kelvin",
  "to-km"            : "to_km",
  "to-meter"         : "to_meter",
  "to-mile"          : "to_mile",
  
  "to-value"         : "to_value",
  "to-category"      : "to_category",
  
# 3/6/2014:
  "day-of-the-week"  : "day_of_the_week",

# 23/6/2014:
  "long"             : "long_display",            # BUG! I have no idea why this insists on using the "37|ket>"" instead of "37 ket" notation!
  "split"            : "split_ket",               # ahh.... it is running long_display with respect to kets, not superposition as I expected!
                                                  # maybe shift to another table to fix.
# 29/6/2014:
#  "sp-as-list"       : "sp_as_list",             # Nope! Belongs in the sp_fn_table.
}


# 7/4/2014 me wonders. do fn_table and fn_table2 really need to be separate?  
# some other functions. Some are ket -> ket, some are ket -> superposition.
fn_table2 = {
  "read"              : "read_text",
  "spell"             : "spell_word",
  "factor"            : "factor_numbers",
  "near-number"       : "near_numbers",
  "strange-int"       : "strange_int",
  "is-prime"          : "is_prime",
  "strange-int-prime" : "strange_int_prime",
  "strange-int-depth" : "strange_int_depth",
  "strange-int-delta" : "strange_int_delta",
  "strange-int-list"  : "strange_int_list",
}


# table of compound operators.
# They need to be handled separately from those in the tables above, because they have parameters.
compound_table = {
  "select-elt"         : ".select_elt({0})",
# "find-index"           # can't support these two until we have more advanced parsing.
# "find-value            # eg: find-index[|person: Fred>] |x> currently would split on the space in the ket.
  "normalize"          : ".normalize({0})",
  "rescale"            : ".rescale({0})",
  "similar"            : ".similar(context,\"{0}\")",
  "find-topic"         : ".find_topic(context,\"{0}\")",
# "collapse-function"  : ".apply_fn_collapse({0})",  # broken for now. eg, how handle collapse-fn[spell] |x> ??
  "drop-below"         : ".drop_below({0})",         # Not needed anyway. Just use: collapse spell |x>
  "drop-above"         : ".drop_above({0})",
  "select-range"       : ".select_range({0})",       # may comment this one out, but fine for now to have two versions.
  "select"             : ".select_range({0})",
  "delete-elt"         : ".delete_elt({0})",
  "threshold-filter"   : ".apply_sigmoid(threshold_filter,{0})",
#  "mult"              : ".apply_sigmoid(mult,{0})",  # this is now moved to ket/sp since it is common enough.
  "mult"               : ".multiply({0})",
  "in-range"           : ".apply_sigmoid(in_range,{0})",
  "smooth"             : ".apply_fn_collapse(smooth,{0})",
  "set-to"             : ".apply_sigmoid(set_to,{0})",             

# newly added: 7/4/2014:
  "absolute-noise"     : ".absolute_noise({0})",
  "relative-noise"     : ".relative_noise({0})",
  
# newly added 8/5/2014:
  "common"             : ".apply_sp_fn(common,context,\"{0}\")",   
# newly added 12/5/2014:
  "exp"                : ".apply_sp_fn(exp,context,\"{0}\")",
  
# newly added 19/5/2014
  "relevant-kets"      : ".apply_naked_fn(relevant_kets,context,\"{0}\")",  
#  "matrix"            : ".apply_naked_fn(matrix,context,\"{0}\")",
#  "multi-matrix"      : ".apply_naked_fn(multi_matrix,context,\"{0}\")",

# newly added 21/5/2014:
  "matrix"             : ".apply_naked_fn(multi_matrix,context,\"{0}\")",  # this deprecates/replaces the naked_fn(matrix,...) version.
  "merged-matrix"      : ".apply_naked_fn(merged_multi_matrix,context,\"{0}\")",
  "naked-matrix"       : ".apply_naked_fn(merged_naked_matrix,context,\"{0}\")",

# newly added 22/5/2014:
  "map"                : ".apply_sp_fn(map,context,\"{0}\")",

# newly added 28/5/2014:
  "categorize"         : ".apply_naked_fn(categorize,context,\"{0}\")",  
  
# newly added 5/6/2014:
  "vector"             : ".apply_sp_fn(vector,context,\"{0}\")",
# added 6/6/2014:
  "print-pixels"       : ".apply_sp_fn(print_pixels,context,\"{0}\")",
  
# added 27/6/2014:
  "active-buffer"      : ".apply_sp_fn(console_active_buffer,context,\"{0}\")",
  
# added 28/7/2014:
  "train-of-thought"   : ".apply_sp_fn(console_train_of_thought,context,\"{0}\")",
  
# added 4/8/2014:
  "exp-max"            : ".apply_sp_fn(exp_max,context,\"{0}\")",
  
# added 7/8/2014:
  "sp-propagate"       : ".apply_sp_fn(sp_propagate,context,\"{0}\")",
  "op-propagate"       : ".apply_sp_fn(sp_propagate,context,\"{0}\")",  # an alias  
}



# 7/4/2014: new addition, functions that map sp -> ket/sp
# Pretty sure this breaks the linearity. 
# Ie, the functions here are in general not linear, while most other ops/fns are.
# 30/6/2014: heh. I'd forgotten I had this! 
sp_fn_table = {
  "list-to-words"      : "sp_to_words",
  "read-letters"       : "read_letters",
  "read-words"         : "read_words", 
  "merge-labels"       : "merge_labels",
  "sp-as-list"         : "sp_as_list",
}

# whitelisted functions, that take 2 parameters:
whitelist_table_2 = {
  "intersection"        : "intersection",
  "intn"                : "intersection",
  "common"              : "intersection",          # this is for those that haven't studied maths.
  "union"               : "union",                 # though if they haven't this work won't make much sense anyway!
  "mult"                : "multiply",
  "multiply"            : "multiply",              # for when you are not lazy :)
  "addition"            : "addition",
  "simm"                : "simm",
  "silent-simm"         : "silent_simm",
  "weighted-simm"       : "weighted_simm",          # hrmm... I thought this took 3 parameters, not 2!
  "nfc"                 : "normed_frequency_class",  # pretty unlikely this will be used at command line since needs freq lists.
  "apply"               : "apply",
  "range"               : "show_range",
  "ket-simm"            : "ket_simm",
  "to-base"             : "decimal_to_base",
  "general-to-specific" : "general_to_specific",
}

# whitelisted functions that take 3 parameters:
whitelist_table_3 = {
  "intersection"  : "tri_intersection",
  "intn"          : "tri_intersection",
  "common"        : "tri_intersection",
  "union"         : "tri_union",
  "arithmetic"    : "arithmetic",
  "range"         : "show_range",
  "algebra"       : "algebra",
  "if"            : "bko_if",
  "wsimm"         : "weighted_simm",
  "ket-wsimm"     : "ket_weighted_simm",
}

created: 1/11/2014
updated: 4/11/2014
by Garry Morrison
email: garry -at- semantic-db.org