sw-examples: scompress-wikipedia.sw3
-- Given the raw text of the Simple English Wikipedia page for dogs, use scompress[] to extract repeated substrings.
-- Updated to version 3.1.1 of the language.
-- Here is the extracted text for that Wikipedia page, one sentence per sequence:
learn-page |dog> #=>
    seq |0> => ssplit |Dogs (Canis lupus familiaris) are domesticated mammals, not natural wild animals.>
    seq |1> => ssplit |They were originally bred from wolves.>
    seq |2> => ssplit |They have been bred by humans for a long time, and were the first animals ever to be domesticated.>
    buggy-seq |3> => ssplit |There are different studies that suggest that this happened between 15.000 and 100.000 years before our time.>
    seq |3> => ssplit |There are different studies that suggest that this happened between and years before our time.>
    seq |4> => ssplit |The dingo is also a dog, but many dingos have become wild animals again and live independently of humans in the range where they occur (parts of Australia).>
    seq |5> => ssplit |Today, some dogs are used as pets, others are used to help humans do their work.>
    seq |6> => ssplit |They are a popular pet because they are usually playful, friendly, loyal and listen to humans.>
    seq |7> => ssplit |Thirty million dogs in the United States are registered as pets.>
    seq |8> => ssplit |Dogs eat both meat and vegetables, often mixed together and sold in stores as dog food.>
    seq |9> => ssplit |Dogs often have jobs, including as police dogs, army dogs, assistance dogs, fire dogs, messenger dogs, hunting dogs, herding dogs, or rescue dogs.>
    seq |10> => ssplit |They are sometimes called "canines" from the Latin word for dog - canis.>
    seq |11> => ssplit |Sometimes people also use "dog" to describe other canids, such as wolves.>
    seq |12> => ssplit |A baby dog is called a pup or puppy.>
    seq |13> => ssplit |A dog is called a puppy until it is about one year old.>
    seq |14> => ssplit |Dogs are sometimes referred to as "man's best friend" because they are kept as domestic pets and are usually loyal and like being around humans.>
    seq |15> => ssplit |Dogs like to be petted, but only when they can first see the petter's hand before petting; one should never pet a dog from behind.>
    seq |16> => ssplit |Dogs have four legs and make a "bark," "woof," or "arf" sound.>
    seq |17> => ssplit |Dogs often chase cats, and most dogs will fetch a ball or stick.>
    seq |18> => ssplit |Dogs can smell and hear better than humans, but cannot see well in color because they are color blind.>
    seq |19> => ssplit |Due to the anatomy of the eye, dogs can see better in dim light than humans.>
    seq |20> => ssplit |They also have a wider field of vision.>
    seq |21> => ssplit |Like wolves, wild dogs travel in groups called packs.>
    seq |22> => ssplit |Packs of dogs are ordered by rank, and dogs with low rank will submit to other dogs with higher rank.>
    seq |23> => ssplit |The highest ranked dog is called the alpha male.>
    seq |24> => ssplit |A dog in a group helps and cares for others.>
    seq |25> => ssplit |Domesticated dogs often view their owner as the alpha male.>
    seq |26> => ssplit |Different dog breeds have different lifespans.>
    seq |27> => ssplit |In general, smaller dogs live longer than bigger ones.>
    seq |28> => ssplit |The size and the breed of the dog change how long the dog lives, on average.>
    seq |29> => ssplit |Breeds such as the Dachshund usually live for fifteen years, Chihuahuas can reach age twenty.>
    seq |30> => ssplit |The Great Dane, on the other hand has an average lifespan of six to eight years; some Great Danes have lived for ten years.>
    seq |31> => ssplit |All dogs are descended from wolves, by domestication and artificial selection.>
    seq |32> => ssplit |This is known because DNA genome analysis has been done to discover this.>
    seq |33> => ssplit |They have been bred by humans.>
    seq |34> => ssplit |The earliest known fossil of a domestic dog is from years ago in Belgium.>
    seq |35> => ssplit |Dogs have lived with people for at least years.>
    seq |36> => ssplit |In, a study was published that showed that the skull and teeth of a canid, dated to years ago, had characteristics closer to a dog than to a wolf, and the authors conclude that "this specimen may represent a dog in the very early stages of domestication, i.e. an “incipient” dog.>
    seq |37> => ssplit |The researchers go on to suggest that it was, however, a line that did not lead to modern dogs.>
    seq |38> => ssplit |Genetically, this material is closer to that of a modern dog than to that of a wolf.>
    seq |39> => ssplit |Other signs of domestication are that sometimes, dogs were buried together with humans.>
    seq |40> => ssplit |Evidence of this is a tomb in Bonn, where a man of about years of age, a woman of about years of age, the remains of a dog, plus other artifacts were found.>
    seq |41> => ssplit |Radiocarbon dating showed that the human bones were between and years old.>
    seq |42> => ssplit |Dogs are often called "man's best friend" because they fit in with human life.>
    seq |43> => ssplit |Man refers to humankind and not just guys (Old English).>
    seq |44> => ssplit |Dogs can serve people in many ways.>
    seq |45> => ssplit |For example, there are guard dogs, hunting dogs, herding dogs, guide dogs for blind people, and police dogs.>
    seq |46> => ssplit |There are also dogs that are trained to smell for diseases in the human body or to find bombs or illegal drugs.>
    seq |47> => ssplit |These dogs sometimes help police in airports or other areas.>
    seq |48> => ssplit |Sniffer dogs (usually beagles) are sometimes trained for this job.>
    seq |49> => ssplit |Dogs have even been sent by Russians into outer space, a few years before any human being.>
    seq |50> => ssplit |The first dog sent up was named Laika, but she died within a few hours.>
    seq |51> => ssplit |There are at least breeds (kinds) of dogs.>
    seq |52> => ssplit |Dogs whose parents were the same breed will also be that breed these dogs are called purebred or pure pedigree dogs.>
    seq |53> => ssplit |Dogs with parents from different breeds no longer belong to one breed they are called mutts, mixed-breed dogs, hybrids, or mongrels.>
    seq |54> => ssplit |Some of the most popular breeds are sheepdogs, collies, poodles and retrievers.>
    seq |55> => ssplit |It is becoming popular to breed together two different breeds of dogs and call the new dog's breed a name that is a mixture of the parents' breeds' two names.>
    seq |56> => ssplit |A puppy with a poodle and a pomeranian as parents might be called a Pomapoo.>
    seq |57> => ssplit |These kinds of dogs, instead of being called mutts, are known as designer dog breeds.>
    seq |58> => ssplit |These dogs are normally used for prize shows and designer shows.>
    seq |59> => ssplit |They can be guide dogs.>
    |>
-- Let's learn all of those sequences of letters:
learn-page |dog>
-- scompress[] works better on lowercase text, so let's learn lowercase versions of the sequences:
convert-to-lower-case |*> #=>
    lower-seq |__self> => to-lower seq |__self>
    |>
convert-to-lower-case rel-kets[seq]
-- Let's do the hard work, and apply the scompress[] operator:
-- scompress[seq, cseq, "W: "]
-- scompress[lower-seq, cseq, "W: "]
scompress[lower-seq, cseq, "W: ", 6, 40]
-- The code to find the repeated substring patterns:
filter-W |W: *> #=> |_self>
expand-W |W: *> #=> smerge cseq^20 |_self>
find |repeat patterns> #=> seq2sp expand-W cseq rel-kets[lower-seq] |>
print-coeff |*> #=>
    print (extract-value push-float |__self> _ |:> __ |__self>)
    |>

print-minimalist |*> #=>
    print |__self>
    |>
-- print-coeff reverse sort-by[ket-length] find |repeat patterns>
print-minimalist reverse sort-by[ket-length] find |repeat patterns>
-- The operators needed to find the system depth:
-- find-depth (*) #=>
-- depth |system> => plus[1] depth |system>
-- op-if( is-equal(|__self>, the |input>), |op: display-depth>, |op: find-depth>) cseq |__self>
--
-- display-depth (*) #=>
-- |system depth:> __ depth |system>
find-system-depth |*> #=>
    depth |system> => |0>
    the |input> => lower-seq |__self>
    find-depth cseq |__self>

find-depth (*) #=>
    depth |system> => plus[1] depth |system>
    if( |__self> != the |input> ):
        find-depth cseq |__self>
    end:
    |system depth:> __ depth |system>
-- Now, let's find the system depths for our sequences:
coeff-sort find-system-depth rel-kets[lower-seq]
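
For readers unfamiliar with the language, here is a rough Python sketch (not Semantic DB code) of the idea behind the repeated-substring search above. It is not the scompress[] algorithm itself: it simply enumerates every substring and keeps those shared by at least two sentences. The function name repeated_substrings, and the reading of the arguments 6 and 40 as minimum and maximum substring lengths, are assumptions made for illustration.

```python
from collections import defaultdict


def repeated_substrings(texts, min_len=6, max_len=40):
    """Return substrings of length min_len..max_len found in at least two texts.

    Hypothetical helper for illustration only; not the scompress[] algorithm.
    """
    owners = defaultdict(set)  # substring -> indices of the texts containing it
    for idx, text in enumerate(texts):
        lowered = text.lower()  # mirrors the lower-seq step in the file above
        for n in range(min_len, max_len + 1):
            for start in range(len(lowered) - n + 1):
                owners[lowered[start:start + n]].add(idx)
    repeats = [s for s, seen_in in owners.items() if len(seen_in) >= 2]
    # Longest first, similar in spirit to `reverse sort-by[ket-length]`.
    return sorted(repeats, key=lambda s: (-len(s), s))


sentences = [
    "They have been bred by humans for a long time.",
    "They have been bred by humans.",
    "A dog is called a puppy until it is about one year old.",
    "A baby dog is called a pup or puppy.",
]
patterns = repeated_substrings(sentences)
print(patterns[0])
```

This brute-force version reports every shared substring, including the overlapping pieces of a longer match, so its output is far more redundant than a compressed representation would be.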