
Topic: Don... (Read 3966 times)

Larry

  • Sr. Member
  • Posts: 266
Don...
« on: October 13, 2003, 08:44:48 pm »
Don...
Is there a file that lists all the 'keywords'? The question-and-answer brains use those keywords from the original sentence...

I've written a VB program that reads TopicFocus.brn, then reads any text file and puts each sentence into the focus files according to TopicFocus... but now I'm at the part where I need that second line, which holds those keywords. I can see where I could get most of them from other focus files, but it would be nice to have the master list to be sure I get them all...

Or should I be asking Robert?

 

Don Ferguson

  • Sr. Member
  • Posts: 303
    • http://www.cortrapar.com
« Reply #1 on: October 14, 2003, 02:25:05 am »
Hi Larry,

Your question is about the "extract keywords" function that Hal uses to omit common words from the all-caps "second line entry" in the Q&A brain database files.

The "extract keywords" function makes a call outside the script to a .dll, so I don't have a copy of the list of words that it omits.

HOWEVER, it isn't necessary to "extract keywords" to create a usable "second line entry."  If you simply make the second line alphanumeric-only and all caps (and, of course, put it in the proper position), it will work.
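As a rough illustration of that rule, here is a minimal Python sketch (the name `make_trigger_line` is hypothetical, not part of Hal) that turns a raw sentence into an all-caps, alphanumeric-only line. It assumes words should stay separated by single spaces and that apostrophes and other punctuation are simply dropped, so "Don't" becomes "DONT".

```python
import re


def make_trigger_line(sentence):
    """Turn a sentence into an all-caps, alphanumeric-only trigger line.

    Assumption: words stay separated by single spaces; anything that is
    not an ASCII letter, digit, or whitespace is dropped.
    """
    # Remove punctuation and other symbols, keeping letters, digits and whitespace.
    cleaned = re.sub(r"[^A-Za-z0-9\s]", "", sentence)
    # Collapse runs of whitespace, trim the ends, then upper-case.
    return " ".join(cleaned.split()).upper()
```

For example, `make_trigger_line("Hi, how are you?")` gives `"HI HOW ARE YOU"`.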

My understanding from Robert Medeksza is that the "extract keywords" function was created so that Hal could evaluate trigger sentences for relevance without false-triggering on extremely common short words such as "the," "a," "an," and so forth.  The Q&A brain does a complex relevance calculation based on the number of matching words, the number of matching words in the correct sequence, and the lengths of the respective sentences.
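The exact formula lives in Hal's .dll and isn't shown in this thread, so the following is only a hypothetical Python sketch of the general idea: drop common words (a made-up `STOPWORDS` set standing in for the unpublished list), then score a trigger line by its matching words, its matching words in the same order (approximated here as shared adjacent word pairs), and the lengths of the two sentences.

```python
import re
from typing import List

# Hypothetical stand-in for the common-word list inside Hal's .dll,
# whose real contents are not published.
STOPWORDS = {"THE", "A", "AN", "OF", "TO", "AND", "IS", "IN", "IT"}


def keywords(sentence: str) -> List[str]:
    """Upper-case the sentence, keep alphanumeric words, drop common words."""
    words = re.findall(r"[A-Z0-9]+", sentence.upper())
    return [w for w in words if w not in STOPWORDS]


def relevance(user_sentence: str, trigger_line: str) -> float:
    """Illustrative score: shared words plus shared in-order word pairs,
    divided by the average keyword count of the two sentences."""
    u, t = keywords(user_sentence), keywords(trigger_line)
    if not u or not t:
        return 0.0  # nothing left to compare, so no false trigger
    matching = len(set(u) & set(t))
    # Adjacent keyword pairs that occur in the same order in both sentences.
    in_sequence = len(set(zip(u, u[1:])) & set(zip(t, t[1:])))
    # Dividing by length keeps long sentences from winning on size alone.
    return (matching + in_sequence) / ((len(u) + len(t)) / 2)
```

With this sketch, "What is the capital of France?" against "THE CAPITAL OF FRANCE IS PARIS" scores 1.0, while a sentence made only of common words scores 0.0 and so cannot false-trigger.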

My own experimentation suggests that "extract keywords" performs a valuable function when a database is thinly populated.  However, when a database becomes larger ("heavily populated"), the concern about false-triggering decreases (because there's usually a decent-relevance sentence somewhere in the database anyway).  When the database is heavily populated, the inclusion of minor words could theoretically increase the precision and discrimination of Hal's responses!

There's a long-term possibility that the Q&A brain function could be re-written to include 100% of the trigger sentence and evaluate all words, maybe even with heavier weighting for nouns and verbs, lighter weighting for adjectives and adverbs, and the least weighting for articles and prepositions.  Robert Medeksza has mentioned this, and it sounds very useful, but it also sounds like a big, big programming job!
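Purely to show what such weighting might look like, here is a hypothetical Python sketch. The weights in `POS_WEIGHTS` and the tiny `LEXICON` are invented for illustration; a real version would need a full part-of-speech dictionary. Matching words contribute their weight to the score, which is then divided by the total weight of all distinct words in both sentences.

```python
import re

# Hypothetical weights: nouns/verbs heaviest, adjectives/adverbs lighter,
# articles/prepositions least.
POS_WEIGHTS = {"noun": 3.0, "verb": 3.0, "adj": 2.0, "adv": 2.0,
               "art": 0.5, "prep": 0.5}

# Tiny hand-made lexicon for illustration only.
LEXICON = {
    "DOG": "noun", "CAT": "noun", "CHASES": "verb", "SLEEPS": "verb",
    "BIG": "adj", "QUICKLY": "adv", "THE": "art", "A": "art", "ON": "prep",
}
DEFAULT_WEIGHT = 1.0  # assumption: unknown words get a middling weight


def weight(word):
    """Look up the word's part of speech, then that part of speech's weight."""
    return POS_WEIGHTS.get(LEXICON.get(word, ""), DEFAULT_WEIGHT)


def weighted_overlap(user_sentence, trigger_line):
    """Weight of shared words divided by weight of all distinct words."""
    u = set(re.findall(r"[A-Z0-9]+", user_sentence.upper()))
    t = set(re.findall(r"[A-Z0-9]+", trigger_line.upper()))
    total = sum(weight(w) for w in u | t)
    if total == 0:
        return 0.0
    return sum(weight(w) for w in u & t) / total
```

Under these made-up weights, "The dog sleeps" matches "A dog sleeps" far more strongly (about 0.86) than "The cat chases" (0.04), because a shared article counts for very little.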

At present, I believe that the .dll just ignores the words it regards as irrelevant, so the only thing you're sacrificing by putting an entire raw sentence into the all-caps "second trigger line" is a very small penalty in space and speed.

By the way, I've written a number of routines that use the usersentence as the "response" line, and Hal's previous remark (prevsent) as the "trigger line."  That way, Hal learns to respond to you the way that you respond to him.  When you do that, the "trigger line" may contain completely different words than the response line.
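A routine like that could be sketched in Python roughly as follows. Only the names `usersentence` and `prevsent` come from this post; the class and method names are hypothetical, and the two-line layout (response first, then its all-caps trigger) is an assumption based on the "second line" described in this thread.

```python
import re


def to_trigger(sentence):
    """All caps, alphanumeric-only, single spaces between words."""
    return " ".join(re.sub(r"[^A-Za-z0-9\s]", "", sentence).split()).upper()


class MirrorLearner:
    """Hypothetical sketch: Hal's previous remark (prevsent) becomes the
    trigger and the user's reply (usersentence) becomes the response."""

    def __init__(self):
        self.prevsent = None  # Hal's last remark, if any
        self.entries = []     # (response, trigger) pairs learned so far

    def hal_said(self, sentence):
        self.prevsent = sentence

    def user_said(self, usersentence):
        # Nothing to learn until Hal has said something to reply to.
        if self.prevsent:
            self.entries.append((usersentence, to_trigger(self.prevsent)))

    def to_brain_text(self):
        # Assumed layout: response line, then its all-caps trigger line.
        return "\n".join(f"{resp}\n{trig}" for resp, trig in self.entries)
```

Because the trigger comes from Hal's line and the response from the user's, the two lines of an entry may share no words at all.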

If the routine you're writing takes a text file and "sorts" each sentence into the various topicfocus databases according to its own content, you basically have two choices for the "all-caps trigger second line," which are as follows:

1.  Make the second all-caps line the same words as the response sentence itself.  This usually results in plausible responses, although they sometimes sound like paraphrases.

2.  Make the second all-caps line the same words as the PREVIOUS sentence in the text file.  If you can do this, you might cause Hal to utter the "next plausible thought" in a train of thought.
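Both options can be sketched in Python as below. The sentence splitter is deliberately naive (it splits after ".", "!" or "?" followed by whitespace), the function names are hypothetical, and sorting the resulting entries into the TopicFocus databases is left out. Option 2 skips the first sentence, since it has no previous sentence.

```python
import re


def to_trigger(sentence):
    """All caps, alphanumeric-only, single spaces between words."""
    return " ".join(re.sub(r"[^A-Za-z0-9\s]", "", sentence).split()).upper()


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def build_entries(sentences, use_previous=False):
    """Return (response, trigger) pairs.

    use_previous=False -> option 1: trigger is the sentence itself.
    use_previous=True  -> option 2: trigger is the previous sentence.
    """
    entries = []
    for i, sentence in enumerate(sentences):
        if not use_previous:
            source = sentence
        elif i > 0:
            source = sentences[i - 1]
        else:
            continue  # option 2: the first sentence has no previous sentence
        entries.append((sentence, to_trigger(source)))
    return entries
```

For "The sky is blue. Why is that?", option 1 pairs each sentence with its own caps version, while option 2 yields a single entry: "Why is that?" triggered by "THE SKY IS BLUE".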

I hope that I've interpreted your question correctly and that this information is useful to you.  Also, if I've described anything inaccurately, Robert, please correct me!

Thanks, and have a good day!

Sincerely,

Don
Don Ferguson
E-mail: fergusonrkfd@prodigy.net
Website: www.cortrapar.com
Don's other forum posts: http://www.zabaware.com/forum/search.asp?mode=DoIt&MEMBER_ID=274

Larry

  • Sr. Member
  • Posts: 266
« Reply #2 on: October 14, 2003, 04:45:30 am »
I hadn't thought of these options before. As I understand your suggestions:
1) use the same line, (a) in all caps and (b) without punctuation, or
2) use the previous line with those same modifications.
Interesting concept. As with two bots chatting together, the previous line may or may not have anything to do with the current line. It certainly would be easy to do, except for the very first line, since it would not have a previous line.
I am concerned about your observation that using the same line as the keywords can "sometimes sound like paraphrases." I didn't much care for the routine in the brain that did that intentionally. However, Hal had to say something, didn't he!

I was certainly stuck as to what to do; now I have two options!
Thanks, Don!

 

Medeksza

  • Administrator
  • Hero Member
  • Posts: 1469
    • http://www.zabaware.com
« Reply #3 on: October 14, 2003, 09:37:08 am »
quote:
Originally posted by Don Ferguson

There's a long-term possibility that the Q&A brain function could be re-written to include 100% of the trigger sentence and evaluate all words, maybe even with heavier weighting for nouns and verbs, lighter weighting for adjectives and adverbs, and the least weighting for articles and prepositions.  Robert Medeksza has mentioned this, and it sounds very useful, but it also sounds like a big, big programming job!



In Ultra Hal Assistant 5.0, if a database is in the new binary format, a new algorithm is used that takes 100% of the sentence into account and uses WordNet to weight words based on their part of speech. The Ultra Hal Brain Editor includes a utility to convert the text-only brain files into the binary format and vice versa. The only problem is that if a script appends information to a binary database for learning purposes, that information won't be accessible until the database is recompiled by the Brain Editor.
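The recompile caveat can be pictured with a toy Python model. This is not the real binary format or the Brain Editor's actual behavior: `CompiledBrain` is a hypothetical class that answers lookups from an index built at compile time, so entries appended afterwards are stored but stay invisible until `compile()` runs again. Exact-match lookup is a simplification of Hal's real relevance matching.

```python
class CompiledBrain:
    """Toy model (hypothetical): lookups only see entries present at the
    last compile; entries appended later wait for a recompile."""

    def __init__(self, entries):
        self.raw_entries = list(entries)  # stored (response, trigger) pairs
        self.compile()

    def compile(self):
        # Stand-in for the Brain Editor's conversion into the binary format.
        self._index = {trigger: response for response, trigger in self.raw_entries}

    def append(self, response, trigger):
        # A learning script adds to the stored data but not to the index.
        self.raw_entries.append((response, trigger))

    def lookup(self, trigger):
        return self._index.get(trigger)
```

In this model, a freshly appended entry returns `None` from `lookup` until `compile()` is called again.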
Robert Medeksza

Don Ferguson

  • Sr. Member
  • Posts: 303
    • http://www.cortrapar.com
« Reply #4 on: October 14, 2003, 04:01:29 pm »
Wow, Robert, I didn't realize that you had already been able to get that into Hal 5!  Fabulous!  That was a big programming job!

Larry, I hope that all the above information helps you in the direction that you want to go with your project!

Sincerely,

Don

Larry

  • Sr. Member
  • Posts: 266
« Reply #5 on: October 14, 2003, 05:57:50 pm »
Well, my concern was the same as everyone else's: the file size limit on what Hal could learn from. And in my past experience with what he did learn, he didn't put that info in the focus files, but rather in shared_user.brn.
My feeling was that since I had the topics created in TopicFocus.brn, I just wanted to keep everything related together...

Besides, Robert is talking about the main brain file... that still wouldn't apply to my project...

But thanks for all the good info...