Author Topic: what to do about hyphens , other characters ?  (Read 3266 times)

lanman

  • Newbie
  • *
  • Posts: 4
what to do about hyphens , other characters ?
« on: December 28, 2003, 03:35:02 pm »
I'm a new user of Hal and I'm trying to explore its potential as an assistant. My trouble is that Hal seems to remove all punctuation characters via statements like

   
   UserSentence = Replace("" & UserSentence & "", ";", " VSZ ", 1, -1, vbTextCompare)
   UserSentence = Replace("" & UserSentence & "", ":", " VMZ ", 1, -1, vbTextCompare)
   UserSentence = Replace("" & UserSentence & "", ", ", " VCZ ", 1, -1, vbTextCompare)


and

   'PROCESS: REMOVE PUNCTUATION
   'This function removes all punctuation and
   'symbols from the User's sentence so they won't
   'confuse Hal during processing.
   UserSentence = HalBrain.AlphaNumericalOnly(UserSentence)

One problem arises when I ask Hal to search the web via Google for a topic whose name contains a significant hyphen. I tried commenting out the line

UserSentence = Replace("" & UserSentence & "", "-", " VHZ ", 1, -1, vbTextCompare)

from the brain I was using, but this didn't help.

Can anyone tell me how to make Hal take a query topic more literally, without removing characters like hyphens from the topic of the search?
« Last Edit: December 29, 2003, 11:22:27 am by lanman »
 

Don Ferguson

  • Sr. Member
  • ****
  • Posts: 303
    • http://www.cortrapar.com
« Reply #1 on: December 30, 2003, 03:43:32 am »
Hello,

The line that you removed is actually part of a routine that RESTORES some punctuation.

I'll try to make a long story short, but I need to lay out some background:

Some of Hal's best "thinking" occurs when he compares the current user sentence to patterns of past user sentences and responses.  As part of this, he evaluates how many words "match" and how many match in the correct sequence.

But we have to define "match" to the computer!  If we tell the computer that a word is a string of characters with spaces on either side, then " therefore " and " therefore; " are two different words!

Also, the ASCII codes are different for capital letters and lower case letters!  " Boston " and " boston " and " BOSTON " are three different words!

For these reasons, Zabaware decided to use ALL CAPS as the standard for those routines, and to strip out punctuation.  That resulted in the highest-quality "thinking" in those routines.
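To illustrate the matching problem described above, here is a minimal sketch of that kind of normalization in plain VBScript. NormalizeForMatch is a hypothetical helper written for this example; it is not Hal's actual routine (Hal uses HalBrain.AlphaNumericalOnly and its own upper-casing), and it only shows why stripping punctuation and case makes two spellings compare as equal.

```vbscript
' Hypothetical helper for illustration only (not part of Hal's brain script).
' Upper-cases the text and keeps only A-Z, 0-9 and spaces.
Function NormalizeForMatch(ByVal text)
    Dim i, ch, result
    result = ""
    text = UCase(text)
    For i = 1 To Len(text)
        ch = Mid(text, i, 1)
        If (ch >= "A" And ch <= "Z") Or (ch >= "0" And ch <= "9") Or ch = " " Then
            result = result & ch
        End If
    Next
    NormalizeForMatch = result
End Function
```

With this helper, NormalizeForMatch(" therefore; ") and NormalizeForMatch(" THEREFORE ") produce the same string, so the two count as the same word. The cost is the one this thread is about: a significant hyphen such as the one in "t-vec" is dropped as well.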

There are many different work-arounds possible to restore punctuation and capitalization when desired.  Here are a couple of them:

1.  Punctuation can be encoded as characters, then de-coded back to punctuation later.  That's what I did with the code that substitutes VSZ for a semicolon (the "s" is for semicolon) and VCZ for a comma (the "c" is for comma) and VMZ for colon (the "m" is arbitrary since "c" was taken, and the encode-character strings needed to be not-real-words).  Later in the script, these character strings get decoded and restored as punctuation.

In some places in the main script, extra spaces are added in front of and after punctuation characters, to make sure that Hal doesn't think that the punctuation characters are part of a word.  Some of the lines toward the end of the "corrections.brn" database remove the extra spaces if they exist, and also correct any "obviously wrong" punctuation combinations such as :; or !; or ;?
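The encode-then-decode idea in item 1 can be sketched roughly as follows. The token names VSZ, VMZ and VCZ come from the script lines quoted earlier in this thread; the two function names are made up for this example, and the decode step is an assumption about what the later part of the script does rather than a copy of it.

```vbscript
' Illustrative sketch only; EncodePunctuation and DecodePunctuation are
' hypothetical names, not functions from Hal's script.

' Swap punctuation for letter-only tokens so it survives punctuation stripping.
Function EncodePunctuation(ByVal text)
    text = Replace(text, ";", " VSZ ")
    text = Replace(text, ":", " VMZ ")
    text = Replace(text, ", ", " VCZ ")
    EncodePunctuation = text
End Function

' Turn the tokens back into the punctuation they stand for.
Function DecodePunctuation(ByVal text)
    text = Replace(text, " VSZ ", ";")
    text = Replace(text, " VMZ ", ":")
    text = Replace(text, " VCZ ", ", ")
    DecodePunctuation = text
End Function
```

For example, DecodePunctuation(EncodePunctuation("wait; then go, ok")) returns the original string. In the real script other processing runs between the two steps, so spacing may not come back exactly; that is what the clean-up lines in "corrections.brn" are for.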

2.  You can use the variant "originalsentence" in a routine if you need the user remark exactly as the user made it.  The intention of the "originalsentence" variant is to preserve the user's remark verbatim if it's needed for anything.  Note that "originalsentence" does NOT have the pronouns reversed!

If you have written a custom routine, it sounds like your best bet would be to try using the "originalsentence".  Remember, you can choose to use the "usersentence" for conditional testing and other purposes, and then pull the "originalsentence" within your if-then routine for the one and only purpose that you need it for.

If you like the belt-and-suspenders approach, you could also create your own variant, named SentWithPunctuation, with a line like this:

SentWithPunctuation = UserSentence

...early in the main script before ANY processing to the User Sentence.  Then you would have your own variant and you would know that no subsequent modifications of the script would ever mess with your special variant.

You don't mention in your posting how you are getting your string of characters to output to Google.  If you are outputting a special variant and sending it outside Hal, and capturing it outside Hal somehow, then fine.  

However, if you are passing it out as "GetResponse" (the variant that Hal speaks), you will need to experiment and write whatever routines necessary to get your exact desired format delivered to "GetResponse."

For instance, if you wanted Hal to speak the "SentWithPunctuation" under specific circumstances, you could locate the following line of code near the very end of the main "GetResponse" function:

GetResponse = SentWithPunctuation

...of course, you would need to put if-then conditional coding around that line, so it operated only when you wanted it; otherwise it would operate all the time.
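As a rough sketch of that pattern, the stand-alone function below saves a verbatim copy before any processing and hands it back only when a trigger word is present. SketchResponse, the trigger word "REPEAT" and the default reply are all invented for this illustration, and UCase stands in for Hal's real processing; in Hal itself the copy and the conditional would sit inside the existing GetResponse function.

```vbscript
' Hypothetical stand-in for Hal's GetResponse, for illustration only.
Function SketchResponse(ByVal UserSentence)
    Dim SentWithPunctuation, Reply

    ' Save a verbatim copy before anything modifies the sentence.
    SentWithPunctuation = UserSentence

    ' Stand-in for Hal's normal processing (upper-casing, stripping, etc.).
    UserSentence = UCase(UserSentence)
    Reply = "I AM NOT SURE WHAT YOU MEAN."

    ' Use the verbatim copy only under a specific condition.
    If InStr(1, UserSentence, "REPEAT", vbTextCompare) > 0 Then
        Reply = SentWithPunctuation
    End If

    SketchResponse = Reply
End Function
```

Calling SketchResponse("please repeat: t-vec") returns the sentence exactly as typed, hyphen and colon included, while any sentence without the trigger word gets the default reply.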

I hope this is helpful to you with your project.  Have a great day!

Sincerely,

Don
« Last Edit: December 30, 2003, 03:54:21 am by Don Ferguson »
Don Ferguson
E-mail: fergusonrkfd@prodigy.net
Website: www.cortrapar.com
Don's other forum posts: http://www.zabaware.com/forum/search.asp?mode=DoIt&MEMBER_ID=274

lanman

« Reply #2 on: January 02, 2004, 05:38:35 pm »
OK, Thanks Don. Actually, I didn't look closely as to whether the lines I was talking about were removing or replacing punctuation. I just pointed them out because they were indications that punctuation was being removed or manipulated in some way.

I had not yet tried customizing anything, but was trying to use the current built in support for using HAL to access search engines and found that it didn't appear easy or even possible (without customization) to get HAL to actually invoke a google search with the exact string you wanted searched for.

Anyway, thanks for your response. It does provide me with a direction ;-)

 

Don Ferguson

« Reply #3 on: January 03, 2004, 12:14:07 am »
Hello lanman,

Almost all my Hal work is with the main script and databases.  It is fairly rare that I connect Hal to the internet.  

Your work on Hal's web-search behavior sounds very interesting.  I think that your experiences would be useful to other members here on the forum.

Would you consider writing back after a while and letting us all know what you tried, and whether you were able to get the Hal behavior that you wanted?

Have a great day!

Sincerely,

Don

lanman

« Reply #4 on: January 03, 2004, 01:43:24 pm »
Okay Don. I'd be happy to. So far, I haven't done much worth talking about. Just some experimentation with the default capabilities. But, being a software engineer, and always looking for ways to automate things, I'll get around to making improvements to the default operation eventually ;-).

 

lanman

« Reply #5 on: January 03, 2004, 06:04:17 pm »
Okay, I did enough to answer my own original question. By editing the mywebhal.uhp script I was able to get a web search to focus on my literal string rather than being subject to the HalBrain.ExtractKeywords(UserSentence) processing, which removed my hyphen and otherwise altered the search string.

First off, you do need to set up a webhal account with Zabaware which is how you specify your search engine choice like "google". Originally, I set it up and requested that Hal

> find t-vec

where "t-vec" is the name of the company that I work for. Before modification, the GetResponse() function in mywebhal.uhp did the following

           Keywords = Replace(Trim(HalBrain.AlphaNumericalOnly(HalBrain.ExtractKeywords(UserSentence))), " ", "+")
           GetResponse = "I will help you research this topic on the Internet.<RUN>http://www.zabaware.com/mywebhal/index.html?username=" & UserName & "&action=search&keywords=" & Keywords & "&char=" & FacePrvw & "</RUN>"

You can see that the "-" character probably gets removed by the HalBrain.AlphaNumericalOnly() function. But since "t-vec" is a proper name the search string "tvec" is no good at all. So I modified the script as follows


           quote_position = InStr(1, UserSentence, Chr(34), 1)
           If quote_position > 0 Then
             ' Keep everything after the first quote character (a closing quote, if present, is included)
             Keywords = Right(UserSentence, Len(UserSentence) - quote_position)
           Else
             Set HalBrain = CreateObject("UltraHalAsst.Brain")
             Keywords = Replace(Trim(HalBrain.AlphaNumericalOnly(HalBrain.ExtractKeywords(UserSentence))), " ", "+")
           End If
           GetResponse = "I will help you research this topic on the Internet.<RUN>http://www.zabaware.com/mywebhal/index.html?username=" & UserName & "&action=search&keywords=" & Keywords & "&char=" & FacePrvw & "</RUN>"

I set it up so that if I issued the hal command

> find "t-vec"

the exact string "t-vec" gets used as the basis of Keywords, which gets submitted to google. If I don't use quotes around the search target, the original processing gets used.
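For anyone who wants only the text between the quotes, without the command word or the quote characters themselves, a small helper along these lines might do it. ExtractQuoted is a hypothetical name invented for this sketch, and it assumes the quotes are still present in the sentence when the script sees it.

```vbscript
' Hypothetical helper for illustration; not part of mywebhal.uhp.
' Returns the text between the first pair of double quotes,
' or an empty string if the sentence contains no quote at all.
Function ExtractQuoted(ByVal text)
    Dim firstQuote, secondQuote
    ExtractQuoted = ""
    firstQuote = InStr(1, text, Chr(34))
    If firstQuote = 0 Then Exit Function

    secondQuote = InStr(firstQuote + 1, text, Chr(34))
    If secondQuote = 0 Then
        ' No closing quote: take everything after the opening quote.
        ExtractQuoted = Mid(text, firstQuote + 1)
    Else
        ExtractQuoted = Mid(text, firstQuote + 1, secondQuote - firstQuote - 1)
    End If
End Function
```

Given the sentence find "t-vec", this returns t-vec with the hyphen intact. An empty result can serve as the signal to fall back to the original keyword processing.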

I also made a few other changes to the script to make it more efficient, such as using ElseIf statements rather than If statements where the cases are mutually exclusive. But what I need to figure out now is how to pass quote characters as part of the google search string (Keywords). Right now, they are apparently being stripped out by the

<RUN>http://www.zabaware.com/mywebhal/index.html?username=" & UserName & "&action=search&keywords=" & Keywords & "&char=" & FacePrvw & "</RUN>

command, even if I explicitly add them right before this command by doing something like

Keywords = Chr(34) & Keywords & Chr(34)

Does anyone know how I might be able to accomplish that? Quote characters tell google to match the exact string between the quote characters, rather than treating the words as independent search terms.

I also tried using the URL encoding approach "%22" (an escaped hexadecimal code for the " character) but it appears that the processing done by the Hal command <RUN>url spec</RUN> is filtering out the " characters.
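For reference, the percent-encoded form of the double quote character (ASCII 34, hex 22) is %22, and VBScript's built-in Escape function can produce it. The sketch below only shows how to build such a string; BuildExactPhraseKeywords is a made-up name, and whether Hal's <RUN> handling or the mywebhal page passes %22 through to the search engine is not something this example can confirm.

```vbscript
' Hypothetical helper for illustration only.
' Wraps the phrase in double quotes and percent-encodes the result
' using VBScript's built-in Escape function.
Function BuildExactPhraseKeywords(ByVal phrase)
    BuildExactPhraseKeywords = Escape(Chr(34) & phrase & Chr(34))
End Function
```

BuildExactPhraseKeywords("t-vec") yields %22t-vec%22: the quotes are encoded and the hyphen is left alone. Spaces come out as %20 rather than the "+" used in the original script.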

Web-search functionality could be greatly enhanced by providing a way for unadulterated search strings to be passed directly to a search engine.
« Last Edit: January 03, 2004, 08:17:36 pm by lanman »