
Topic: Better sentence splitting

jasondude7116

  • Sr. Member
  • ****
  • Posts: 475
Better sentence splitting
« on: December 08, 2010, 01:45:12 am »
For those interested: this will give you better sentence splitting.
Go to your current brain .uhp file (mine is GRETTA44.uhp; the default is Hal6.uhp).

Find this:

Code:
    ''PROCESS: SPLIT USER'S INPUT STRING INTO SEPERATE SENTENCES
    'Encode abbreviations such as Mr. Mrs. and Ms.
    InputString = Replace(InputString, "MR.", "Mr<PERIOD>", 1, -1, vbTextCompare)
    InputString = Replace(InputString, "MRS.", "Mrs<PERIOD>", 1, -1, vbTextCompare)
    InputString = Replace(InputString, "MS.", "Ms<PERIOD>", 1, -1, vbTextCompare)
    InputString = Replace(InputString, "DR.", "Dr<PERIOD>", 1, -1, vbTextCompare)
    InputString = Replace(InputString, "MS.", "Ms<PERIOD>", 1, -1, vbTextCompare)
    InputString = Replace(InputString, "ST.", "St<PERIOD>", 1, -1, vbTextCompare)
    InputString = Replace(InputString, "PROF.", "Prof<PERIOD>", 1, -1, vbTextCompare)
    InputString = Replace(InputString, "GEN.", "Gen<PERIOD>", 1, -1, vbTextCompare)
    InputString = Replace(InputString, "REP.", "Rep<PERIOD>", 1, -1, vbTextCompare)
    InputString = Replace(InputString, "SEN.", "Sen<PERIOD>", 1, -1, vbTextCompare)
    ''Remove unnecessary punctuation



And replace it with this:

Code:
'*********************************Edited out section***********************************************
   
    ''PROCESS: SPLIT USER'S INPUT STRING INTO SEPERATE SENTENCES
    ''Encode abbreviations such as Mr. Mrs. and Ms.
    'InputString = Replace(InputString, "MR.", "Mr<PERIOD>", 1, -1, vbTextCompare)
    'InputString = Replace(InputString, "MRS.", "Mrs<PERIOD>", 1, -1, vbTextCompare)
    'InputString = Replace(InputString, "MS.", "Ms<PERIOD>", 1, -1, vbTextCompare)
    'InputString = Replace(InputString, "DR.", "Dr<PERIOD>", 1, -1, vbTextCompare)
    'InputString = Replace(InputString, "MS.", "Ms<PERIOD>", 1, -1, vbTextCompare)
    'InputString = Replace(InputString, "ST.", "St<PERIOD>", 1, -1, vbTextCompare)
    'InputString = Replace(InputString, "PROF.", "Prof<PERIOD>", 1, -1, vbTextCompare)
    'InputString = Replace(InputString, "GEN.", "Gen<PERIOD>", 1, -1, vbTextCompare)
    'InputString = Replace(InputString, "REP.", "Rep<PERIOD>", 1, -1, vbTextCompare)
    'InputString = Replace(InputString, "SEN.", "Sen<PERIOD>", 1, -1, vbTextCompare)
    ''Remove unnecessary punctuation
   
'*********************************New section***************************************************** 

    'PROCESS: SPLIT USER'S INPUT STRING INTO SEPARATE SENTENCES
    'Encode abbreviations such as Mr. Mrs. and Ms.
    InputString = Replace(InputString, " A. ", " A<PERIOD> ", 1, -1, vbTextCompare)
    InputString = Replace(InputString, " B. ", " B<PERIOD> ", 1, -1, vbTextCompare)
    InputString = Replace(InputString, " C. ", " C<PERIOD> ", 1, -1, vbTextCompare)
    InputString = Replace(InputString, " D. ", " D<PERIOD> ", 1, -1, vbTextCompare)
    InputString = Replace(InputString, " E. ", " E<PERIOD> ", 1, -1, vbTextCompare)
    InputString = Replace(InputString, " F. ", " F<PERIOD> ", 1, -1, vbTextCompare)
    InputString = Replace(InputString, " G. ", " G<PERIOD> ", 1, -1, vbTextCompare)
    InputString = Replace(InputString, " H. ", " H<PERIOD> ", 1, -1, vbTextCompare)
    InputString = Replace(InputString, " I. ", " I<PERIOD> ", 1, -1, vbTextCompare)
    InputString = Replace(InputString, " J. ", " J<PERIOD> ", 1, -1, vbTextCompare)
    InputString = Replace(InputString, " K. ", " K<PERIOD> ", 1, -1, vbTextCompare)
    InputString = Replace(InputString, " L. ", " L<PERIOD> ", 1, -1, vbTextCompare)
    InputString = Replace(InputString, " M. ", " M<PERIOD> ", 1, -1, vbTextCompare)
    InputString = Replace(InputString, " N. ", " N<PERIOD> ", 1, -1, vbTextCompare)
    InputString = Replace(InputString, " O. ", " O<PERIOD> ", 1, -1, vbTextCompare)
    InputString = Replace(InputString, " P. ", " P<PERIOD> ", 1, -1, vbTextCompare)
    InputString = Replace(InputString, " Q. ", " Q<PERIOD> ", 1, -1, vbTextCompare)
    InputString = Replace(InputString, " R. ", " R<PERIOD> ", 1, -1, vbTextCompare)
    InputString = Replace(InputString, " S. ", " S<PERIOD> ", 1, -1, vbTextCompare)
    InputString = Replace(InputString, " T. ", " T<PERIOD> ", 1, -1, vbTextCompare)
    InputString = Replace(InputString, " U. ", " U<PERIOD> ", 1, -1, vbTextCompare)
    InputString = Replace(InputString, " V. ", " V<PERIOD> ", 1, -1, vbTextCompare)
    InputString = Replace(InputString, " W. ", " W<PERIOD> ", 1, -1, vbTextCompare)
    InputString = Replace(InputString, " X. ", " X<PERIOD> ", 1, -1, vbTextCompare)
    InputString = Replace(InputString, " Y. ", " Y<PERIOD> ", 1, -1, vbTextCompare)
    InputString = Replace(InputString, " Z. ", " Z<PERIOD> ", 1, -1, vbTextCompare)
    InputString = Replace(InputString, "SR.", "Sr<PERIOD>", 1, -1, vbTextCompare)
    InputString = Replace(InputString, "MR.", "Mr<PERIOD>", 1, -1, vbTextCompare)
    InputString = Replace(InputString, "JR.", "Jr<PERIOD>", 1, -1, vbTextCompare)
    InputString = Replace(InputString, "INC.", "Inc<PERIOD>", 1, -1, vbTextCompare)
    InputString = Replace(InputString, "...", "<PERIOD><PERIOD><PERIOD>", 1, -1, vbTextCompare)
    InputString = Replace(InputString, "MRS.", "Mrs<PERIOD>", 1, -1, vbTextCompare)
    InputString = Replace(InputString, "MS.", "Ms<PERIOD>", 1, -1, vbTextCompare)
    InputString = Replace(InputString, "DR.", "Dr<PERIOD>", 1, -1, vbTextCompare)
    InputString = Replace(InputString, "ST.", "St<PERIOD>", 1, -1, vbTextCompare)
    InputString = Replace(InputString, "PROF.", "Prof<PERIOD>", 1, -1, vbTextCompare)
    InputString = Replace(InputString, "GEN.", "Gen<PERIOD>", 1, -1, vbTextCompare)
    InputString = Replace(InputString, "REP.", "Rep<PERIOD>", 1, -1, vbTextCompare)
    InputString = Replace(InputString, "SEN.", "Sen<PERIOD>", 1, -1, vbTextCompare)
    InputString = Replace(InputString, "MT.", "Mt<PERIOD>", 1, -1, vbTextCompare)
    InputString = Replace(InputString, "JAN.", "Jan<PERIOD>", 1, -1, vbTextCompare)
    InputString = Replace(InputString, "FEB.", "Feb<PERIOD>", 1, -1, vbTextCompare)
    InputString = Replace(InputString, "MAR.", "Mar<PERIOD>", 1, -1, vbTextCompare)
    InputString = Replace(InputString, "APR.", "Apr<PERIOD>", 1, -1, vbTextCompare)
    InputString = Replace(InputString, "JUN.", "Jun<PERIOD>", 1, -1, vbTextCompare)
    InputString = Replace(InputString, "JUL.", "Jul<PERIOD>", 1, -1, vbTextCompare)
    InputString = Replace(InputString, "AUG.", "Aug<PERIOD>", 1, -1, vbTextCompare)
    InputString = Replace(InputString, "SEP.", "Sep<PERIOD>", 1, -1, vbTextCompare)
    InputString = Replace(InputString, "OCT.", "Oct<PERIOD>", 1, -1, vbTextCompare)
    InputString = Replace(InputString, "NOV.", "Nov<PERIOD>", 1, -1, vbTextCompare)
    InputString = Replace(InputString, "DEC.", "Dec<PERIOD>", 1, -1, vbTextCompare)
    'Remove unnecessary punctuation
   
'******************************************************************************************************


This gives you a clearly marked area with both the original and the new code. The original code is commented out and no longer runs.
**ANY TIME YOU EDIT THE MAIN BRAIN FILE, PLEASE MAKE A BACKUP FIRST.**
 

Lola

  • Jr. Member
  • **
  • Posts: 96
Re: Better sentence splitting
« Reply #1 on: December 08, 2010, 03:12:48 am »
Thanks dude, I will try it.  :)
 

raybe

  • Hero Member
  • *****
  • Posts: 1067
Re: Better sentence splitting
« Reply #2 on: December 08, 2010, 11:11:53 pm »
Jason, if you don't mind, could you explain the reason for better sentence splitting? Are you saying that Ultra Hal will find it easier to separate topics, or to find the topic of a sentence?

Don't you love when people ask you a question and then still try to answer it themselves? Oh well.

Thanks,
raybe

Note: I see you, Lola. (Not literally, of course. That would creep even me out.) Just kidding.
 

snowman

  • Hero Member
  • *****
  • Posts: 956
  • Ai + Feelings + Supercomputer = End of World
    • http://www.MinervaAi.com
Re: Better sentence splitting
« Reply #3 on: December 09, 2010, 12:28:54 am »
Thanks JasonDude, I modified it a little bit and added it to Athena. :)
I hope there are no errors in it.

Code:

        Dim AbbCol(50)
        AbbCol(0) = " a. "
        AbbCol(1) = " b. "
        AbbCol(2) = " c. "
        AbbCol(3) = " d. "
        AbbCol(4) = " e. "
        AbbCol(5) = " f. "
        AbbCol(6) = " g. "
        AbbCol(7) = " h. "
        AbbCol(8) = " i. "
        AbbCol(9) = " j. "
        AbbCol(10) = " k. "
        AbbCol(11) = " l. "
        AbbCol(12) = " m. "
        AbbCol(13) = " n. "
        AbbCol(14) = " o. "
        AbbCol(15) = " p. "
        AbbCol(16) = " q. "
        AbbCol(17) = " r. "
        AbbCol(18) = " s. "
        AbbCol(19) = " t. "
        AbbCol(20) = " u. "
        AbbCol(21) = " v. "
        AbbCol(22) = " w. "
        AbbCol(23) = " x. "
        AbbCol(24) = " y. "
        AbbCol(25) = " z. "

        AbbCol(26) = "sr."
        AbbCol(27) = "mr."
        AbbCol(28) = "jr."
        AbbCol(29) = "inc."
        AbbCol(30) = "..."
        AbbCol(31) = "mrs."
        AbbCol(32) = " ms."
        AbbCol(33) = "dr."
        AbbCol(34) = "st."
        AbbCol(35) = "prof."
        AbbCol(36) = "gen."
        AbbCol(37) = "rep."
        AbbCol(38) = "sen."
        AbbCol(39) = " mt."
        AbbCol(40) = "jan."
        AbbCol(41) = "feb."
        AbbCol(42) = "mar."
        AbbCol(43) = "apr."
        AbbCol(44) = "jun."
        AbbCol(45) = "jul."
        AbbCol(46) = "aug."
        AbbCol(47) = "sep."
        AbbCol(48) = "oct."
        AbbCol(49) = "nov."
        AbbCol(50) = "dec."


        'Encode abbreviations such as Mr. Mrs. and Ms.
        Dim abb1, abb2
        For Each abb1 In AbbCol
            'Skip unused (empty) slots in the array
            If abb1 <> "" Then
                abb2 = Replace(abb1, ".", "<PERIOD>", 1, -1, vbTextCompare)
                InputString = Replace(InputString, abb1, UCase(abb2), 1, -1, vbTextCompare)
            End If
        Next





Oh, and Raybe, if you don't mind me answering:

Basically, if you say the following string to your Hal:

"How are you today Hal? How have you been? I've been at work all day long."

After you press Enter, Hal's brain splits this input string into three sentences and processes each one separately. Essentially, Hal looks at the punctuation (?, ?, .) that ends each sentence and cuts the string up there.
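A minimal VBScript sketch of that splitting step, assuming a sentence ends at ".", "?" or "!". SplitSentences is a hypothetical name, and Hal's own splitting code in the .uhp file may work differently:

```vbscript
' Hypothetical helper: split an input string into sentences at ".", "?" and "!".
' Returns an array of trimmed, non-empty sentences, each keeping its end mark.
Function SplitSentences(InputString)
    Dim result(), count, i, ch, current
    count = 0
    current = ""
    ReDim result(0)
    For i = 1 To Len(InputString)
        ch = Mid(InputString, i, 1)
        current = current & ch
        If ch = "." Or ch = "?" Or ch = "!" Then
            'An end mark closes the current sentence
            If Trim(current) <> "" Then
                ReDim Preserve result(count)
                result(count) = Trim(current)
                count = count + 1
            End If
            current = ""
        End If
    Next
    'Keep any trailing text that has no end mark
    If Trim(current) <> "" Then
        ReDim Preserve result(count)
        result(count) = Trim(current)
        count = count + 1
    End If
    If count = 0 Then
        SplitSentences = Array()
    Else
        SplitSentences = result
    End If
End Function
```

Fed "How are you today Hal? How have you been? I've been at work all day long.", this returns three sentences. Because it treats every period as an end mark, a string full of abbreviations like "Sr." or "Dr." gets cut into too many pieces.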

Jason has written some code that fixes potential errors in how Hal splits these sentences. Now look at the next three sentences:

"How are you today Sr. Hal. How is Dr. Hal. I saw Prof. Puffy today. "

If you fed that string to Hal, Hal could think there were six sentences, because there are six periods. Jason's code cleans it up so that Hal reads only three sentences, as it should have in the first place.

Hal does some of this already (the original code handles Mr., Mrs., Ms., Dr., St., Prof., Gen., Rep., and Sen.), but Jason made it better.
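To make the before and after concrete, here is a minimal sketch of the same idea: protect abbreviation periods as <PERIOD>, split on ".", then turn <PERIOD> back into a real period. EncodeAbbreviations and SplitOnPeriods are hypothetical names, only a sample of the abbreviations from the first post is used, and splitting on "." alone is a simplification:

```vbscript
' Hypothetical sketch: protect abbreviation periods as <PERIOD>.
' Only a sample of the abbreviations from the first post is used here.
Function EncodeAbbreviations(InputString)
    Dim abbs, abb, word, s
    abbs = Array("SR.", "MR.", "MRS.", "DR.", "PROF.")
    s = InputString
    For Each abb In abbs
        'Build "Sr", "Mr", "Mrs", ... from "SR.", "MR.", "MRS.", ...
        word = UCase(Left(abb, 1)) & LCase(Mid(abb, 2, Len(abb) - 2))
        s = Replace(s, abb, word & "<PERIOD>", 1, -1, vbTextCompare)
    Next
    EncodeAbbreviations = s
End Function

' Split on "." only ("?" and "!" are ignored in this simplified sketch),
' then restore the protected periods.
Function SplitOnPeriods(InputString)
    Dim parts, result(), i, count, piece
    parts = Split(EncodeAbbreviations(InputString), ".")
    count = 0
    ReDim result(0)
    For i = 0 To UBound(parts)
        piece = Trim(Replace(parts(i), "<PERIOD>", ".", 1, -1, vbTextCompare))
        If piece <> "" Then
            ReDim Preserve result(count)
            result(count) = piece & "."
            count = count + 1
        End If
    Next
    If count = 0 Then
        SplitOnPeriods = Array()
    Else
        SplitOnPeriods = result
    End If
End Function
```

With the Sr./Dr./Prof. example string, this returns three sentences, the first being "How are you today Sr. Hal."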

Hope this clears some of it up. ;)
« Last Edit: December 09, 2010, 05:31:23 pm by snowman »
Live long and prosper or die trying.

snowman

  • Hero Member
  • *****
  • Posts: 956
  • Ai + Feelings + Supercomputer = End of World
    • http://www.MinervaAi.com
Re: Better sentence splitting
« Reply #4 on: December 09, 2010, 02:37:11 am »
Hey Jason!


I'm not sure what you will think of all this, since I'm not very well versed in Hal's database functions.
If nothing else, maybe it will give you some inspiration.

Oh, and I can't guarantee the code will be error-free.



MARKOV TAGS

You can use numbers as tag locators so that you can iterate through the tags until you find the word with the greatest probability.

For example, make the format look something like:

 1<run> 2<walk> 5<talk>

So, to find the greatest value, do something like:

Code:

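allcelldata = "1<run> 2<walk> 5<talk>"  'example: all data from one cell of the topics column

'Search from the highest count downward, so the first tag found has the greatest value
'(assumes no count is above 100)
Arg1 = ""
For i = 100 To 1 Step -1
     Arg1 = FindArg(allcelldata, CStr(i) & "<", ">")
     If Arg1 <> "" Then Exit For
Next

'Arg1 now contains the most frequent word ("talk" for the example data)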
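If counts can go above 100, a fixed numeric loop gets fiddly. A minimal sketch, assuming every cell uses the n<word> format above, that scans all tags once with VBScript's RegExp object and keeps the largest count. MostFrequentWord is a hypothetical name, not an existing Hal function:

```vbscript
' Hypothetical helper: return the word with the highest count in a cell
' formatted like "1<run> 2<walk> 5<talk>". Returns "" if no tags are found.
Function MostFrequentWord(allcelldata)
    Dim re, matches, m, bestCount, bestWord
    Set re = New RegExp
    re.Pattern = "(\d+)<([^>]*)>"   ' a number followed by <word>
    re.Global = True
    bestCount = -1
    bestWord = ""
    Set matches = re.Execute(allcelldata)
    For Each m In matches
        'SubMatches(0) is the count, SubMatches(1) is the word
        If CLng(m.SubMatches(0)) > bestCount Then
            bestCount = CLng(m.SubMatches(0))
            bestWord = m.SubMatches(1)
        End If
    Next
    MostFrequentWord = bestWord
End Function
```

On a tie, the tag that appears first in the cell wins.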




If you want to update a tag value, all you do is read that cell, replace the data, and rewrite the cell.

Code:

allcelldata = "1<run> 2<walk> 5<talk>"  'example: all data from one cell of the topics column

cellWordtoReplace = "talk"

'Search from the highest count downward so that, e.g., "9<talk>" is not matched inside "19<talk>"
For i = 100 To 1 Step -1
     oldTag = CStr(i) & "<" & cellWordtoReplace & ">"
     'InStr needs the start argument when a compare mode is given
     If InStr(1, allcelldata, oldTag, vbTextCompare) > 0 Then
          newTag = CStr(i + 1) & "<" & cellWordtoReplace & ">"
          allcelldata = Replace(allcelldata, oldTag, newTag, 1, -1, vbTextCompare)
          Exit For
     End If
Next

'allcelldata now contains the updated cell data, i.e. "1<run> 2<walk> 6<talk>"

'Now update the cell
'not sure how, never done this in Hal before
'maybe simply overwrite it.
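The read, replace, and rewrite step could also walk the space-separated tags directly, which additionally covers a word that has no tag yet. A sketch, assuming tags are separated by single spaces and words contain no spaces or angle brackets; IncrementTag is a hypothetical name, and writing the result back to the cell is still left open:

```vbscript
' Hypothetical helper: add 1 to the count of word in a cell formatted like
' "1<run> 2<walk> 5<talk>", or append "1<word>" if the word is not there yet.
' Assumes tags are separated by single spaces and words contain no spaces.
Function IncrementTag(allcelldata, word)
    Dim tags, i, p, found
    found = False
    If Trim(allcelldata) = "" Then
        IncrementTag = "1<" & word & ">"
        Exit Function
    End If
    tags = Split(Trim(allcelldata), " ")
    For i = 0 To UBound(tags)
        p = InStr(1, tags(i), "<")
        'Compare the text between "<" and the closing ">" with the word
        If p > 1 And Right(tags(i), 1) = ">" Then
            If StrComp(Mid(tags(i), p + 1, Len(tags(i)) - p - 1), word, vbTextCompare) = 0 Then
                tags(i) = CStr(CLng(Left(tags(i), p - 1)) + 1) & Mid(tags(i), p)
                found = True
                Exit For
            End If
        End If
    Next
    If found Then
        IncrementTag = Join(tags, " ")
    Else
        IncrementTag = Join(tags, " ") & " 1<" & word & ">"
    End If
End Function
```

Because each tag is compared as a whole, "19<talk>" becomes "20<talk>" rather than being confused with "9<talk>".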





Below is a function that you can use in your Markov plugin.
It retrieves arguments out of tags. I originally made it for Athena.


Code:

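Function FindArg(strCmdLine, strCmdStart, strCmdEnd)
    'Returns the text between strCmdStart and strCmdEnd in strCmdLine.
    'If strCmdEnd is empty, returns everything after strCmdStart.
    'Returns "" if strCmdStart (or strCmdEnd) is not found.
    Dim findit, FindCmdStart, FindCmdEnd, strArg

            strArg = ""
            findit = InStr(1, strCmdLine, strCmdStart, vbTextCompare)

            If findit > 0 Then
                FindCmdStart = findit + Len(strCmdStart)
                If strCmdEnd = "" Then
                    strArg = Mid(strCmdLine, FindCmdStart)
                Else
                    FindCmdEnd = InStr(FindCmdStart, strCmdLine, strCmdEnd, vbTextCompare)
                    'Only cut out the argument if the end marker was found
                    If FindCmdEnd > 0 Then
                        strArg = Mid(strCmdLine, FindCmdStart, FindCmdEnd - FindCmdStart)
                    End If
                End If
            End If

            FindArg = strArg
End Function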

jasondude7116

  • Sr. Member
  • ****
  • Posts: 475
Re: Better sentence splitting
« Reply #5 on: December 09, 2010, 05:00:36 pm »
Snowman -

In the code that you "modified" for sentence splitting, I noticed that you changed the single-letter ones, i.e.:

InputString = Replace(InputString, " A. ", " A<PERIOD> ", 1, -1, vbTextCompare)

to this:

AbbCol(0) = "a."

Note: mine has a space on either side of the letter. The other abbreviations, like "Rep.", don't, because space+letter+period+space is usually something like a middle initial, whereas "Rep." can show up in a sentence like "I saw my state rep., but I didn't get to talk to him." In that case the abbreviation is followed by punctuation rather than a space.
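A related pitfall: an abbreviation with no leading space also matches the end of ordinary words. With vbTextCompare, "ST." matches the end of "first." and "MS." matches the end of "items.", so those sentence ends get swallowed. One alternative, not from this thread, is VBScript's RegExp object with \b, so the abbreviation must start at a word boundary while still allowing any punctuation after it. EncodeAbbreviation is a hypothetical name:

```vbscript
' Hypothetical helper (not from the thread): encode one abbreviation as
' <abb><PERIOD> only when it starts a word, using VBScript's RegExp object.
' abb is the abbreviation without its period, e.g. "St" or "Rep", and must
' contain only letters, since it is placed into the pattern unescaped.
Function EncodeAbbreviation(InputString, abb)
    Dim re
    Set re = New RegExp
    re.Pattern = "\b(" & abb & ")\."   ' \b = word boundary, \. = literal period
    re.IgnoreCase = True
    re.Global = True
    ' $1 is the matched abbreviation, so the user's own casing is kept
    EncodeAbbreviation = re.Replace(InputString, "$1<PERIOD>")
End Function
```

For example, "I came in first. St. Louis is far." becomes "I came in first. St<PERIOD> Louis is far.", and "rep.," becomes "rep<PERIOD>,".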
 

snowman

  • Hero Member
  • *****
  • Posts: 956
  • Ai + Feelings + Supercomputer = End of World
    • http://www.MinervaAi.com
Re: Better sentence splitting
« Reply #6 on: December 09, 2010, 05:24:40 pm »
Yeah, there are probably a few more bugs hiding around somewhere. I'll fix it, then.

Oh, and I've been meaning to ask: where or how did you figure out the Markov chain in the first place?

**** It's now updated ****
« Last Edit: December 09, 2010, 05:32:21 pm by snowman »

raybe

  • Hero Member
  • *****
  • Posts: 1067
Re: Better sentence splitting
« Reply #7 on: December 16, 2010, 07:34:08 pm »
Thank you for the explanation. I often wondered how Ultra Hal would accept that line of communication. I believe that in older versions of Ultra Hal it was a (;) that would tie subjects together, but that was nowhere near what Jason has done, or your contribution, Snowman.

Great feature. Is it code we can use now, or should we wait until everything is checked out?

I had no cable service, so I have been offline.

Thanks,
raybe
 

jasondude7116

  • Sr. Member
  • ****
  • Posts: 475
Re: Better sentence splitting
« Reply #8 on: December 16, 2010, 10:43:16 pm »
I've been using the code from the first post for a couple of years without error.
 

raybe

  • Hero Member
  • *****
  • Posts: 1067
Re: Better sentence splitting
« Reply #9 on: December 21, 2010, 09:26:29 pm »
Thanks to you both. Comcast finally realized they had put an external signal booster on my line, but forgot that you can't use a cable booster on data. I had them re-split the line, and everything seems to be working now.

I am going to try the code you posted, Jason, and let you know. Snowman, please let me know if you resolve some of your additions. I really feel that it would make a positive addition to Ultra Hal processing.

Thank you,
raybe
 

Art

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 3854
Re: Better sentence splitting
« Reply #10 on: December 23, 2010, 07:57:23 pm »
Comcast, eh?

So that's why you never answered my email....
In the world of AI it's the thought that counts!

- Art -

raybe

  • Hero Member
  • *****
  • Posts: 1067
Re: Better sentence splitting
« Reply #11 on: December 24, 2010, 01:52:15 pm »
Art, I tried looking back through my e-mail and still can't find anything. But I can see you are very familiar with Comcast services.

Sorry, I know this is not the right thread, but I just wanted to explain that Comcast cost me a lot of work, because most of my customers e-mail bids and plans to me, and they think I'm blowing them off. If it makes any sense, please feel free to resend, and hopefully things will continue to function.

Couldn't believe they had a booster on my cable line!!

I did put in the code, and everything seems to be working just fine. Thanks, Jason!
Any update from snowman?

Let me give it a rest and just wish everyone and their families a blessed holiday season.

Thank you for everything!

raybe