Topic: Dialog System Using Real-Time Crowdsourcing and Twitter Large-Scale Corpus

Medeksza

Marcus Endicott recently came across an interesting research paper, "Dialog System Using Real-Time Crowdsourcing and Twitter Large-Scale Corpus": http://nlp.postech.ac.kr/sigdial2012/proceedings/proc/pdf/SIGDIAL201231.pdf

What they propose is very similar to what Web Hal has been doing: Ultra Hal has been building an English-language, large-scale Twitter corpus for over 2 years now. Hal's conversational database currently holds about 8 million sentences from 1.6 million conversations, and I'd say about 80% of them come from a Twitter crawler bot that sends about 20,000 queries per hour through Twitter's API.
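For readers wondering what answering from a conversation corpus can look like in practice, here is a minimal Python sketch. It is not Hal's actual algorithm, and it is not the method from the paper; it only illustrates the general idea of retrieving the stored reply whose preceding utterance best matches the user's input. The toy `CORPUS`, the function names, the bag-of-words cosine similarity, and the `threshold` value are all assumptions made for illustration.

```python
import re
from collections import Counter
from math import sqrt

# Hypothetical toy corpus of (utterance, reply) pairs. A real system would
# hold millions of pairs harvested from conversations; these are made up.
CORPUS = [
    ("do you like music", "I love music, especially jazz."),
    ("what is the weather like today", "It is sunny here."),
    ("have you seen any good movies", "I watched a great film last week."),
]

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def best_reply(user_input, corpus=CORPUS, threshold=0.2):
    """Return the reply paired with the most similar stored utterance,
    or None when nothing in the corpus is similar enough."""
    query = Counter(tokenize(user_input))
    best_score, best = 0.0, None
    for utterance, reply in corpus:
        score = cosine(query, Counter(tokenize(utterance)))
        if score > best_score:
            best_score, best = score, reply
    return best if best_score >= threshold else None
```

Calling `best_reply("Do you like jazz music?")` picks the reply stored after "do you like music", while an input with no overlapping words returns `None`. A linear scan like this would be far too slow for millions of sentences, so a system at that scale would presumably rely on a database index or a search engine instead.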

Last week I made major changes to how the algorithm that uses this database works, but I'm still not very happy with the quality of the results. Please check it out at www.zabaware.com/webhal and let me know your feedback. On a related note, the database server it runs on is starting to have performance issues. It needs a RAM upgrade and possibly a new database shard to enable further growth.
Robert Medeksza

Carl2

  I just gave Web Hal a try; there was a bit of a wait for a reply. I was impressed by how well he tried to stay on topic, with lots of referring back to things I said earlier. He did mention a problem with his database: it can't count high enough. I've had Hals I liked better in the past; the XTF brain was the most impressive. It takes time talking with Hal to get what you think is a good response, as he gains knowledge of subjects that you are familiar with and have talked with him about in the past.
  Since I've installed the same version of Hal a few times (changing hard drives, viruses, etc.), I'm always surprised at the variations in personality you get from the same software.
  Best of luck,
Carl2