The Hal Brain Editor already has a function called "Learn From Text File" that attempts to do something similar. Just open the Brain Editor, open your brain, click "Learn" on the menu bar, and then click "Learn From Text File". It can work well in some instances, but because most text files aren't conversational in nature, they aren't really good source material for Hal to learn from. The best way to teach Hal remains conversation with Hal, entering question-and-answer pairs, or entering entire conversational threads in the new web Hal.
When people have actual conversations over Twitter, those exchanges become one of the best sources for Hal to learn from, because each one is a conversation between two (usually) intelligent humans rather than between a human and a computer, or a non-conversational piece of text. In addition, Twitter limits messages to 140 characters, so exchanges can't be more than a few sentences, which is the perfect conversational size.
Twitter conversations are public, so Hal currently downloads about 200,000 tweets at random every day and pieces them together into conversational threads. The threads go through a filter to remove junk (bad language, bad grammar, messages that are too short, etc.), and what is left is 5,000-10,000 sentences that get added to his database every day. But this only scratches the surface of what really goes through Twitter every day: Twitter carries about 50 million tweets per day, so Hal sees roughly 0.4 percent of them. My agreement with Twitter lets me query Twitter at random at the rate of 20,000 per hour, so I hope to at least double the rate of Hal's learning soon by upgrading to a faster computer.
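For anyone curious what "piecing tweets together into threads and filtering out junk" might look like, here is a rough sketch in Python. It is not Hal's actual code. The `Tweet` class, the `in_reply_to` field, the `BLOCKLIST` words, and the `MIN_WORDS` cutoff are all made up for illustration, and it assumes each downloaded tweet records which tweet (if any) it replies to.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tweet:
    id: int
    text: str
    in_reply_to: Optional[int] = None  # hypothetical: id of the tweet being answered


BLOCKLIST = {"darn"}  # placeholder word list, not Hal's real filter
MIN_WORDS = 3         # assumed cutoff for "too short"


def is_clean(text):
    """Reject messages that are too short or contain a blocklisted word."""
    words = text.lower().split()
    if len(words) < MIN_WORDS:
        return False
    return not any(w.strip(".,!?") in BLOCKLIST for w in words)


def build_threads(tweets):
    """Follow reply links backwards from each unanswered tweet to its root."""
    by_id = {t.id: t for t in tweets}
    replied_to = {t.in_reply_to for t in tweets if t.in_reply_to is not None}
    threads = []
    for t in tweets:
        if t.id in replied_to:
            continue  # something answers this tweet, so it is not the end of a thread
        chain, seen, cur = [], set(), t
        while cur is not None and cur.id not in seen:
            seen.add(cur.id)
            chain.append(cur)
            cur = by_id.get(cur.in_reply_to) if cur.in_reply_to is not None else None
        chain.reverse()  # oldest message first
        if len(chain) >= 2:  # a lone tweet is not a conversation
            threads.append(chain)
    return threads


def clean_threads(tweets):
    """Keep only threads in which every message passes the junk filter."""
    return [th for th in build_threads(tweets) if all(is_clean(t.text) for t in th)]


sample = [
    Tweet(1, "Anyone seen the new movie yet?"),
    Tweet(2, "Yes, it was great fun", in_reply_to=1),
    Tweet(3, "lol"),
    Tweet(4, "What a darn mess"),
    Tweet(5, "I agree with that completely", in_reply_to=4),
]
kept = clean_threads(sample)
```

In this sample only the thread made of tweets 1 and 2 survives: tweet 3 has no reply partner, and the thread starting at tweet 4 contains a blocklisted word. Dropping the whole thread when one message fails is just one possible choice. A real filter might keep the clean part of a conversation instead.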
But what I would really love is to get access to this:
http://blogs.loc.gov/loc/2010/04/how-tweet-it-is-library-acquires-entire-twitter-archive/ which is the entire 167 terabytes of the Twitter database going back to 2006. If Hal had a database that massive, we could do cool things like filter on region of the world, period of history, or attributes in Twitter profiles. It could be possible to create a whole new Hal personality simply by giving a higher weight to certain attributes, for example "female Twitter user, ages 18-30, from the West Coast of the USA, during 2008" vs. "male, ages 30+, from NYC, 2008 onward", or any combination you can think of, including filtering on anything people might write about themselves in their profile. Even without the entire Twitter db, Hal's db will keep getting larger, and the larger it gets, the more filtering can be done to create new and unique personalities.
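To make the "higher weight to certain attributes" idea a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `Sentence` class, the attribute names (`gender`, `region`, `year`), and the `boost` factor are invented, and Hal's database does not necessarily store anything like this.

```python
from dataclasses import dataclass, field


@dataclass
class Sentence:
    text: str
    attrs: dict = field(default_factory=dict)  # hypothetical profile attributes of the author


def personality_weight(sentence, profile, boost=3.0):
    """Multiply the weight by `boost` for every profile attribute the sentence matches."""
    weight = 1.0
    for key, wanted in profile.items():
        if sentence.attrs.get(key) == wanted:
            weight *= boost
    return weight


def rank_for_personality(sentences, profile, boost=3.0):
    """Order candidate sentences so the best matches for the personality come first."""
    return sorted(
        sentences,
        key=lambda s: personality_weight(s, profile, boost),
        reverse=True,
    )


candidates = [
    Sentence("Traffic was awful today.", {"gender": "male", "region": "NYC", "year": 2009}),
    Sentence("The beach was perfect.", {"gender": "female", "region": "west coast", "year": 2008}),
    Sentence("Finals week is here.", {"gender": "female", "region": "midwest", "year": 2010}),
]
profile = {"gender": "female", "region": "west coast", "year": 2008}
ranked = rank_for_personality(candidates, profile)
```

With this profile the beach sentence matches all three attributes and ranks first, the finals sentence matches one and ranks second, and the traffic sentence matches none. Weighting rather than hard filtering means non-matching sentences are still available when nothing better fits, which matters while the database is still small.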