Hi all,
Lightspeed, do you remember that small plug-in I sent you, the one with those four questions and answers?
Do you remember how easy it was? You just insert the plug-in, say hi to Ultra Hal, and it puts the data directly into the brain!
Now imagine taking the exact same plug-in, but instead of four questions and answers, how about 100,000, maybe 200,000, or even 1 million?
How about the entire Wikipedia? This is what I was working on with Rob. The last time I asked, Rob said he was going to update his
Wikipedia data and then give us a copy of it, so we would have the entirety of it, which would bring us up to about 6 million entries.
That's about 3.5 GB. The problem I had with the database that Rob gave us was that it was corrupted.
This is why it took so long to search through the database: every time Ultra Hal encountered a blank line, it had to reset itself and start over again.
Well, there are about 200 or 300 blank lines, so you can see that every one of those resets was consuming time and causing errors.
I've removed most of the blank lines and a lot of the errors from the Wikipedia database that Rob gave us, but it's still pretty slow.
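For anyone who would rather script the blank-line cleanup than do it by hand, here is a minimal sketch in Python. It assumes the data sits in a file that Python's built-in sqlite3 module can open (that means SQLite 3 format), and the table name `wiki` and column name `topic` are made up for illustration, since the real names in Rob's database may differ. The demo runs on a throwaway in-memory table.

```python
import sqlite3


def delete_blank_rows(conn, table="wiki", column="topic"):
    """Delete rows whose column is NULL, empty, or only spaces; return how many."""
    # 'wiki' and 'topic' are hypothetical names; substitute the real ones.
    cur = conn.execute(
        f"DELETE FROM {table} WHERE {column} IS NULL OR TRIM({column}) = ''"
    )
    conn.commit()
    return cur.rowcount


# Small in-memory demo so nothing touches a real brain or wiki file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wiki (topic TEXT, article TEXT)")
conn.executemany(
    "INSERT INTO wiki VALUES (?, ?)",
    [("Apollo 11", "First crewed Moon landing"), ("", ""), ("   ", ""), (None, None)],
)
removed = delete_blank_rows(conn)
remaining = conn.execute("SELECT COUNT(*) FROM wiki").fetchone()[0]
print(removed, "blank rows removed,", remaining, "left")
```

Try it on a backup copy first, the same as with any other edit to the wiki database.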
So what I did is, I broke it down into sections and created database arrays based on those sections, like:
Books.
Music.
Authors.
Famous people.
Actors.
Leaders, presidents, etc.
By breaking it down into smaller sections it was able to function much quicker, but it's extremely time-consuming and I was not able to do very much of it.
As you remember, the Wikipedia that Rob gave us was almost 2,000,000 lines. That's a lot to go through, and I've only gone through maybe 300,000 entries.
The rest of what I've input was done before I discovered how to input it directly into the brain. That consists of 30,000 lines of trivia questions.
I also had an encyclopedia on a CD, yes, a very old encyclopedia, and that came to 45,000 lines. I typed that all in manually, which was a lot of work.
But with the development of the input plug-in packets, this work can be done a lot quicker and with a lot fewer headaches.
When I get done, I will never have to do that again, and I am willing to share all of this with everyone as soon as I get it all done. But what if everyone helped?!
Things would progress much quicker. If you volunteer to take a section, like A from the Wikipedia, and someone else takes section B, and so on, announce it on the forum
so people are not overlapping and working on the same section. You will find that things progress quite quickly. It's not that hard; let me explain how to do it.
First you will need SQLiteStudio. The second thing you will need is the wiki database that Rob gave to us.
Step one: open the database using SQLiteStudio.
Step two: set the SQLiteStudio line range to about 100,000 lines, maybe more, depending on how many lines you need in order to open up all of the letter A or B or C or whatever. LOL.
Put the table in alphanumeric order by clicking on the subject bar. If you're taking on the numbers, which are the very top entries and will appear first,
scroll down to the bottom, where the numbers cease and the letter A begins, and note the number on the side, meaning the highest line number. You can now reset the line range
to that number. Next, open a database array that is blank and set its number of lines equal to the number of lines you are taking from the wiki database.
Copy and paste the data from the wiki database into the blank lines of the new array you just created. You will need exactly the same number of lines.
Once you have transferred the data to the new array, export the data in CSV format, using the @ sign as the separator.
I highly recommend that you do a little experimenting first with just a few lines, to figure out how to transfer and export the data into a text document, before taking on
a huge amount of data. I can tell you right now, there's a lot more to it than what I'm explaining; I'm only giving you the rough general idea of how to do it.
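As a rough idea of what step two boils down to, here is a small Python sketch that pulls every entry starting with one letter and writes it out with the @ sign as the separator. It is only an illustration under assumptions: the table name `wiki` and the columns `topic` and `article` are hypothetical, and the demo uses a tiny in-memory table and an in-memory text buffer instead of the real wiki database and a real file.

```python
import csv
import io
import sqlite3


def export_letter(conn, letter, out):
    """Write all rows whose topic starts with `letter` to `out`, '@'-separated."""
    # Hypothetical schema: table 'wiki' with columns 'topic' and 'article'.
    rows = conn.execute(
        "SELECT topic, article FROM wiki WHERE topic LIKE ? ORDER BY topic",
        (letter + "%",),
    ).fetchall()
    writer = csv.writer(out, delimiter="@", lineterminator="\n")
    writer.writerows(rows)
    return len(rows)


# Experiment with a few lines first, as recommended above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wiki (topic TEXT, article TEXT)")
conn.executemany(
    "INSERT INTO wiki VALUES (?, ?)",
    [("Apple", "A fruit"), ("Banana", "Another fruit"), ("Aardvark", "An animal")],
)
buffer = io.StringIO()
exported = export_letter(conn, "A", buffer)
print(buffer.getvalue())
```

To write a real text document, pass an open file instead of the buffer, for example `open("section_a.txt", "w", newline="", encoding="utf-8")` (the file name is just a placeholder). Note that the csv module will wrap a field in double quotes if the field itself contains an @ sign, so check a few lines of output before running a big section.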
Step three: once you get the data into the text document, you can create an input learning plug-in that will transfer the data into any array that you choose.
The reason you need to put the data into a text document first is that you need to edit the text, removing all of the symbols that cause problems with SQLite 2.
These unusual characters are not a problem in SQLite version 3, which would be Ultra Hal version 7, but we don't have that version yet. Just daydreaming. LOL.
Then you ask yourself, why can't I just go ahead and transfer the data directly from the wiki database into my brain? You can, but then you can't edit the data.
You see, if you put the data in a text document in something like Notepad, you can use find and replace to eliminate all of the symbols that are causing the problems.
That only takes about 35 or 40 seconds. If you try to do it inside of SQLiteStudio, you will have to do it manually, one line at a time,
and you will probably fall over dead before you're finished. LOL
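The same find-and-replace idea can be sketched in a few lines of Python, in case Notepad struggles with a very large file. The set of characters below is only a placeholder; the post does not list which symbols actually cause the trouble, so swap in whatever you find in your own data. The @ sign is deliberately left alone because it is the separator.

```python
# Placeholder set of characters to strip; which symbols really cause
# problems is an assumption here, so adjust this to your own data.
PROBLEM_CHARS = "\"'`"


def strip_symbols(text, chars=PROBLEM_CHARS):
    """Return `text` with every character listed in `chars` removed."""
    return text.translate(str.maketrans("", "", chars))


sample = "Hamlet@\"The Tragedy of Hamlet\" is Shakespeare's play"
cleaned = strip_symbols(sample)
print(cleaned)
```

For a whole exported document, read the file, pass its text through `strip_symbols`, and write the result to a new file so the original export stays untouched.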
But if you really don't care, it is quicker to just take the data directly from the wiki database and put it into whatever part of the brain you want, like the main QA.
Basically, what I did is I created table arrays for specific data, as seen above (books, music, etc.), and transferred the data directly to those table arrays using SQLiteStudio
and its search dialog box. If you do it this way, all you have to do is type in books, and it will pull up all of the data that has the word book in it. Then you can transfer the data
into your array table called books. After that you need to delete that data from the wiki database so you will not get duplicates from it. No biggie, just make a backup of the wiki database first.
Continue this process until you've gotten through all of the major categories, as seen above. Once you're done with the major categories, what you have left can be called miscellaneous.
So you create a database array called miscellaneous and put the rest of it there. By doing it this way, instead of having almost 2,000,000 lines in one data array,
which takes a long time to search, you've created several smaller database arrays that can be accessed using specific keywords like books, music, author, presidents, actors, etc.
Then the search goes a lot quicker. Or maybe you're only interested in certain specific things, like just the book entries in Wikipedia, or anything that refers to
history. Then you could just transfer that data into the main brain. If you only want specific data, you can put it in the main QA, but I highly recommend that you also put a duplicate of it
in the cross-reference data table array.
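The keyword sorting described above can be sketched in Python roughly like this. It is not the exact procedure used for the brain, just an illustration under assumptions: the table name `wiki`, the columns `topic` and `article`, and the keyword list are all hypothetical, and the demo runs on a small in-memory copy, which stands in for the backup you should make before deleting anything.

```python
import sqlite3

# Hypothetical mapping of new table name -> keyword to search for.
CATEGORIES = {"books": "book", "music": "music", "actors": "actor"}


def split_by_keyword(conn, categories):
    """Move matching rows out of 'wiki' into category tables; rest to miscellaneous."""
    for table, keyword in categories.items():
        pattern = f"%{keyword}%"
        conn.execute(f"CREATE TABLE IF NOT EXISTS {table} (topic TEXT, article TEXT)")
        conn.execute(
            f"INSERT INTO {table} SELECT topic, article FROM wiki "
            "WHERE topic LIKE ? OR article LIKE ?",
            (pattern, pattern),
        )
        # Delete what was just copied so later keywords cannot duplicate it.
        conn.execute(
            "DELETE FROM wiki WHERE topic LIKE ? OR article LIKE ?",
            (pattern, pattern),
        )
    # Whatever is left after the major categories goes to miscellaneous.
    conn.execute("CREATE TABLE IF NOT EXISTS miscellaneous (topic TEXT, article TEXT)")
    conn.execute("INSERT INTO miscellaneous SELECT topic, article FROM wiki")
    conn.execute("DELETE FROM wiki")
    conn.commit()
    tables = list(categories) + ["miscellaneous"]
    return {t: conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0] for t in tables}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wiki (topic TEXT, article TEXT)")
conn.executemany(
    "INSERT INTO wiki VALUES (?, ?)",
    [
        ("Moby-Dick", "A book by Herman Melville"),
        ("Jazz", "A music genre"),
        ("Tom Hanks", "An American actor"),
        ("Mount Everest", "Highest mountain on Earth"),
    ],
)
counts = split_by_keyword(conn, CATEGORIES)
left_in_wiki = conn.execute("SELECT COUNT(*) FROM wiki").fetchone()[0]
print(counts, left_in_wiki)
```

Because each keyword's rows are deleted as soon as they are copied, an entry that mentions both a book and an actor lands only in whichever category comes first in the list, so the order of the keywords matters.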
And that's how I got my database brain to be almost a gigabyte in size. My plug-in helps a lot, because it is basically designed to look up data, but if you're just interested in
specific things from Wikipedia to add to your database, you can put them in the main QA without adding my plug-in to your brain, and it should work quite well.
Just remember that your main QA is already at about 80,000 lines, so you can make it quite big extremely fast, and you will slow Ultra Hal down if you bring in all of the weird symbols.
You know the old saying: do a little work now, save yourself a whole lot of headaches later. Which way you go is yours to choose. Myself, I've chosen the long way:
edit out all of the crazy little symbols that SQLite version 2 does not recognize.
Sincerely, and boy was that a mouthful.
C load.