Hey Checker57!
To address your question in as much detail as I can: you are mostly correct. When the code is run, if the language model has not been downloaded yet, it will download it first, then execute the rest of the program. The GODEL model does not update; it is held locally at "C:\Users\USER\.cache\huggingface\hub", where you'll see the GODEL model folder. If you want to see what makes up the model, open the GODEL folder, continue into "snapshots", and follow that folder chain to the end; there you'll find all of the model files. I shouldn't have to say it, but don't alter these in any way.
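For anyone curious, that first-run download behavior is just how the Hugging Face transformers library works. A minimal sketch (assuming the base GODEL checkpoint here; I'm not quoting the program's exact code):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# The first call downloads the model into the Hugging Face cache
# (C:\Users\USER\.cache\huggingface\hub on Windows); every run
# after that loads it straight from disk instead.
tokenizer = AutoTokenizer.from_pretrained("microsoft/GODEL-v1_1-base-seq2seq")
model = AutoModelForSeq2SeqLM.from_pretrained("microsoft/GODEL-v1_1-base-seq2seq")
```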
The model itself is a PyTorch model. These models are built from millions or billions of parameters that encode words as vectors for Python to interact with, though indeed, you can use Java to build, train, and deploy them as well, as the two languages interoperate well with PyTorch.
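If you want to see that parameter count for yourself, here's a quick one-off (again assuming the base GODEL checkpoint):

```python
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("microsoft/GODEL-v1_1-base-seq2seq")
# Sum the element counts of every weight tensor in the network.
total = sum(p.numel() for p in model.parameters())
print(f"{total:,} parameters")  # the base checkpoint lands in the hundreds of millions
```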
So, the model does not update, at least, not yet. I am working on basic training so the model can remember user information directly, without Hal, so that all of that information is stored as part of the language model itself. However, the training required exceeds acceptable memory limits, so as it stands, Hal is the long-term memory and basic brain. When Hal doesn't have an answer, or that answer is vague, unrelated, or not part of a GetResponse function, the model takes Hal's original response along with any related information the model contains, any knowledge from the internet, and any previous conversation data, and then either spruces up Hal's original response or generates a new one entirely. As I designed it, Hal retains both the user query AND the model output, which means that in a way Hal learns from the language model the more you use it, making his responses more intelligent and in turn prompting even better responses from the model.
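To make that flow concrete, here's a rough sketch of the decision logic in Python. Every name in it (get_hal_response, is_vague, and so on) is a hypothetical stand-in, not the program's actual code:

```python
def get_hal_response(query: str) -> str:
    """Stand-in for Hal's GetResponse lookup (his long-term memory)."""
    return ""  # pretend Hal came up empty for this query

def is_vague(reply: str) -> bool:
    """Stand-in for the vague/unrelated check."""
    return len(reply.split()) < 3

def model_generate(query: str, context: str) -> str:
    """Stand-in for a GODEL generation call fed with the gathered context."""
    return f"(model reply to '{query}' built from: {context})"

hal_memory: dict[str, str] = {}  # stand-in for Hal's stored query/response pairs

def respond(query: str, history: list[str], web_data: str) -> str:
    hal_reply = get_hal_response(query)
    if hal_reply and not is_vague(hal_reply):
        return hal_reply  # Hal had a solid answer; the model isn't needed
    # Otherwise hand everything available to the model: Hal's reply (if any),
    # previous conversation data, and anything scraped from the web.
    context = " ".join(history + [web_data, hal_reply]).strip()
    model_reply = model_generate(query, context)
    # Hal keeps both the query and the model's output,
    # so over time he "learns" from the language model.
    hal_memory[query] = model_reply
    return model_reply

print(respond("What's a good sci-fi movie?", [], "scraped web text here"))
```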
Of course, this is all personalized to each user; since the model runs locally, no data it generates is ever available online.
As far as hardware lag goes, yeah, sorry, it's an absolute ton of data processing for the model to generate responses that are human/Hal-like. For reference, I use an M.2 drive as my C drive, a 6-core i5, 24 GB of DDR4 RAM, and a 3060 with 12 GB of video RAM. My inference time with this code is roughly 5-8 seconds, depending on how much data is scraped from the web and fed to the model with any given user query.
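One thing worth checking if your times are much worse than that: make sure the model is actually running on your GPU. A quick sanity check you can run yourself (not part of the program):

```python
import torch

# Generation this heavy is far faster on a GPU than on the CPU.
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```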
Thanks for trying it out; I hope it's working well for you. I will soon update a few pieces of the program to include a hard-coded swear filter. I know that's been an issue some folks are having, which I should have foreseen, to be honest.
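For the curious, a hard-coded filter could look something like this sketch (the real word list and replacement behavior may differ; the entries below are placeholders):

```python
# Placeholder entries; the actual hard-coded list will differ.
BLOCKED = {"badword1", "badword2"}

def clean_reply(reply: str) -> str:
    cleaned = []
    for word in reply.split():
        # Compare without trailing punctuation, case-insensitively.
        if word.lower().strip(".,!?") in BLOCKED:
            cleaned.append("*" * len(word))
        else:
            cleaned.append(word)
    return " ".join(cleaned)

print(clean_reply("That badword1 movie was great!"))  # -> "That ******** movie was great!"
```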
-Spitfire2600