I know people who keep their passwords in a Google Doc or email passwords to themselves. I foresee a lot of accounts getting hacked once people crack the right prompts for the LLM.
Geez! Who does that?
Old people
They’ll probably isolate the models from each other, but yeah, if they want to train shared models on private data then that could happen.
But Bard is the public one, right?
Bard is the name of the service; they can create account-specific models trained on your user data that aren’t shared with other accounts (as an extension of the base model built on public data). I’ve already read about companies doing this to avoid cross-contamination. Pretty sure Google is aware of it.
But I don’t know if Google cares enough about privacy to bother training individual models to avoid cross-contamination. Each model takes years’ worth of supercomputer time, so the fewer they need to train, the less it costs.
Extending an existing model (retraining it) doesn’t take years; it can be done in far less time.
Hmm, I thought one of the problems with LLMs was that their knowledge is pretty much baked in during the training process. Maybe that was only with respect to removing information?
Yeah, it’s hard to remove data that’s already been trained into a model. But you can retrain an existing model to add capabilities, so if you copy one built on public data multiple times and then retrain each copy on a different set of private data, you save a lot of work.
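Roughly something like this, if they did it with off-the-shelf tooling (just a sketch using Hugging Face-style calls; the base checkpoint name and the per-account dataset are placeholders I made up):

```python
# Sketch: one shared base model trained on public data, then a separate
# fine-tuned copy per account, trained only on that account's private data.
# BASE_CHECKPOINT is a hypothetical name, and private_dataset is assumed
# to already be a tokenized dataset.
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

BASE_CHECKPOINT = "some-public-base-model"  # hypothetical public base model

def finetune_for_account(account_id: str, private_dataset) -> None:
    # Every account starts from the same copy of the public base weights...
    model = AutoModelForCausalLM.from_pretrained(BASE_CHECKPOINT)
    args = TrainingArguments(
        output_dir=f"models/{account_id}",
        num_train_epochs=1,
        per_device_train_batch_size=4,
    )
    # ...and only this account's private data is used for the update,
    # so nothing private leaks into another account's model.
    Trainer(model=model, args=args, train_dataset=private_dataset).train()
    model.save_pretrained(f"models/{account_id}")
```

Only the base model needs the years of supercomputer time; the per-account step is a comparatively small fine-tune.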
That won’t work because they’re not going to train Bard on your email contents or documents.
So what does Bard do with the access, then? Is it like Bing Chat, which can choose to search for things?
Most probably yes, it will add that information to the context. Once you delete the chat, that data is gone.
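Something like this, in spirit (a made-up sketch; call_llm just stands in for whatever hosted endpoint the service actually uses):

```python
# Sketch of context injection: the private document text is only pasted into
# the prompt for this one request, never written into the model's weights.
def answer_with_context(question: str, user_documents: list[str]) -> str:
    # Carry the user's private data along as plain context in the prompt.
    context = "\n\n".join(user_documents)
    prompt = (
        "Use only the context below to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
    )
    # The model sees the private text for this request only; delete the chat
    # (and its prompt history) and nothing of it remains in the model.
    return call_llm(prompt)

def call_llm(prompt: str) -> str:
    # Placeholder for the actual hosted LLM call.
    raise NotImplementedError
```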
That’s much better than using it for general training. Does anything keep Google from using it for training in the future though?
Their terms and privacy policy, I guess. Also the possibility of a data leak. I don’t think even Google would knowingly train their LLM on private data; that would be utter insanity.