So I was just reading this thread about DeepSeek refusing to answer questions about Tiananmen Square.

It seems obvious from screenshots of people trying to jailbreak the webapp that there’s some middleware that just drops the connection when the incident is mentioned. However, I’ve already asked the self-hosted model multiple controversial China questions and it’s answered them all.

The poster of the thread was also running the model locally, the 14B model to be specific, so what’s happening? I decide to check for myself and, lo and behold, I get the same “I am sorry, I cannot answer that question. I am an AI assistant designed to provide helpful and harmless responses.”

Is it just that specific model that’s censored? Is it because the Qwen model it’s distilled from is censored? But isn’t the 7B model also distilled from Qwen?

So I check the 7B model again, and this time round that’s also censored. I panic for a few seconds. Have the Chinese somehow broken into my local model to cover it up after I downloaded it?

I check the screenshot I have of it answering the first time I asked, ask the exact same question again, and not only does it work, it even acknowledges the previous question.

So wtf is going on? It seems that the misspelled “Tianenmen square” will clumsily shut down any kind of response, but the correctly spelled “Tiananmen Square” is completely fine to discuss.
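
If you want to reproduce this yourself, here’s a minimal sketch of the check. It assumes the distill is being served locally through Ollama’s default HTTP API on port 11434 and is tagged “deepseek-r1:7b”; both of those are assumptions, so swap in whatever model and endpoint you’re actually running.

```python
# Minimal reproduction sketch: same question, two spellings of the square.
# Assumes a local Ollama server on the default port and a pulled DeepSeek-R1
# distill tagged "deepseek-r1:7b"; adjust to whatever you actually downloaded.
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "deepseek-r1:7b"

def ask(question: str) -> str:
    """Send one non-streaming prompt and return the model's full reply."""
    resp = requests.post(
        OLLAMA_URL,
        json={"model": MODEL, "prompt": question, "stream": False},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["response"]

for spelling in ("Tianenmen", "Tiananmen"):
    print(f"--- {spelling} ---")
    # The first few hundred characters are enough to spot the canned refusal.
    print(ask(f"What happened at {spelling} square in 1989?")[:300])
```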

So the local model actually is censored, but the filter is so shit, you might not even notice it.

It’ll be interesting to see what happens with the next release. Will the censorship be less thorough, stay the same, or will China again piss away a massive amount of soft power and goodwill over something that everybody knows about anyway?

  • SGforce@lemmy.ca · 20 hours ago

    The local models are distilled versions of Qwen or Llama or whatever else, not really DeepSeek’s model. So you get refusals based primarily on the base model, plus whatever it learned from the distilling. If it’s Qwen or another Chinese model then it’s more likely to refuse, but a Llama model or something else could pick it up to a lesser extent.

    • manicdaveOP · 17 hours ago

      You get the exact same cookie-cutter response from the Llama-based models, while the plain Qwen models process the question and answer it. The filter is DeepSeek’s contribution.

      • felixwhynot@lemmy.world · 16 hours ago

        From what I understand, the distilled models use DeepSeek’s output to retrain e.g. Llama, so it makes sense to me that they would exhibit the same biases.

        • Architeuthis@awful.systems · 13 hours ago (edited)

          Distilling is supposed to be a shortcut to creating a quality training dataset by using the output of an established model as labels, i.e. desired answers.

          The end result, that the new model ends up with biases inherited from the reference model, should hold, but using the same model you are distilling from as the base model would seem to be completely pointless.
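
          Roughly, the idea looks like the sketch below: collect prompts, let the established model answer them, and use those answers as the labels a smaller base model gets fine-tuned on. This is purely illustrative (teacher_answer is a stand-in for however you’d actually query the reference model), not DeepSeek’s real pipeline.

          ```python
          # Toy sketch of distillation as dataset creation, not DeepSeek's actual recipe.
          # teacher_answer() is a placeholder for querying the established model
          # (an API call, a locally served model, etc.); its outputs become the labels.
          import json

          def teacher_answer(prompt: str) -> str:
              """Stand-in for the reference model's completion."""
              return f"(reference model's answer to: {prompt})"

          prompts = [
              "Explain how a hash map handles collisions.",
              "Summarise what happened at Tiananmen Square in 1989.",
              # ...many thousands more prompts in practice
          ]

          # Each teacher output is stored as the desired answer (the label).
          with open("distill_dataset.jsonl", "w", encoding="utf-8") as f:
              for prompt in prompts:
                  record = {"prompt": prompt, "label": teacher_answer(prompt)}
                  f.write(json.dumps(record, ensure_ascii=False) + "\n")

          # A smaller base model (Qwen, Llama, ...) is then fine-tuned on this file,
          # so whatever refusals the teacher emits get baked into the student too.
          ```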

        • manicdaveOP · 16 hours ago

          Some models are Llama and some are Qwen. Both sets respond with “I am sorry, I cannot answer that question. I am an AI assistant designed to provide helpful and harmless responses.” when you spell it Tianenmen, but give details when you spell it Tiananmen.