So I was just reading this thread about DeepSeek refusing to answer questions about Tiananmen Square.

It seems obvious from screenshots of people trying to jailbreak the webapp that there’s some middleware that just drops the connection when the incident is mentioned. However, I’ve already asked the self-hosted model multiple controversial China questions and it’s answered them all.

The poster of the thread was also running the model locally, the 14B model to be specific, so what’s happening? I decide to check for myself and, lo and behold, I get the same “I am sorry, I cannot answer that question. I am an AI assistant designed to provide helpful and harmless responses.”

Is it just that specific model that’s censored? Is it because the Qwen model it’s distilled from is censored? But isn’t the 7B model also distilled from Qwen?

So I check the 7B model again, and this time round that’s also censored. I panic for a few seconds. Have the Chinese somehow broken into my local model to cover it up after I downloaded it?

I check the screenshot I have of it answering the first time I asked, ask the exact same question again, and not only does it work, it even acknowledges the previous question.

So wtf is going on? It seems that the misspelled “Tianenmen square” will clumsily shut down any kind of response, but the correctly spelled “Tiananmen Square” is completely fine to discuss.
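
If you want to reproduce this yourself, here’s a minimal sketch of the check. It assumes the distill is being served locally through Ollama’s default HTTP API on port 11434 and is tagged “deepseek-r1:7b”; both of those are assumptions, so swap in whatever model and endpoint you’re actually running.

```python
# Minimal reproduction sketch: same question, two spellings of the square.
# Assumes a local Ollama server on the default port and a pulled DeepSeek-R1
# distill tagged "deepseek-r1:7b"; adjust to whatever you actually downloaded.
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "deepseek-r1:7b"

def ask(question: str) -> str:
    """Send one non-streaming prompt and return the model's full reply."""
    resp = requests.post(
        OLLAMA_URL,
        json={"model": MODEL, "prompt": question, "stream": False},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["response"]

for spelling in ("Tianenmen", "Tiananmen"):
    print(f"--- {spelling} ---")
    # The first few hundred characters are enough to spot the canned refusal.
    print(ask(f"What happened at {spelling} square in 1989?")[:300])
```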

So the local model actually is censored, but the filter is so shit, you might not even notice it.

It’ll be interesting to see what happens with the next release. Will the censorship be less thorough, stay the same, or will China again piss away a massive amount of soft power and goodwill over something that everybody knows about anyway?

  • SGforce@lemmy.ca · 20 hours ago

    The local models are distilled versions of Qwen or Llama or whatever else, not really DeepSeek’s model. So you get refusals based primarily on the base model, plus whatever it learned from the distilling. If it’s Qwen or another Chinese model then it’s more likely to refuse, but a Llama model or something else could pick it up to a lesser extent.

    • manicdaveOP · 17 hours ago

      You get the exact same cookie-cutter response from the Llama-based models, while the plain Qwen models process the question and answer it. The filter is DeepSeek’s contribution.

      • felixwhynot@lemmy.world · 16 hours ago

        From what I understand, the distilled models use DeepSeek’s output to retrain e.g. Llama, so it makes sense to me that they would exhibit the same biases.

        • Architeuthis@awful.systems · 13 hours ago (edited)

          Distilling is supposed to be a shortcut to creating a quality training dataset by using the output of an established model as labels, i.e. desired answers.

          The end result, that the new model ends up with biases inherited from the reference model, should hold, but using the same model you are distilling from as the base model would seem to be completely pointless.
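
          Roughly, the idea looks like the sketch below: collect prompts, let the established model answer them, and use those answers as the labels a smaller base model gets fine-tuned on. This is purely illustrative (teacher_answer is a stand-in for however you’d actually query the reference model), not DeepSeek’s real pipeline.

          ```python
          # Toy sketch of distillation as dataset creation, not DeepSeek's actual recipe.
          # teacher_answer() is a placeholder for querying the established model
          # (an API call, a locally served model, etc.); its outputs become the labels.
          import json

          def teacher_answer(prompt: str) -> str:
              """Stand-in for the reference model's completion."""
              return f"(reference model's answer to: {prompt})"

          prompts = [
              "Explain how a hash map handles collisions.",
              "Summarise what happened at Tiananmen Square in 1989.",
              # ...many thousands more prompts in practice
          ]

          # Each teacher output is stored as the desired answer (the label).
          with open("distill_dataset.jsonl", "w", encoding="utf-8") as f:
              for prompt in prompts:
                  record = {"prompt": prompt, "label": teacher_answer(prompt)}
                  f.write(json.dumps(record, ensure_ascii=False) + "\n")

          # A smaller base model (Qwen, Llama, ...) is then fine-tuned on this file,
          # so whatever refusals the teacher emits get baked into the student too.
          ```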

        • manicdaveOP · 16 hours ago

          Some models are Llama and some are Qwen. Both sets respond with “I am sorry, I cannot answer that question. I am an AI assistant designed to provide helpful and harmless responses.” when you spell it Tianenmen, but give details when you spell it Tiananmen.