🐛 #340 - Bug: "unknown model architecture: 'cohere2'" when trying to load Command A model
| Author | Alexey-Akishin |
|---|---|
| State | ❌ Closed |
| Created | 2025-04-22 |
| Updated | 2025-04-26 |
Description
What happened?
It would be great if it were possible to run Command A in ik_llama.cpp (it works in llama.cpp). Currently, when I try to load it, I get this:
```
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'cohere2'
llama_load_model_from_file: failed to load model
```
Name and Version
I tried with the newest version from this repo.
What operating system are you seeing the problem on?
Linux
Relevant log output
💬 Conversation
👤 ikawrakow commented the 2025-04-22 at 17:19:53:
I can look into adding it, but I don't have the bandwidth to test every model. Are you willing to test?
👤 saood06 commented the 2025-04-22 at 17:29:22:
I could test; there is a small model for it as well. I looked into the code, and the port looked simple (but it would need to be redone because of their refactorings).
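For background on where the error in this issue originates: the GGUF loader maps the architecture string from the model's metadata through a fixed name table, and an unregistered name is what produces the "unknown model architecture" failure. A minimal Python sketch of that lookup pattern (illustrative only; the real table is a C++ map inside llama.cpp / ik_llama.cpp, and the names here are examples):

```python
# Illustrative mirror of the loader's architecture-name table (the real
# one is a C++ map in llama.cpp / ik_llama.cpp; entries here are examples).
KNOWN_ARCHS = {
    "llama": "LLM_ARCH_LLAMA",
    "command-r": "LLM_ARCH_COMMAND_R",
    "cohere2": "LLM_ARCH_COHERE2",  # the entry a cohere2 port registers
}

def load_arch(name: str) -> str:
    """Loader-style lookup: fail the same way llama_model_load does
    when the architecture string is not in the table."""
    try:
        return KNOWN_ARCHS[name]
    except KeyError:
        raise ValueError(f"unknown model architecture: '{name}'") from None

print(load_arch("cohere2"))  # LLM_ARCH_COHERE2
# Before the port, 'cohere2' was missing from the table, so loading failed:
# ValueError: unknown model architecture: 'cohere2'
```

Registering the name is only the first step of a port; the model's graph (layers, attention layout, norms) also has to be built, which is the part the refactorings upstream would force to be redone.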
👤 Alexey-Akishin commented the 2025-04-22 at 17:34:13:
I will be more than happy to test, I build ik_llama.cpp from source, so for example I can test a patch when it is available, no problem.
👤 mcm007 commented the 2025-04-25 at 07:23:19:
Tested on CPU only; the small 7B model works OK with #341.
👤 Alexey-Akishin commented the 2025-04-25 at 09:25:05:
Unfortunately it did not work for me with Command A. I just asked it to summarize first few paragraphs from wiki article about "dog":
```
# Article Summary?
## Introduction?
-? Dogs? (Canis familiaris or Canis lupus familiaris) are? domesticated? descendants? of?? gray? wolves.
-? First?? species? domesticated? by?? humans? over?? 14,000?? years?? ago.
?
## Domestication? and? Diet?
-? Selectively??? bred? from? an?? extinct? population? of? wolves? during? the?? Late? Pleistocene.
-? Adapted??? to? thrive? on?????????????? starch-rich??????????????????????? diet? due? to? long??? association? with?????????? humans.
?
## Physical? Attributes? and? Senses?
-? Varied? breeds? with?? different?? shapes,??? sizes,? and?????????????????? colors.
-? Possess??? powerful? jaws? with????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????
```
Question marks did not want to stop so I interrupted generation.
For comparison, llama.cpp result:
> The article provides an overview of the domestication, evolution, and roles of dogs in human society. Here's a concise summary:
> Dogs, descended from gray wolves, were domesticated over 14,000 years ago, making them the first species tamed by humans. They have adapted to thrive on a starch-rich diet and possess enhanced senses of smell and hearing. Bred for various traits, dogs serve multiple purposes, including companionship, therapy, and assistance. The strong human-canine bond has led to extensive research, solidifying dogs' status as "man's best friend." Globally, the dog population is estimated at 700 million to 1 billion, with most living in developing countries as feral or community animals.
Model I used for testing: https://huggingface.co/bartowski/CohereForAI_c4ai-command-a-03-2025-GGUF/tree/main/CohereForAI_c4ai-command-a-03-2025-IQ4_NL
👤 ikawrakow commented the 2025-04-25 at 09:52:50:
It looks like something is not quite right with the vocabulary. So, I guess, I need to test with this specific model.
👤 ikawrakow commented the 2025-04-25 at 10:29:47:
@Alexey-Akishin
Can you also provide the specific command line you are using? And the details of the system you are running on (GPU(s), CPU).
Thanks.
👤 ikawrakow commented the 2025-04-25 at 11:16:54:
So, I downloaded this specific model. It works fine on the CPU but produces gibberish on the GPU with partial offload. Is this model another one of those where one needs fp32 precision for it to work?
👤 ikawrakow commented the 2025-04-25 at 13:00:29:
> Is this model another one of those where one needs fp32 precision for it to work?
Yes, it is. Setting the precision of the K*Q matrix multiplication to fp32 fixes the gibberish on CUDA. The current state of #341 should also work with the 111B parameter Command-A model.
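To illustrate why the K*Q precision matters: attention logits are dot products over the head dimension, and with large query/key activations a half-precision accumulator can exceed fp16's maximum finite value (~65504) and overflow to infinity, which then poisons the softmax and yields gibberish. A small NumPy sketch of the numerical failure mode (this simulates an fp16 accumulator in pure Python; it is not the CUDA kernel, and the values are contrived to force overflow):

```python
import numpy as np

HEAD_DIM = 128  # a typical attention head dimension

def dot_fp16(q, k):
    """Dot product with a simulated fp16 accumulator: round to fp16
    after every add, the way a half-precision kernel would."""
    acc = np.float16(0.0)
    for x, y in zip(q, k):
        acc = np.float16(acc + np.float16(x) * np.float16(y))
    return acc

def dot_fp32(q, k):
    """The same dot product, accumulated in fp32."""
    return float(q.astype(np.float32) @ k.astype(np.float32))

# Large but finite query/key values, as some models produce:
q = np.full(HEAD_DIM, 24.0, dtype=np.float16)
k = np.full(HEAD_DIM, 24.0, dtype=np.float16)

print(dot_fp16(q, k))  # inf     -- the running sum overflows fp16
print(dot_fp32(q, k))  # 73728.0 -- fine with an fp32 accumulator
```

Forcing the K*Q matrix multiplication to fp32 corresponds to keeping that accumulator in fp32; in ggml this kind of fix is typically expressed by marking the K*Q node with fp32 precision via `ggml_mul_mat_set_prec` (the exact change in #341 may differ).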
👤 Alexey-Akishin commented the 2025-04-25 at 21:38:17:
I just tried the latest #341 patch and it works well now! You are right, I was using CUDA (loading the whole model to GPUs). Thank you so much for adding support for Command A!
👤 ikawrakow commented the 2025-04-26 at 06:12:51:
OK, thanks for testing. I'll merge the PR and close the issue.