36 KiB
🐛 #363 - Bug: Gibberish output when using flash attention using Mistral-Small-Instruct-2409-Q6_K and Gemma-3-12b-it-q4_0 on CPU
| Author | djg26 |
|---|---|
| State | ❌ Closed |
| Created | 2025-05-01 |
| Updated | 2025-05-09 |
Description
What happened?
Whenever I use the flash attention flag the models always output incoherent text. This happens with both mistral small 2409 and gemma 3 12b. If I don't use flash attention they work fine. Normal llama.cpp works fine when I use flash attention on these models.
Command used for gemma: ./llama-server -m ~/Downloads/gemma-3-12b-it-q4_0.gguf -ctk q8_0 -ctv q8_0 -fa -c 32768
My system is a Core-i7 10700 with 32GB of RAM.
Name and Version
./llama-server --version
version: 3657 (98d16264)
built with cc (GCC) 14.2.1 20250207 for x86_64-pc-linux-gnu
What operating system are you seeing the problem on?
Linux
Relevant log output
💬 Conversation
👤 ikawrakow commented the 2025-05-01 at 06:59:00:
Thank you for the bug report. Can you confirm that #364 fixes it?
👤 djg26 commented the 2025-05-01 at 12:07:51:
It's a little better, however it still breaks down after a few tokens, becoming gibberish again with a longer context. Tested with Mistral-Small-Instruct-2409-Q6_K with the following command ./llama-server -m ~/Downloads/Mistral-Small-Instruct-2409-Q6_K.gguf -ctk q8_0 -ctv q8_0 -fa -c 32768 With flash attention:
Without flash attention:
👤 ikawrakow commented the 2025-05-01 at 13:38:15:
What about Gemma?
👤 djg26 commented the 2025-05-01 at 13:53:04:
Same issue with Gemma as well.
With flash attention:
Without flash attention:
👤 ikawrakow commented the 2025-05-01 at 15:45:01:
So, f16 and Q6_0 KV cache works. Here is the 5 paragraph story I get with Q6_0:
> Write a five paragraph story
The rain hammered against the windows of the antique bookstore, a relentless rhythm mirroring the turmoil in Eleanor’s heart. She’d come to "The Dusty Tome" seeking solace, a refuge from the wreckage of her recent engagement. The scent of aged paper and leather was usually comforting, a familiar hug for her soul, but today it only amplified the hollow ache in her chest. She drifted between towering shelves, her fingers tracing the spines of forgotten stories, hoping to find a narrative that could somehow explain her own shattered plot. A first edition of "Wuthering Heights" caught her eye, a book about passionate, destructive love – a cruel irony, she thought with a bitter smile.
Lost in thought, she hadn't noticed the elderly gentleman meticulously dusting a shelf nearby. He was a small man, with a cloud of white hair and eyes that twinkled with a quiet wisdom. "Lost, are you, dear?" he asked, his voice soft and laced with a gentle curiosity. Eleanor startled, embarrassed to be caught in her melancholic reverie. "Just… browsing," she mumbled, avoiding his gaze. He chuckled, a warm, comforting sound. "Browsing can be a powerful form of healing. Sometimes, a good story is all we need to realign ourselves." He paused, then added, "Everyone has a story, you know. And every story, even the sad ones, holds a thread of hope."
He led her to a small, secluded alcove filled with lesser-known poets and philosophers. "Have you ever read Rilke?" he asked, pulling a slim volume from the shelf. Eleanor shook her head. "He writes of acceptance, of embracing the totality of experience, even the pain. He says that out of suffering, beauty can emerge." He opened the book, pointing to a passage. "’Let everything happen to you: beauty and terror. Just keep going. No feeling is final.’" The words resonated within her, a small, flickering ember of understanding igniting in the darkness of her grief.
Eleanor spent the next hour lost in Rilke’s poetry, the rain outside fading into a gentle hum. She didn’t feel magically healed, but the sharp edges of her sadness had softened, replaced by a quiet sense of possibility. She realized she wasn’t just mourning the loss of a relationship, but the loss of a future she’d envisioned. And perhaps, she thought, a new, unexpected future awaited, a story yet unwritten.
As she prepared to leave, she turned to thank the gentleman. He was gone, vanished as silently as he'd appeared. The alcove felt empty, the book of Rilke resting on the shelf, waiting for another seeker. Eleanor smiled, a genuine smile this time, and stepped back into the rain, no longer feeling quite so lost. The rain still fell, but it felt different now, washing away the debris and leaving behind a space for something new to grow.
In the case of Q8_0 KV cache, it does start OK, but transitions to repetitions in the 4th paragraph. I'll need to investigate.
👤 ikawrakow commented the 2025-05-01 at 16:15:55:
OK, this should work now with Q8_0 KV cache. Here the 5 paragraph story I get with
./bin/llama-cli -m gemma3-12B-q4_0.gguf -t 32 -fa -rtr -ctk q8_0 -ctv q8_0 -c 16384 -cnv
Write a five paragraph story
The old lighthouse keeper, Silas, squinted at the churning grey sea. He'd seen a thousand storms lash against the craggy coast, each one a symphony of wind and wave, but this one felt different. A primal unease settled in his gut, a feeling he’d learned to trust over his fifty years tending the beacon. The rhythmic sweep of the lamp, a comforting pulse in the darkness, couldn’t quite dispel the sense of foreboding. He checked the barometer, its needle plummeting with alarming speed. Tonight, he knew, would be a night to remember.
As darkness deepened, the storm hit with brutal force. Rain lashed against the thick glass of the lighthouse windows, blurring the world into a chaotic swirl of grey. The wind howled like a banshee, rattling the metal framework and threatening to tear the very building from its foundations. Silas, a solitary figure against the tempest, methodically checked each lamp and mechanism, his movements practiced and sure. He'd seen ships founder in calmer seas, and he wouldn't let this storm claim another. The rhythmic flash of the light, a desperate plea in the roaring darkness, was all that stood between the rocky coast and unsuspecting vessels.
Suddenly, amidst the cacophony, Silas heard it - a faint, desperate cry carried on the wind. He strained his ears, battling the storm's fury. It came again, clearer this time, a child's voice, choked with fear. He rushed to the window, peering through the rain-streaked glass. Through a momentary break in the storm, he saw it – a small sailboat, tossed about like a toy, its mast broken, a tiny figure clinging to the wreckage.
Without hesitation, Silas activated the emergency beacon and grabbed his oilskins. He knew venturing out in such a storm was madness, but leaving a child to the mercy of the sea was unthinkable. He fought his way down the winding staircase, the lighthouse groaning around him. The waves crashed against the rocks, threatening to sweep him away, but he pressed on, guided by the child’s cries and the unwavering beam of the light he'd so diligently maintained.
Hours later, soaked and shivering, Silas stood on the beach, watching as the coast guard hauled a small, terrified girl from the wreckage. Relief washed over him, stronger than any wave. The storm had subsided, leaving behind a trail of debris and a sky slowly clearing to reveal a sliver of moon. He turned back towards the steadfast lighthouse, its beam still cutting through the lingering darkness, a silent guardian, a beacon of hope in a world often shrouded in storms.
👤 djg26 commented the 2025-05-01 at 17:04:17:
It does work better now, but I'm still having repetition issues when the context has a few 1,000 tokens in it, both with Gemma3 and Mistral small. This is with the KV cache both on Q8_0.
👤 ikawrakow commented the 2025-05-01 at 17:14:24:
You need to give me a way to reproduce it.
👤 djg26 commented the 2025-05-01 at 17:35:28:
With q8_0 on kv cache
./llama-server -m ~/Downloads/gemma-3-12b-it-q4_0.gguf -ctk q8_0 -ctv q8_0 -fa -c 32768
Without q8_0 on kv cache (but flash attention still enabled)
./llama-server -m ~/Downloads/gemma-3-12b-it-q4_0.gguf -fa -c 32768
I'm not very used to llama-cli so I've been doing it via llama-server's webui.
👤 ikawrakow commented the 2025-05-01 at 18:18:58:
Can you post cat /proc/cpuinfo? Thanks.
Here is what I get:
Can you post cat /proc/cpuinfo?
👤 ikawrakow commented the 2025-05-01 at 18:18:58:
Here is what I get:
Can you post cat /proc/cpuinfo?
👤 djg26 commented the 2025-05-01 at 18:31:10:
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 165
model name : Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz
stepping : 5
microcode : 0xfc
cpu MHz : 3591.565
cache size : 16384 KB
physical id : 0
siblings : 16
core id : 0
cpu cores : 8
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp vnmi pku ospke md_clear flush_l1d arch_capabilities
vmx flags : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid ple shadow_vmcs pml ept_violation_ve ept_mode_based_exec
bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs itlb_multihit srbds mmio_stale_data retbleed eibrs_pbrsb gds bhi
bogomips : 5799.77
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 165
model name : Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz
stepping : 5
microcode : 0xfc
cpu MHz : 1853.498
cache size : 16384 KB
physical id : 0
siblings : 16
core id : 1
cpu cores : 8
apicid : 2
initial apicid : 2
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp vnmi pku ospke md_clear flush_l1d arch_capabilities
vmx flags : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid ple shadow_vmcs pml ept_violation_ve ept_mode_based_exec
bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs itlb_multihit srbds mmio_stale_data retbleed eibrs_pbrsb gds bhi
bogomips : 5799.77
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
processor : 2
vendor_id : GenuineIntel
cpu family : 6
model : 165
model name : Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz
stepping : 5
microcode : 0xfc
cpu MHz : 800.000
cache size : 16384 KB
physical id : 0
siblings : 16
core id : 2
cpu cores : 8
apicid : 4
initial apicid : 4
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp vnmi pku ospke md_clear flush_l1d arch_capabilities
vmx flags : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid ple shadow_vmcs pml ept_violation_ve ept_mode_based_exec
bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs itlb_multihit srbds mmio_stale_data retbleed eibrs_pbrsb gds bhi
bogomips : 5799.77
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 165
model name : Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz
stepping : 5
microcode : 0xfc
cpu MHz : 800.000
cache size : 16384 KB
physical id : 0
siblings : 16
core id : 3
cpu cores : 8
apicid : 6
initial apicid : 6
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp vnmi pku ospke md_clear flush_l1d arch_capabilities
vmx flags : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid ple shadow_vmcs pml ept_violation_ve ept_mode_based_exec
bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs itlb_multihit srbds mmio_stale_data retbleed eibrs_pbrsb gds bhi
bogomips : 5799.77
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
processor : 4
vendor_id : GenuineIntel
cpu family : 6
model : 165
model name : Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz
stepping : 5
microcode : 0xfc
cpu MHz : 4614.727
cache size : 16384 KB
physical id : 0
siblings : 16
core id : 4
cpu cores : 8
apicid : 8
initial apicid : 8
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp vnmi pku ospke md_clear flush_l1d arch_capabilities
vmx flags : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid ple shadow_vmcs pml ept_violation_ve ept_mode_based_exec
bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs itlb_multihit srbds mmio_stale_data retbleed eibrs_pbrsb gds bhi
bogomips : 5799.77
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
processor : 5
vendor_id : GenuineIntel
cpu family : 6
model : 165
model name : Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz
stepping : 5
microcode : 0xfc
cpu MHz : 800.000
cache size : 16384 KB
physical id : 0
siblings : 16
core id : 5
cpu cores : 8
apicid : 10
initial apicid : 10
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp vnmi pku ospke md_clear flush_l1d arch_capabilities
vmx flags : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid ple shadow_vmcs pml ept_violation_ve ept_mode_based_exec
bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs itlb_multihit srbds mmio_stale_data retbleed eibrs_pbrsb gds bhi
bogomips : 5799.77
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
processor : 6
vendor_id : GenuineIntel
cpu family : 6
model : 165
model name : Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz
stepping : 5
microcode : 0xfc
cpu MHz : 800.000
cache size : 16384 KB
physical id : 0
siblings : 16
core id : 6
cpu cores : 8
apicid : 12
initial apicid : 12
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp vnmi pku ospke md_clear flush_l1d arch_capabilities
vmx flags : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid ple shadow_vmcs pml ept_violation_ve ept_mode_based_exec
bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs itlb_multihit srbds mmio_stale_data retbleed eibrs_pbrsb gds bhi
bogomips : 5799.77
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
processor : 7
vendor_id : GenuineIntel
cpu family : 6
model : 165
model name : Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz
stepping : 5
microcode : 0xfc
cpu MHz : 800.000
cache size : 16384 KB
physical id : 0
siblings : 16
core id : 7
cpu cores : 8
apicid : 14
initial apicid : 14
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp vnmi pku ospke md_clear flush_l1d arch_capabilities
vmx flags : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid ple shadow_vmcs pml ept_violation_ve ept_mode_based_exec
bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs itlb_multihit srbds mmio_stale_data retbleed eibrs_pbrsb gds bhi
bogomips : 5799.77
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
processor : 8
vendor_id : GenuineIntel
cpu family : 6
model : 165
model name : Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz
stepping : 5
microcode : 0xfc
cpu MHz : 4604.265
cache size : 16384 KB
physical id : 0
siblings : 16
core id : 0
cpu cores : 8
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp vnmi pku ospke md_clear flush_l1d arch_capabilities
vmx flags : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid ple shadow_vmcs pml ept_violation_ve ept_mode_based_exec
bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs itlb_multihit srbds mmio_stale_data retbleed eibrs_pbrsb gds bhi
bogomips : 5799.77
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
processor : 9
vendor_id : GenuineIntel
cpu family : 6
model : 165
model name : Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz
stepping : 5
microcode : 0xfc
cpu MHz : 800.000
cache size : 16384 KB
physical id : 0
siblings : 16
core id : 1
cpu cores : 8
apicid : 3
initial apicid : 3
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp vnmi pku ospke md_clear flush_l1d arch_capabilities
vmx flags : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid ple shadow_vmcs pml ept_violation_ve ept_mode_based_exec
bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs itlb_multihit srbds mmio_stale_data retbleed eibrs_pbrsb gds bhi
bogomips : 5799.77
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
processor : 10
vendor_id : GenuineIntel
cpu family : 6
model : 165
model name : Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz
stepping : 5
microcode : 0xfc
cpu MHz : 4603.834
cache size : 16384 KB
physical id : 0
siblings : 16
core id : 2
cpu cores : 8
apicid : 5
initial apicid : 5
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp vnmi pku ospke md_clear flush_l1d arch_capabilities
vmx flags : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid ple shadow_vmcs pml ept_violation_ve ept_mode_based_exec
bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs itlb_multihit srbds mmio_stale_data retbleed eibrs_pbrsb gds bhi
bogomips : 5799.77
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
processor : 11
vendor_id : GenuineIntel
cpu family : 6
model : 165
model name : Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz
stepping : 5
microcode : 0xfc
cpu MHz : 4600.024
cache size : 16384 KB
physical id : 0
siblings : 16
core id : 3
cpu cores : 8
apicid : 7
initial apicid : 7
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp vnmi pku ospke md_clear flush_l1d arch_capabilities
vmx flags : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid ple shadow_vmcs pml ept_violation_ve ept_mode_based_exec
bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs itlb_multihit srbds mmio_stale_data retbleed eibrs_pbrsb gds bhi
bogomips : 5799.77
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
processor : 12
vendor_id : GenuineIntel
cpu family : 6
model : 165
model name : Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz
stepping : 5
microcode : 0xfc
cpu MHz : 800.000
cache size : 16384 KB
physical id : 0
siblings : 16
core id : 4
cpu cores : 8
apicid : 9
initial apicid : 9
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp vnmi pku ospke md_clear flush_l1d arch_capabilities
vmx flags : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid ple shadow_vmcs pml ept_violation_ve ept_mode_based_exec
bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs itlb_multihit srbds mmio_stale_data retbleed eibrs_pbrsb gds bhi
bogomips : 5799.77
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
processor : 13
vendor_id : GenuineIntel
cpu family : 6
model : 165
model name : Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz
stepping : 5
microcode : 0xfc
cpu MHz : 4598.921
cache size : 16384 KB
physical id : 0
siblings : 16
core id : 5
cpu cores : 8
apicid : 11
initial apicid : 11
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp vnmi pku ospke md_clear flush_l1d arch_capabilities
vmx flags : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid ple shadow_vmcs pml ept_violation_ve ept_mode_based_exec
bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs itlb_multihit srbds mmio_stale_data retbleed eibrs_pbrsb gds bhi
bogomips : 5799.77
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
processor : 14
vendor_id : GenuineIntel
cpu family : 6
model : 165
model name : Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz
stepping : 5
microcode : 0xfc
cpu MHz : 4599.937
cache size : 16384 KB
physical id : 0
siblings : 16
core id : 6
cpu cores : 8
apicid : 13
initial apicid : 13
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp vnmi pku ospke md_clear flush_l1d arch_capabilities
vmx flags : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid ple shadow_vmcs pml ept_violation_ve ept_mode_based_exec
bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs itlb_multihit srbds mmio_stale_data retbleed eibrs_pbrsb gds bhi
bogomips : 5799.77
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
processor : 15
vendor_id : GenuineIntel
cpu family : 6
model : 165
model name : Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz
stepping : 5
microcode : 0xfc
cpu MHz : 800.000
cache size : 16384 KB
physical id : 0
siblings : 16
core id : 7
cpu cores : 8
apicid : 15
initial apicid : 15
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp vnmi pku ospke md_clear flush_l1d arch_capabilities
vmx flags : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid ple shadow_vmcs pml ept_violation_ve ept_mode_based_exec
bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs itlb_multihit srbds mmio_stale_data retbleed eibrs_pbrsb gds bhi
bogomips : 5799.77
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
👤 djg26 commented the 2025-05-01 at 18:49:24:
I get the same issue using mistral small as well.
Command used: ./llama-server -m ~/Downloads/Mistral-Small-Instruct-2409-Q6_K.gguf -ctk q8_0 -ctv q8_0 -fa -c 32768
KV cache q8_0
./llama-server -m ~/Downloads/Mistral-Small-Instruct-2409-Q6_K.gguf -fa -c 32768
no Q8_0 KV cache
👤 djg26 commented the 2025-05-01 at 19:32:56:
Running with K at f16 and V at q8_0 seems to work fine.
./llama-server -m ~/Downloads/Mistral-Small-Instruct-2409-Q6_K.gguf -ctv q8_0 -fa -c 32768
👤 ikawrakow commented the 2025-05-02 at 05:10:26:
I'll have to investigate in more detail then. In the meantime, just don't use Q8_0 for K-cache.
👤 ikawrakow commented the 2025-05-04 at 06:04:20:
Not sure what to do with this one. It works fine on both of my systems (Zen4 and vanilla AVX2) after #364
👤 djg26 commented the 2025-05-04 at 14:39:17:
It seems like having K be q8_0 and flash attention turned on doesn't work, at least for me. I've tested it on another computer with a ryzen 5 5600x and it still starts getting repetitive like before after some tokens. If I don't enable -fa and just have the K be q8_0 then the models work fine. It also works fine if K is something like q6_0 with the -fa flag.
👤 djg26 commented the 2025-05-04 at 17:18:01:
Qwen3-30B-A3B also works fine when using q8_0 KV and flash attention.
👤 djg26 commented the 2025-05-09 at 17:16:43:
Closing as after doing another git pull to the latest version it seems to work fine now.