Refresh: ok @ 2026-06-15T12:31:10Z · Bisect: idle

transformers · integration-test failure triage

Generated 2026-06-13T12:30:56Z · window 2026-06-07 → 2026-06-13 (7 daily runs, ≥5/7 intersection)

TL;DR

733 persistent integration-test failures (≥5/7 days)
0 attributed to 0 distinct bad commits (CI bisect)
2 tagged flaky by CI
731 unpinned — CI bisect didn't converge (usually because the regression predates CI's 7-day bisect window)
Historical sweep: all 733 failures (100% of the total) have first-failure day unknown, so the sweep gives no evidence for or against a single fleet-regression PR.
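The "≥5/7 intersection" rule can be sketched as below. This is a minimal illustration, not the dashboard's actual code: the `daily_failures` mapping, the `(model, gpu, test)` id shape, and the `example_model` entry are assumptions made for the example.

```python
from collections import Counter


def persistent_failures(daily_failures, min_days=5):
    """Return {test_id: days_failed} for tests that failed on at least `min_days` runs.

    `daily_failures` maps a run date (ISO string) to the set of test ids that
    failed in that day's run. Both the mapping and the id shape are hypothetical.
    """
    days_failed = Counter()
    for failed in daily_failures.values():
        days_failed.update(set(failed))  # count each test at most once per day
    return {test: n for test, n in days_failed.items() if n >= min_days}


# Seven daily runs covering the report window, 2026-06-07 through 2026-06-13.
days = [f"2026-06-{d:02d}" for d in range(7, 14)]
always = ("bamba", "single", "test_simple_generate")              # fails 7/7
mostly = ("phimoe", "single", "test_phimoe_instruct_generation")  # fails 5/7
rarely = ("example_model", "single", "test_example")              # made-up id, fails 2/7

runs = {day: {always} for day in days}
for day in days[:5]:
    runs[day].add(mostly)
for day in days[:2]:
    runs[day].add(rarely)

persistent = persistent_failures(runs)
for test, n in sorted(persistent.items()):
    print(test, f"{n}/{len(days)}")
```

The per-test count is what the `days_seen` column reports (for example `5/7`); anything below the threshold is dropped from the persistent set.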

Regression-day clustering (historical first-failure)

For every persistent failure we walked the daily CI dataset backwards to find the first day it appeared as failing. The table below groups failures by that day; large buckets are likely fleet regressions from a single landed PR.

first-failure day	failures	share
unknown	733	100.0%
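A rough sketch of the first-failure clustering described above follows. It assumes a hypothetical `history` mapping from ISO date to the set of failing test ids, and it assumes that a failure already present on the oldest day in the dataset is reported as `unknown` (the regression predates the data). The report does not state how `unknown` is actually assigned, so treat that rule as a guess.

```python
from collections import Counter

UNKNOWN = "unknown"


def first_failure_day(history, test_id):
    """Earliest day on which `test_id` fails, or UNKNOWN if it cannot be bounded.

    `history` maps ISO dates to sets of failing test ids (hypothetical shape).
    ISO date strings sort chronologically, so plain sorting is enough.
    """
    days = sorted(history)
    failing_days = [day for day in days if test_id in history[day]]
    # Assumption: failing on the oldest recorded day means the true first
    # failure lies before the dataset, so the day is unknown.
    if not failing_days or failing_days[0] == days[0]:
        return UNKNOWN
    return failing_days[0]


def cluster_by_first_day(history, test_ids):
    """Return [(day, failures, share_percent)] sorted by bucket size."""
    buckets = Counter(first_failure_day(history, t) for t in test_ids)
    total = sum(buckets.values())
    if total == 0:
        return []
    return [(day, n, 100.0 * n / total) for day, n in buckets.most_common()]


# Toy data: "a" fails on every recorded day, "b" and "c" start failing later.
history = {
    "2026-06-01": {"a"},
    "2026-06-02": {"a", "b"},
    "2026-06-03": {"a", "b", "c"},
}
for day, n, share in cluster_by_first_day(history, ["a", "b", "c"]):
    print(f"{day}\t{n}\t{share:.1f}%")
```

Under this assumption, a table whose only bucket is `unknown` means every persistent failure was already failing on the oldest day the sweep could see.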

Top regression days — failure breakdown

unknown — 733 failures

Failure-mode mix: output_mismatch 487, other 153, OOM 57, import_or_config 22, load_error 12, cuda_runtime 2 · 161 distinct models touched.
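As a quick consistency check, the failure-mode counts above should add up to the bucket total of 733. The snippet below uses only the numbers printed in this report; the variable names are illustrative.

```python
# Failure-mode counts copied from the mix line above.
mix = {
    "output_mismatch": 487,
    "other": 153,
    "OOM": 57,
    "import_or_config": 22,
    "load_error": 12,
    "cuda_runtime": 2,
}

total = sum(mix.values())
shares = {mode: round(100 * n / total, 1) for mode, n in mix.items()}

print("total:", total)
for mode, share in shares.items():
    print(f"{mode}: {share}%")
```

The counts sum to 733, and output mismatches make up roughly two thirds of the bucket.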

model	failures	sample mode	sample trace excerpt
whisper	42	other	`(line 370) RuntimeError: Input type (float) and bias type (c10::Half) should be the same`
musicgen_melody	16	output_mismatch	`(line 1307) AssertionError: Tensor-likes are not close!`
generation	15	output_mismatch	`(line 3215) AssertionError: Lists differ: ['Tel[23 chars]key. Sure, here\'s one for you:\n\nWhy did the[67 chars]s"!'] != ['Tel[23 chars]key. Why did the monkey go to the doctor?…`
gemma	14	output_mismatch	`(line 337) AssertionError: Lists differ: ['Hel[196 chars]tdi 105bhp.\nI have a problem with the engine [37 chars]the'] != ['Hel[196 chars]tdi 110bhp.\nI have a problem with the e…`
dac	12	output_mismatch	`(line 819) AssertionError: Tensor-likes are not close!`
edgetam	12	import_or_config	`(line 249) TypeError: FeatureListNet.forward() got an unexpected keyword argument 'original_sizes'`
glm46v	12	other	`(line 2567) RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got CUDABFloat16Type instead (while checking argumen…`
glm_ocr	12	other	`(line 456) assert [151331, 1513..., 151343, ...] == [59248, 59250...6, 59280, ...]`
gemma3	10	output_mismatch	`(line 874) AssertionError: 'DynamicSlidingWindowLayer' unexpectedly found in 'DynamicCache(layers=[DynamicSlidingWindowLayer, DynamicSlidingWindowLayer, DynamicSlidingWindowLayer…`
gemma3n	10	output_mismatch	`(line 1196) AssertionError: Lists differ: [' and I find it very relaxing. I also lik[112 chars]re'"] != [" and the people are so friendly. I'm so [93 chars]re'"]`
vision_encoder_decoder	10	output_mismatch	`(line 1352) AssertionError: Tensor-likes are not close!`
cohere2_vision	8	output_mismatch	`(line 687) AssertionError: False is not true : Actual logits: tensor([2.3711, 1.6689, 1.8389, 1.9785, 1.9121], dtype=torch.float16)`
… and 149 more models
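For the OOM rows in the full failure list, it can help to separate "one allocation is nearly as large as the whole GPU" from "the GPU was already full before a small allocation". The sketch below parses the sizes out of a trace excerpt with a regular expression. It assumes the excerpt follows the PyTorch message wording seen in this report; `parse_oom` and its field names are hypothetical helpers, not part of any library.

```python
import re

# Conversion factors to GiB for the units that appear in the excerpts.
_TO_GIB = {"KiB": 1 / (1024 * 1024), "MiB": 1 / 1024, "GiB": 1.0}
_SIZE = r"([\d.]+) (KiB|MiB|GiB)"
_OOM_RE = re.compile(
    rf"Tried to allocate {_SIZE}\. GPU (\d+) has a total capacity of {_SIZE} "
    rf"of which {_SIZE} is free"
)


def parse_oom(excerpt):
    """Extract requested/capacity/free sizes (in GiB) from an OOM trace excerpt.

    Returns None when the excerpt does not contain the expected wording,
    e.g. the wrapped "Caught OutOfMemoryError in replica 0" variant.
    """
    match = _OOM_RE.search(excerpt)
    if match is None:
        return None
    req, req_u, gpu, cap, cap_u, free, free_u = match.groups()
    requested = float(req) * _TO_GIB[req_u]
    capacity = float(cap) * _TO_GIB[cap_u]
    return {
        "gpu": int(gpu),
        "requested_gib": requested,
        "capacity_gib": capacity,
        "free_gib": float(free) * _TO_GIB[free_u],
        # Fraction of the whole device that this single allocation asks for.
        "requested_share": requested / capacity,
    }


# Excerpt copied from the deepseek_v2 `test_deepseek_v2_lite` row (truncated there too).
sample = (
    "(line 5095) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate "
    "21.10 GiB. GPU 0 has a total capacity of 22.30 GiB of which 140.69 MiB is free."
)
info = parse_oom(sample)
print(info)
```

A `requested_share` close to 1, as in this sample, suggests the test cannot fit on the device at all, whereas a tiny request against a nearly full device points at memory already held when the test started. Where to draw that line is a judgment call this sketch leaves open.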

All 733 failures in this bucket

model	gpu	test	failure_mode	days_seen	trace excerpt
bamba	multi	`test_simple_batched_generate_with_padding`	OOM	7/7	`(line 779) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 8.00 GiB. GPU 0 has a total capacity of 22.30 GiB of which 3.37 GiB is free. Process 87938 has 18.93 GiB memory in use. Of the allocated memory 1…`
bamba	multi	`test_simple_generate`	OOM	7/7	`(line 779) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 4.00 GiB. GPU 0 has a total capacity of 22.30 GiB of which 3.28 GiB is free. Process 87938 has 19.01 GiB memory in use. Of the allocated memory 1…`
bamba	single	`test_simple_batched_generate_with_padding`	OOM	7/7	`(line 779) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 8.00 GiB. GPU 0 has a total capacity of 22.30 GiB of which 3.50 GiB is free. Process 98882 has 18.80 GiB memory in use. Of the allocated memory 1…`
bamba	single	`test_simple_generate`	OOM	7/7	`(line 779) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 4.00 GiB. GPU 0 has a total capacity of 22.30 GiB of which 3.42 GiB is free. Process 98882 has 18.88 GiB memory in use. Of the allocated memory 1…`
cohere2_vision	multi	`test_model_integration_generate_chat_template`	OOM	6/7	`(line 353) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 5.86 GiB. GPU 1 has a total capacity of 22.30 GiB of which 1.48 GiB is free. Process 270159 has 20.82 GiB memory in use. Of the allocated memory …`
cohere2_vision	single	`test_model_integration_generate_chat_template`	OOM	6/7	`(line 353) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 5.86 GiB. GPU 0 has a total capacity of 22.30 GiB of which 720.69 MiB is free. Process 70312 has 21.59 GiB memory in use. Of the allocated memory…`
cwm	multi	`test_cwm_sliding_window_long_sequence`	OOM	7/7	`(line 255) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 102.00 MiB. GPU 1 has a total capacity of 22.30 GiB of which 46.69 MiB is free. Process 795802 has 22.25 GiB memory in use. Of the allocated memo…`
deepseek_v2	multi	`test_batch_fa2`	OOM	7/7	`(line 991) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacity of 22.30 GiB of which 704.00 KiB is free. Process 381415 has 22.29 GiB memory in use. Of the allocated memo…`
deepseek_v2	multi	`test_deepseek_v2_lite`	OOM	7/7	`(line 5095) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 21.10 GiB. GPU 0 has a total capacity of 22.30 GiB of which 140.69 MiB is free. Process 381415 has 22.16 GiB memory in use. Of the allocated mem…`
deepseek_v2	multi	`test_logits_eager`	OOM	7/7	`(line 5095) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 21.10 GiB. GPU 0 has a total capacity of 22.30 GiB of which 140.69 MiB is free. Process 381415 has 22.16 GiB memory in use. Of the allocated mem…`
deepseek_v2	single	`test_batch_fa2`	OOM	7/7	`(line 991) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacity of 22.30 GiB of which 2.69 MiB is free. Process 183760 has 22.29 GiB memory in use. Of the allocated memory…`
deepseek_v2	single	`test_deepseek_v2_lite`	OOM	7/7	`(line 5095) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 21.10 GiB. GPU 0 has a total capacity of 22.30 GiB of which 222.69 MiB is free. Process 183760 has 22.08 GiB memory in use. Of the allocated mem…`
deepseek_v2	single	`test_logits_eager`	OOM	7/7	`(line 5095) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 21.10 GiB. GPU 0 has a total capacity of 22.30 GiB of which 222.69 MiB is free. Process 183760 has 22.08 GiB memory in use. Of the allocated mem…`
deepseek_vl_hybrid	multi	`test_model_text_generation_batched`	OOM	7/7	`(line 1370) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 86.00 MiB. GPU 0 has a total capacity of 22.30 GiB of which 38.69 MiB is free. Process 587134 has 22.26 GiB memory in use. Of the allocated memo…`
deepseek_vl_hybrid	multi	`test_model_text_generation_with_multi_image`	OOM	7/7	`(line 1370) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacity of 22.30 GiB of which 18.69 MiB is free. Process 587134 has 22.28 GiB memory in use. Of the allocated memo…`
emu3	multi	`test_model_generation`	OOM	7/7	`(line 92) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.10 GiB. GPU 0 has a total capacity of 22.30 GiB of which 694.69 MiB is free. Process 322396 has 21.62 GiB memory in use. Of the allocated memory…`
emu3	multi	`test_model_generation_batched`	OOM	7/7	`(line 5095) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 9.38 GiB. GPU 0 has a total capacity of 22.30 GiB of which 1.00 GiB is free. Process 322396 has 21.29 GiB memory in use. Of the allocated memory…`
emu3	multi	`test_model_generation_multi_image`	OOM	7/7	`(line 5095) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 9.71 GiB. GPU 0 has a total capacity of 22.30 GiB of which 1.00 GiB is free. Process 322396 has 21.29 GiB memory in use. Of the allocated memory…`
emu3	single	`test_model_generation_batched`	OOM	7/7	`(line 2397) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 458.00 MiB. GPU 0 has a total capacity of 22.30 GiB of which 378.69 MiB is free. Process 361351 has 21.93 GiB memory in use. Of the allocated me…`
emu3	single	`test_model_generation_multi_image`	OOM	7/7	`(line 5095) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 9.66 GiB. GPU 0 has a total capacity of 22.30 GiB of which 376.69 MiB is free. Process 361351 has 21.93 GiB memory in use. Of the allocated memo…`
exaone4	multi	`test_model_generation_beyond_sliding_window`	OOM	7/7	`(line 291) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 220.00 MiB. GPU 0 has a total capacity of 22.30 GiB of which 18.69 MiB is free. Process 1249037 has 22.28 GiB memory in use. Of the allocated mem…`
exaone4	multi	`test_model_generation_eager`	OOM	7/7	`(line 353) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 1000.00 MiB. GPU 0 has a total capacity of 22.30 GiB of which 992.69 MiB is free. Process 1249037 has 21.33 GiB memory in use. Of the allocated m…`
exaone4	multi	`test_model_generation_sdpa`	OOM	7/7	`(line 353) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 1000.00 MiB. GPU 0 has a total capacity of 22.30 GiB of which 992.69 MiB is free. Process 1249037 has 21.33 GiB memory in use. Of the allocated m…`
exaone4	multi	`test_model_logits`	OOM	7/7	`(line 353) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 1000.00 MiB. GPU 0 has a total capacity of 22.30 GiB of which 994.69 MiB is free. Process 1249037 has 21.32 GiB memory in use. Of the allocated m…`
exaone4_5	multi	`test_model_generation_image_text`	OOM	7/7	`(line 353) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.46 GiB. GPU 0 has a total capacity of 22.30 GiB of which 1.46 GiB is free. Process 343938 has 20.83 GiB memory in use. Of the allocated memory …`
exaone4_5	multi	`test_model_generation_text_only`	OOM	7/7	`(line 353) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.46 GiB. GPU 0 has a total capacity of 22.30 GiB of which 1.46 GiB is free. Process 343938 has 20.84 GiB memory in use. Of the allocated memory …`
gemma4	multi	`test_export_text_only`	OOM	7/7	`(line 2301) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 4.38 GiB. GPU 0 has a total capacity of 22.30 GiB of which 2.67 GiB is free. Process 455140 has 19.62 GiB memory in use. Of the allocated memory…`
gemma4	single	`test_export_text_only`	OOM	7/7	`(line 2301) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 4.38 GiB. GPU 0 has a total capacity of 22.30 GiB of which 2.80 GiB is free. Process 433771 has 19.49 GiB memory in use. Of the allocated memory…`
glm	multi	`test_model_9b_fp16`	OOM	7/7	`(line 1370) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 32.00 MiB. GPU 0 has a total capacity of 22.30 GiB of which 22.69 MiB is free. Process 1362136 has 22.27 GiB memory in use. Of the allocated mem…`
glm	multi	`test_model_9b_sdpa`	OOM	7/7	`(line 1370) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.16 GiB. GPU 0 has a total capacity of 22.30 GiB of which 22.69 MiB is free. Process 1362136 has 22.27 GiB memory in use. Of the allocated memo…`
glm	single	`test_model_9b_fp16`	OOM	7/7	`(line 1370) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 214.00 MiB. GPU 0 has a total capacity of 22.30 GiB of which 74.69 MiB is free. Process 1317689 has 22.22 GiB memory in use. Of the allocated me…`
glm	single	`test_model_9b_sdpa`	OOM	7/7	`(line 1370) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.16 GiB. GPU 0 has a total capacity of 22.30 GiB of which 74.69 MiB is free. Process 1317689 has 22.22 GiB memory in use. Of the allocated memo…`
glm4_moe	multi	`test_compile_static_cache`	OOM	7/7	`(line 991) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 120.00 MiB. GPU 0 has a total capacity of 22.30 GiB of which 118.69 MiB is free. Process 535825 has 22.18 GiB memory in use. Of the allocated mem…`
glm4_moe	single	`test_compile_static_cache`	OOM	7/7	`(line 991) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 120.00 MiB. GPU 0 has a total capacity of 22.30 GiB of which 74.69 MiB is free. Process 1341191 has 22.22 GiB memory in use. Of the allocated mem…`
glm4_moe_lite	multi	`test_compile_static_cache`	OOM	7/7	`(line 991) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacity of 22.30 GiB of which 8.69 MiB is free. Process 574492 has 22.29 GiB memory in use. Of the allocated memory…`
glm4_moe_lite	single	`test_compile_static_cache`	OOM	7/7	`(line 991) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacity of 22.30 GiB of which 12.69 MiB is free. Process 348066 has 22.28 GiB memory in use. Of the allocated memor…`
llava	multi	`test_pixtral`	OOM	7/7	`(line 991) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 40.00 MiB. GPU 0 has a total capacity of 22.30 GiB of which 6.69 MiB is free. Process 1522550 has 22.29 GiB memory in use. Of the allocated memor…`
llava	multi	`test_pixtral_4bit`	OOM	7/7	`(line 5095) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 7.77 GiB. GPU 0 has a total capacity of 22.30 GiB of which 6.69 MiB is free. Process 1522550 has 22.29 GiB memory in use. Of the allocated memor…`
llava	single	`test_pixtral`	OOM	7/7	`(line 991) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 140.00 MiB. GPU 0 has a total capacity of 22.30 GiB of which 86.69 MiB is free. Process 1603048 has 22.21 GiB memory in use. Of the allocated mem…`
llava	single	`test_pixtral_4bit`	OOM	7/7	`(line 5095) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 7.77 GiB. GPU 0 has a total capacity of 22.30 GiB of which 36.69 MiB is free. Process 1603048 has 22.26 GiB memory in use. Of the allocated memo…`
mamba2	multi	`test_batched_equivalence_with_cache`	OOM	7/7	`(line 532) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 12.00 GiB. GPU 0 has a total capacity of 22.30 GiB of which 8.03 GiB is free. Process 732234 has 14.26 GiB memory in use. Of the allocated memory…`
mamba2	multi	`test_batched_equivalence_without_cache`	OOM	7/7	`(line 1370) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 64.00 MiB. GPU 0 has a total capacity of 22.30 GiB of which 46.69 MiB is free. Process 732234 has 22.25 GiB memory in use. Of the allocated memo…`
mamba2	multi	`test_mamba2_mixer_train_vs_eval_equivalence`	OOM	7/7	`(line 370) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacity of 22.30 GiB of which 12.69 MiB is free. Process 732234 has 22.28 GiB memory in use. Of the allocated memor…`
mamba2	multi	`test_simple_generate`	OOM	7/7	`(line 1370) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 256.00 MiB. GPU 0 has a total capacity of 22.30 GiB of which 12.69 MiB is free. Process 732234 has 22.28 GiB memory in use. Of the allocated mem…`
mamba2	single	`test_batched_equivalence_with_cache`	OOM	7/7	`(line 532) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 12.00 GiB. GPU 0 has a total capacity of 22.30 GiB of which 8.03 GiB is free. Process 1646561 has 14.27 GiB memory in use. Of the allocated memor…`
mamba2	single	`test_batched_equivalence_without_cache`	OOM	7/7	`(line 1370) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 64.00 MiB. GPU 0 has a total capacity of 22.30 GiB of which 46.69 MiB is free. Process 1646561 has 22.25 GiB memory in use. Of the allocated mem…`
mamba2	single	`test_mamba2_mixer_train_vs_eval_equivalence`	OOM	7/7	`(line 370) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacity of 22.30 GiB of which 12.69 MiB is free. Process 1646561 has 22.28 GiB memory in use. Of the allocated memo…`
mamba2	single	`test_simple_generate`	OOM	7/7	`(line 1370) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 256.00 MiB. GPU 0 has a total capacity of 22.30 GiB of which 12.69 MiB is free. Process 1646561 has 22.28 GiB memory in use. Of the allocated me…`
moshi	multi	`test_moshiko_greedy_unconditional_fp32`	OOM	7/7	`(line 991) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 34.00 MiB. GPU 1 has a total capacity of 22.30 GiB of which 16.69 MiB is free. Process 127291 has 22.28 GiB memory in use. Of the allocated memor…`
olmo	multi	`test_model_7b_logits`	OOM	6/7	`(line 353) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 172.00 MiB. GPU 0 has a total capacity of 22.30 GiB of which 116.69 MiB is free. Process 565912 has 22.18 GiB memory in use. Of the allocated mem…`
olmo	single	`test_model_7b_logits`	OOM	6/7	`(line 353) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 64.00 MiB. GPU 0 has a total capacity of 22.30 GiB of which 30.69 MiB is free. Process 193520 has 22.27 GiB memory in use. Of the allocated memor…`
phimoe	single	`test_model_phimoe_instruct_logits`	OOM	6/7	`(line 353) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.56 GiB. GPU 0 has a total capacity of 22.30 GiB of which 814.69 MiB is free. Process 329492 has 21.50 GiB memory in use. Of the allocated memor…`
phimoe	single	`test_phimoe_instruct_generation`	OOM	5/7	`(line 353) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.56 GiB. GPU 0 has a total capacity of 22.30 GiB of which 812.69 MiB is free. Process 329492 has 21.50 GiB memory in use. Of the allocated memor…`
phimoe	single	`test_phimoe_instruct_with_static_cache`	OOM	5/7	`(line 353) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.56 GiB. GPU 0 has a total capacity of 22.30 GiB of which 812.69 MiB is free. Process 329492 has 21.50 GiB memory in use. Of the allocated memor…`
pi0	multi	`test_train_pi0_base_libero`	OOM	7/7	`(line 785) torch.OutOfMemoryError: Caught OutOfMemoryError in replica 0 on device 0.`
pi0	single	`test_train_pi0_base_libero`	OOM	7/7	`(line 193) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 18.00 MiB. GPU 0 has a total capacity of 22.30 GiB of which 6.69 MiB is free. Process 777692 has 22.29 GiB memory in use. Of the allocated memory…`
qwen3_vl_moe	multi	`test_small_model_integration_test_expand`	OOM	7/7	`(line 991) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 768.00 MiB. GPU 1 has a total capacity of 22.30 GiB of which 240.69 MiB is free. Process 643790 has 22.06 GiB memory in use. Of the allocated mem…`
generation	multi	`test_validate_assistant`	cuda_runtime	7/7	`(line 1909) torch.AcceleratorError: CUDA error: device-side assert triggered`
generation	single	`test_validate_assistant`	cuda_runtime	7/7	`(line 1909) torch.AcceleratorError: CUDA error: device-side assert triggered`
cwm	multi	`test_cwm_integration`	import_or_config	7/7	`(line 1968) AttributeError: 'CwmDecoderLayer' object has no attribute 'attention_type'`
cwm	single	`test_cwm_integration`	import_or_config	6/7	`(line 1968) AttributeError: 'CwmDecoderLayer' object has no attribute 'attention_type'`
edgetam	multi	`test_inference_batched_images_batched_boxes`	import_or_config	7/7	`(line 249) TypeError: FeatureListNet.forward() got an unexpected keyword argument 'original_sizes'`
edgetam	multi	`test_inference_mask_generation_batched_images_batched_points_multi_points`	import_or_config	7/7	`(line 249) TypeError: FeatureListNet.forward() got an unexpected keyword argument 'original_sizes'`
edgetam	multi	`test_inference_mask_generation_batched_images_multi_points`	import_or_config	7/7	`(line 249) TypeError: FeatureListNet.forward() got an unexpected keyword argument 'original_sizes'`
edgetam	multi	`test_inference_mask_generation_from_existing_points_and_mask`	import_or_config	7/7	`(line 249) TypeError: FeatureListNet.forward() got an unexpected keyword argument 'original_sizes'`
edgetam	multi	`test_inference_mask_generation_one_point_multimask`	import_or_config	7/7	`(line 249) TypeError: FeatureListNet.forward() got an unexpected keyword argument 'original_sizes'`
edgetam	multi	`test_inference_mask_generation_one_point_no_multimask`	import_or_config	7/7	`(line 249) TypeError: FeatureListNet.forward() got an unexpected keyword argument 'original_sizes'`
edgetam	single	`test_inference_batched_images_batched_boxes`	import_or_config	7/7	`(line 249) TypeError: FeatureListNet.forward() got an unexpected keyword argument 'original_sizes'`
edgetam	single	`test_inference_mask_generation_batched_images_batched_points_multi_points`	import_or_config	7/7	`(line 249) TypeError: FeatureListNet.forward() got an unexpected keyword argument 'original_sizes'`
edgetam	single	`test_inference_mask_generation_batched_images_multi_points`	import_or_config	7/7	`(line 249) TypeError: FeatureListNet.forward() got an unexpected keyword argument 'original_sizes'`
edgetam	single	`test_inference_mask_generation_from_existing_points_and_mask`	import_or_config	7/7	`(line 249) TypeError: FeatureListNet.forward() got an unexpected keyword argument 'original_sizes'`
edgetam	single	`test_inference_mask_generation_one_point_multimask`	import_or_config	7/7	`(line 249) TypeError: FeatureListNet.forward() got an unexpected keyword argument 'original_sizes'`
edgetam	single	`test_inference_mask_generation_one_point_no_multimask`	import_or_config	7/7	`(line 249) TypeError: FeatureListNet.forward() got an unexpected keyword argument 'original_sizes'`
emu3	multi	`test_model_generate_images`	import_or_config	7/7	`(line 1968) AttributeError: 'Emu3ForConditionalGeneration' object has no attribute 'vocabulary_mapping'`
emu3	single	`test_model_generate_images`	import_or_config	7/7	`(line 1968) AttributeError: 'Emu3ForConditionalGeneration' object has no attribute 'vocabulary_mapping'`
generation	multi	`test_green_red_watermark_generation`	import_or_config	7/7	`(line 665) AttributeError: 'dict' object has no attribute 'validate'`
generation	single	`test_green_red_watermark_generation`	import_or_config	7/7	`(line 665) AttributeError: 'dict' object has no attribute 'validate'`
kosmos2	multi	`test_inference_interpolate_pos_encoding`	import_or_config	7/7	`(line 777) AttributeError: 'NoneType' object has no attribute 'last_hidden_state'`
kosmos2	single	`test_inference_interpolate_pos_encoding`	import_or_config	7/7	`(line 777) AttributeError: 'NoneType' object has no attribute 'last_hidden_state'`
nemotron	multi	`test_nemotron_8b_generation_fa2`	import_or_config	6/7	`(line 1725) ImportError: FlashAttention2 has been toggled on, but it cannot be used due to the following error: the package for FlashAttention2 doesn't seem to be installed.`
nemotron	single	`test_nemotron_8b_generation_fa2`	import_or_config	7/7	`(line 1725) ImportError: FlashAttention2 has been toggled on, but it cannot be used due to the following error: the package for FlashAttention2 doesn't seem to be installed.`
deepseek_v4	multi	`test_v4_flash_dequantized_chat_seven_prompts`	load_error	5/7	``(line 501) ValueError: The current `device_map` had weights offloaded to the disk, which needed to be re-saved. This is either because the weights are not in `safetensors` format, or because the model uses an internal …``
deepseek_v4	multi	`test_v4_flash_dequantized_generation`	load_error	5/7	``(line 501) ValueError: The current `device_map` had weights offloaded to the disk, which needed to be re-saved. This is either because the weights are not in `safetensors` format, or because the model uses an internal …``
deepseek_v4	single	`test_v4_flash_dequantized_chat_seven_prompts`	load_error	6/7	``(line 501) ValueError: The current `device_map` had weights offloaded to the disk, which needed to be re-saved. This is either because the weights are not in `safetensors` format, or because the model uses an internal …``
deepseek_v4	single	`test_v4_flash_dequantized_generation`	load_error	6/7	``(line 501) ValueError: The current `device_map` had weights offloaded to the disk, which needed to be re-saved. This is either because the weights are not in `safetensors` format, or because the model uses an internal …``
jais2	multi	`test_model_generation`	load_error	7/7	`(line 503) OSError: You are trying to access a gated repo.`
jais2	multi	`test_model_logits`	load_error	7/7	`(line 503) OSError: You are trying to access a gated repo.`
jais2	single	`test_model_generation`	load_error	7/7	`(line 503) OSError: You are trying to access a gated repo.`
jais2	single	`test_model_logits`	load_error	7/7	`(line 503) OSError: You are trying to access a gated repo.`
qwen3_moe	multi	`test_model_15b_a2b_generation`	load_error	7/7	`(line 74) ValueError: Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the quantized model. If you want to dispatch the model on the CPU or the disk while keeping these modul…`
qwen3_moe	multi	`test_model_15b_a2b_logits`	load_error	7/7	`(line 74) ValueError: Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the quantized model. If you want to dispatch the model on the CPU or the disk while keeping these modul…`
qwen3_moe	multi	`test_model_15b_a2b_long_prompt_sdpa`	load_error	7/7	`(line 74) ValueError: Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the quantized model. If you want to dispatch the model on the CPU or the disk while keeping these modul…`
qwen3_moe	multi	`test_speculative_generation`	load_error	7/7	`(line 74) ValueError: Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the quantized model. If you want to dispatch the model on the CPU or the disk while keeping these modul…`
audioflamingo3	multi	`test_fixture_batched_matches`	other	7/7	`(line 2935) RuntimeError: expected scalar type Float but found BFloat16`
audioflamingo3	multi	`test_fixture_single_matches`	other	7/7	`(line 2935) RuntimeError: expected scalar type Float but found BFloat16`
audioflamingo3	single	`test_fixture_batched_matches`	other	7/7	`(line 2935) RuntimeError: expected scalar type Float but found BFloat16`
audioflamingo3	single	`test_fixture_single_matches`	other	7/7	`(line 2935) RuntimeError: expected scalar type Float but found BFloat16`
bitnet	multi	`test_model_generation`	other	7/7	`(line 309) RuntimeError: expected mat1 and mat2 to have the same dtype, but got: c10::BFloat16 != unsigned char`
bitnet	multi	`test_model_logits`	other	7/7	`(line 309) RuntimeError: expected m1 and m2 to have the same dtype, but got: c10::BFloat16 != unsigned char`
bitnet	single	`test_model_generation`	other	7/7	`(line 309) RuntimeError: expected mat1 and mat2 to have the same dtype, but got: c10::BFloat16 != unsigned char`
bitnet	single	`test_model_logits`	other	7/7	`(line 309) RuntimeError: expected m1 and m2 to have the same dtype, but got: c10::BFloat16 != unsigned char`
blt	multi	`test_model_logits`	other	5/7	`(line 2567) RuntimeError: Expected all tensors to be on the same device, but got index is on cuda:0, different from other tensors on cuda:1 (when checking argument in method wrapper_CUDA__index_select)`
blt	multi	`test_model_logits_bf16`	other	5/7	`(line 2567) RuntimeError: Expected all tensors to be on the same device, but got index is on cuda:0, different from other tensors on cuda:1 (when checking argument in method wrapper_CUDA__index_select)`
bridgetower	multi	`test_constrastive_learning`	other	7/7	``(line 421) ValueError: Unrecognized model in BridgeTower/bridgetower-large-itm-mlm-itc. Should have a `model_type` key in its config.json.``
bridgetower	multi	`test_image_and_text_retrieval`	other	7/7	``(line 421) ValueError: Unrecognized model in BridgeTower/bridgetower-base-itm-mlm. Should have a `model_type` key in its config.json.``
bridgetower	multi	`test_masked_language_modeling`	other	7/7	``(line 421) ValueError: Unrecognized model in BridgeTower/bridgetower-base-itm-mlm. Should have a `model_type` key in its config.json.``
bridgetower	single	`test_constrastive_learning`	other	7/7	``(line 421) ValueError: Unrecognized model in BridgeTower/bridgetower-large-itm-mlm-itc. Should have a `model_type` key in its config.json.``
bridgetower	single	`test_image_and_text_retrieval`	other	7/7	``(line 421) ValueError: Unrecognized model in BridgeTower/bridgetower-base-itm-mlm. Should have a `model_type` key in its config.json.``
bridgetower	single	`test_masked_language_modeling`	other	7/7	``(line 421) ValueError: Unrecognized model in BridgeTower/bridgetower-base-itm-mlm. Should have a `model_type` key in its config.json.``
clvp	multi	`test_full_model_integration`	other	7/7	`(line 1310) RuntimeError: The expanded size of the tensor (2) must match the existing size (3) at non-singleton dimension 0. Target sizes: [2]. Tensor sizes: [3]`
clvp	single	`test_full_model_integration`	other	7/7	`(line 1310) RuntimeError: The expanded size of the tensor (2) must match the existing size (3) at non-singleton dimension 0. Target sizes: [2]. Tensor sizes: [3]`
cohere2_vision	multi	`test_model_forward_vision`	other	6/7	``(line 488) OSError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/root/repos/moe/engines/command_a+_bf16'. Use `repo_type` argument if needed.``
cohere2_vision	multi	`test_model_generate_vision`	other	6/7	``(line 488) OSError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/root/repos/moe/engines/command_a+_bf16'. Use `repo_type` argument if needed.``
cohere2_vision	single	`test_model_forward_vision`	other	6/7	``(line 488) OSError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/root/repos/moe/engines/command_a+_bf16'. Use `repo_type` argument if needed.``
cohere2_vision	single	`test_model_generate_vision`	other	6/7	``(line 488) OSError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/root/repos/moe/engines/command_a+_bf16'. Use `repo_type` argument if needed.``
colqwen2	multi	`test_model_integration_test`	other	7/7	`(line 110) ValueError: images must be an image, list of images or list of list of images`
colqwen2	single	`test_model_integration_test`	other	7/7	`(line 110) ValueError: images must be an image, list of images or list of list of images`
dbrx	multi	`test_tiny_model_logits`	other	7/7	`(line 146) huggingface_hub.errors.StrictDataclassFieldValidationError: Validation error for field 'moe_jitter_eps':`
dbrx	single	`test_tiny_model_logits`	other	7/7	`(line 146) huggingface_hub.errors.StrictDataclassFieldValidationError: Validation error for field 'moe_jitter_eps':`
deepseek_v3	single	`test_compile_static_cache`	other	7/7	``(line 273) RuntimeError: We encountered some issues during automatic conversion of the weights. For details look at the `CONVERSION` entries of the above report!``
deepseek_vl	multi	`test_model_text_generation`	other	7/7	`(line 67) RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!`
deepseek_vl	multi	`test_model_text_generation_batched`	other	7/7	`(line 67) RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!`
deepseek_vl	multi	`test_model_text_generation_with_multi_image`	other	7/7	`(line 67) RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!`
deepseek_vl_hybrid	multi	`test_model_text_generation`	other	7/7	`(line 67) RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!`
deepseek_vl_hybrid	single	`test_model_text_generation_with_multi_image`	other	7/7	`(line 468) RuntimeError: You can't move a model that has some modules offloaded to cpu or disk.`
flex_olmo	multi	`test_model_7b_greedy_generation`	other	7/7	``(line 273) RuntimeError: We encountered some issues during automatic conversion of the weights. For details look at the `CONVERSION` entries of the above report!``
gemma2	multi	`test_model_2b_pipeline_bf16_flex_attention`	other	7/7	`Cannot retrieve error message.`
gemma2	single	`test_model_2b_pipeline_bf16_flex_attention`	other	7/7	`Cannot retrieve error message.`
generation	multi	`test_cache_device_map_with_vision_layer_device_map`	other	7/7	`(line 1632) ValueError: The device_map provided does not give any device for the following parameters: model.vision_tower.embeddings.patch_embedding.weight, model.vision_tower.embeddings.patch_embedding.bias, model.vis…`
generation	multi	`test_generate_multi_accelerator_causal_mask`	other	7/7	`(line 1632) ValueError: The device_map provided does not give any device for the following parameters: model.visual.patch_embed.proj.weight, model.visual.blocks.0.norm1.weight, model.visual.blocks.0.norm1.bias, model.v…`
generation	single	`test_cache_device_map_with_vision_layer_device_map`	other	7/7	`(line 1632) ValueError: The device_map provided does not give any device for the following parameters: model.vision_tower.embeddings.patch_embedding.weight, model.vision_tower.embeddings.patch_embedding.bias, model.vis…`
git	multi	`test_inference_image_captioning`	other	7/7	`(line 4194) UnboundLocalError: local variable 'output' referenced before assignment`
git	single	`test_inference_image_captioning`	other	7/7	`(line 4194) UnboundLocalError: local variable 'output' referenced before assignment`
glm46v	multi	`test_small_model_integration_test`	other	7/7	`(line 2567) RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got CUDABFloat16Type instead (while checking arguments for embedding)`
glm46v	multi	`test_small_model_integration_test_batch`	other	7/7	`(line 2567) RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got CUDABFloat16Type instead (while checking arguments for embedding)`
glm46v	multi	`test_small_model_integration_test_batch_different_resolutions`	other	7/7	`(line 2567) RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got CUDABFloat16Type instead (while checking arguments for embedding)`
glm46v	multi	`test_small_model_integration_test_batch_wo_image`	other	7/7	`(line 2567) RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got CUDABFloat16Type instead (while checking arguments for embedding)`
glm46v	multi	`test_small_model_integration_test_expand`	other	7/7	`(line 2567) RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got CUDABFloat16Type instead (while checking arguments for embedding)`
glm46v	multi	`test_small_model_integration_test_with_video`	other	7/7	`(line 2567) RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.cuda.HalfTensor instead (while checking arguments for embedding)`
glm46v	single	`test_small_model_integration_test`	other	7/7	`(line 2567) RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got CUDABFloat16Type instead (while checking arguments for embedding)`
glm46v	single	`test_small_model_integration_test_batch`	other	7/7	`(line 2567) RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got CUDABFloat16Type instead (while checking arguments for embedding)`
glm46v	single	`test_small_model_integration_test_batch_different_resolutions`	other	7/7	`(line 2567) RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got CUDABFloat16Type instead (while checking arguments for embedding)`
glm46v	single	`test_small_model_integration_test_batch_wo_image`	other	7/7	`(line 2567) RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got CUDABFloat16Type instead (while checking arguments for embedding)`
glm46v	single	`test_small_model_integration_test_expand`	other	7/7	`(line 2567) RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got CUDABFloat16Type instead (while checking arguments for embedding)`
glm46v	single	`test_small_model_integration_test_with_video`	other	7/7	`(line 2567) RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.cuda.HalfTensor instead (while checking arguments for embedding)`
glm4v_moe	multi	`test_small_model_integration_test_batch`	other	7/7	``(line 273) RuntimeError: We encountered some issues during automatic conversion of the weights. For details look at the `CONVERSION` entries of the above report!``
glm4v_moe	multi	`test_small_model_integration_test_with_video`	other	7/7	``(line 273) RuntimeError: We encountered some issues during automatic conversion of the weights. For details look at the `CONVERSION` entries of the above report!``
glm_ocr	multi	`test_small_model_integration_test`	other	7/7	`(line 456) assert [151331, 1513..., 151343, ...] == [59248, 59250...6, 59280, ...]`
glm_ocr	single	`test_small_model_integration_test`	other	7/7	`(line 456) assert [151331, 1513..., 151343, ...] == [59248, 59250...6, 59280, ...]`
hunyuan_v1_moe	multi	`test_model_generation`	other	7/7	``(line 273) RuntimeError: We encountered some issues during automatic conversion of the weights. For details look at the `CONVERSION` entries of the above report!``
hyperclovax	multi	`test_model_seed_think_14b_bf16`	other	7/7	``(line 1319) ValueError: There are one or more stop strings, either in the arguments to `generate` or in the model's generation config, but we could not locate a tokenizer. When generating with stop strings, you must pa…``
hyperclovax	single	`test_model_seed_think_14b_bf16`	other	7/7	``(line 1319) ValueError: There are one or more stop strings, either in the arguments to `generate` or in the model's generation config, but we could not locate a tokenizer. When generating with stop strings, you must pa…``
janus	multi	`test_model_text_generation`	other	7/7	`(line 1735) ValueError: Image features and image tokens do not match, tokens: 0, features: 1179648`
janus	multi	`test_model_text_generation_with_multi_image`	other	7/7	`(line 1735) ValueError: Image features and image tokens do not match, tokens: 0, features: 2359296`
janus	single	`test_model_text_generation`	other	7/7	`(line 1735) ValueError: Image features and image tokens do not match, tokens: 0, features: 1179648`
janus	single	`test_model_text_generation_with_multi_image`	other	7/7	`(line 1735) ValueError: Image features and image tokens do not match, tokens: 0, features: 2359296`
mistral4	multi	`test_mistral_small_4_generation`	other	7/7	`(line 6741) RuntimeError: Expected mat_a to be Float32, BFloat16 or Float16 matrix, got Float8_e4m3fn`
mistral4	multi	`test_mistral_small_4_logits`	other	7/7	`(line 6741) RuntimeError: Expected mat_a to be Float32, BFloat16 or Float16 matrix, got Float8_e4m3fn`
mistral4	single	`test_mistral_small_4_generation`	other	7/7	`(line 6741) RuntimeError: Expected mat_a to be Float32, BFloat16 or Float16 matrix, got Float8_e4m3fn`
mistral4	single	`test_mistral_small_4_logits`	other	7/7	`(line 6741) RuntimeError: Expected mat_a to be Float32, BFloat16 or Float16 matrix, got Float8_e4m3fn`
modernvbert	multi	`test_masked_lm_inference`	other	7/7	`(line 835) huggingface_hub.errors.RepositoryNotFoundError: 404 Client Error. (Request ID: Root=1-6a2cd2c7-474585012cf449656746018f;6a94d3ec-b383-4df4-9312-21dd1b5b085c)`
modernvbert	single	`test_masked_lm_inference`	other	7/7	`(line 835) huggingface_hub.errors.RepositoryNotFoundError: 404 Client Error. (Request ID: Root=1-6a2cd1d6-5b0260645b34c8982407b11d;430a3b4b-e895-4d83-8777-c2d396c8cbe6)`
mpt	multi	`test_generation`	other	7/7	`(line 469) OSError: mosaicml/mpt-7b is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'`
mpt	multi	`test_generation_8k`	other	7/7	`(line 469) OSError: mosaicml/mpt-7b-8k is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'`
mpt	multi	`test_generation_batched`	other	7/7	`(line 469) OSError: mosaicml/mpt-7b is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'`
mpt	multi	`test_model_logits`	other	7/7	`(line 469) OSError: mosaicml/mpt-7b is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'`
mpt	single	`test_generation`	other	7/7	`(line 469) OSError: mosaicml/mpt-7b is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'`
mpt	single	`test_generation_8k`	other	7/7	`(line 469) OSError: mosaicml/mpt-7b-8k is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'`
mpt	single	`test_generation_batched`	other	7/7	`(line 469) OSError: mosaicml/mpt-7b is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'`
mpt	single	`test_model_logits`	other	7/7	`(line 469) OSError: mosaicml/mpt-7b is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'`
musicflamingo	multi	`test_fixture_batched_matches`	other	6/7	`(line 2935) RuntimeError: expected scalar type Float but found BFloat16`
musicflamingo	multi	`test_fixture_single_matches`	other	6/7	`(line 2935) RuntimeError: expected scalar type Float but found BFloat16`
musicflamingo	single	`test_fixture_batched_matches`	other	6/7	`(line 2935) RuntimeError: expected scalar type Float but found BFloat16`
musicflamingo	single	`test_fixture_single_matches`	other	6/7	`(line 2935) RuntimeError: expected scalar type Float but found BFloat16`
olmo	single	`test_model_1b_logits`	other	6/7	`(line 2567) RuntimeError: Expected all tensors to be on the same device, but got index is on cpu, different from other tensors on cuda:0 (when checking argument in method wrapper_CUDA__index_select)`
peft_integration	multi	`test_hotswap_with_compile_and_higher_rank_works`	other	7/7	``(line 278) RuntimeError: You set `ignore_mismatched_sizes` to `False`, thus raising an error. For details look at the above report!``
peft_integration	multi	`test_hotswap_with_compile_and_lower_rank_works`	other	7/7	``(line 278) RuntimeError: You set `ignore_mismatched_sizes` to `False`, thus raising an error. For details look at the above report!``
peft_integration	multi	`test_hotswap_without_compile_and_with_higher_rank_works`	other	7/7	``(line 278) RuntimeError: You set `ignore_mismatched_sizes` to `False`, thus raising an error. For details look at the above report!``
peft_integration	multi	`test_hotswap_without_compile_and_with_lower_rank_works`	other	7/7	``(line 278) RuntimeError: You set `ignore_mismatched_sizes` to `False`, thus raising an error. For details look at the above report!``
peft_integration	single	`test_hotswap_with_compile_and_higher_rank_works`	other	7/7	``(line 278) RuntimeError: You set `ignore_mismatched_sizes` to `False`, thus raising an error. For details look at the above report!``
peft_integration	single	`test_hotswap_with_compile_and_lower_rank_works`	other	7/7	``(line 278) RuntimeError: You set `ignore_mismatched_sizes` to `False`, thus raising an error. For details look at the above report!``
peft_integration	single	`test_hotswap_without_compile_and_with_higher_rank_works`	other	7/7	``(line 278) RuntimeError: You set `ignore_mismatched_sizes` to `False`, thus raising an error. For details look at the above report!``
peft_integration	single	`test_hotswap_without_compile_and_with_lower_rank_works`	other	7/7	``(line 278) RuntimeError: You set `ignore_mismatched_sizes` to `False`, thus raising an error. For details look at the above report!``
pegasus	multi	`test_device_map`	other	7/7	`(line 334) RuntimeError: Expected all tensors to be on the same device, but got other is on cuda:1, different from other tensors on cuda:0 (when checking argument in method wrapper_CUDA__equal)`
pegasus	multi	`test_pegasus_xsum_summary`	other	7/7	`(line 350) assert torch.Size([2, 422]) == (2, 421)`
pegasus	single	`test_pegasus_xsum_summary`	other	7/7	`(line 350) assert torch.Size([2, 422]) == (2, 421)`
persimmon	multi	`test_model_8b_chat_logits`	other	6/7	``(line 273) RuntimeError: We encountered some issues during automatic conversion of the weights. For details look at the `CONVERSION` entries of the above report!``
phi3	multi	`test_export_static_cache`	other	6/7	`(line 1318) torch._dynamo.exc.Unsupported: Data-dependent branching`
phi3	single	`test_export_static_cache`	other	7/7	`(line 1318) torch._dynamo.exc.Unsupported: Data-dependent branching`
phimoe	multi	`test_model_phimoe_instruct_logits`	other	7/7	``(line 273) RuntimeError: We encountered some issues during automatic conversion of the weights. For details look at the `CONVERSION` entries of the above report!``
phimoe	multi	`test_phimoe_instruct_generation`	other	7/7	``(line 273) RuntimeError: We encountered some issues during automatic conversion of the weights. For details look at the `CONVERSION` entries of the above report!``
phimoe	multi	`test_phimoe_instruct_with_static_cache`	other	7/7	``(line 273) RuntimeError: We encountered some issues during automatic conversion of the weights. For details look at the `CONVERSION` entries of the above report!``
pvt_v2	multi	`test_inference_model`	other	7/7	`(line 4194) UnboundLocalError: local variable 'output' referenced before assignment`
pvt_v2	single	`test_inference_model`	other	7/7	`(line 4194) UnboundLocalError: local variable 'output' referenced before assignment`
qwen2_moe	multi	`test_speculative_generation`	other	7/7	``(line 273) RuntimeError: We encountered some issues during automatic conversion of the weights. For details look at the `CONVERSION` entries of the above report!``
qwen2_moe	single	`test_speculative_generation`	other	7/7	``(line 273) RuntimeError: We encountered some issues during automatic conversion of the weights. For details look at the `CONVERSION` entries of the above report!``
qwen3_omni_moe	multi	`test_small_model_integration_test`	other	7/7	``(line 273) RuntimeError: We encountered some issues during automatic conversion of the weights. For details look at the `CONVERSION` entries of the above report!``
qwen3_omni_moe	multi	`test_small_model_integration_test_batch`	other	7/7	``(line 273) RuntimeError: We encountered some issues during automatic conversion of the weights. For details look at the `CONVERSION` entries of the above report!``
qwen3_omni_moe	multi	`test_small_model_integration_test_multiturn`	other	7/7	``(line 273) RuntimeError: We encountered some issues during automatic conversion of the weights. For details look at the `CONVERSION` entries of the above report!``
qwen3_omni_moe	multi	`test_small_model_integration_test_w_audio`	other	7/7	``(line 273) RuntimeError: We encountered some issues during automatic conversion of the weights. For details look at the `CONVERSION` entries of the above report!``
qwen3_vl_moe	multi	`test_small_model_integration_test`	other	7/7	``(line 273) RuntimeError: We encountered some issues during automatic conversion of the weights. For details look at the `CONVERSION` entries of the above report!``
qwen3_vl_moe	multi	`test_small_model_integration_test_batch`	other	7/7	``(line 273) RuntimeError: We encountered some issues during automatic conversion of the weights. For details look at the `CONVERSION` entries of the above report!``
qwen3_vl_moe	multi	`test_small_model_integration_test_batch_different_resolutions`	other	7/7	``(line 273) RuntimeError: We encountered some issues during automatic conversion of the weights. For details look at the `CONVERSION` entries of the above report!``
qwen3_vl_moe	multi	`test_small_model_integration_test_batch_wo_image`	other	7/7	``(line 273) RuntimeError: We encountered some issues during automatic conversion of the weights. For details look at the `CONVERSION` entries of the above report!``
qwen3_vl_moe	multi	`test_small_model_integration_test_expand_with_video`	other	7/7	``(line 273) RuntimeError: We encountered some issues during automatic conversion of the weights. For details look at the `CONVERSION` entries of the above report!``
qwen3_vl_moe	multi	`test_small_model_integration_test_with_video`	other	7/7	``(line 273) RuntimeError: We encountered some issues during automatic conversion of the weights. For details look at the `CONVERSION` entries of the above report!``
seamless_m4t	multi	`test_speech_to_speech_model`	other	7/7	`(line 281) ValueError: Invalid input type. Must be a single audio or a list of audio`
seamless_m4t	multi	`test_speech_to_text_model`	other	7/7	`(line 281) ValueError: Invalid input type. Must be a single audio or a list of audio`
seamless_m4t	multi	`test_to_rus_speech`	other	7/7	`(line 281) ValueError: Invalid input type. Must be a single audio or a list of audio`
seamless_m4t	single	`test_speech_to_speech_model`	other	7/7	`(line 281) ValueError: Invalid input type. Must be a single audio or a list of audio`
seamless_m4t	single	`test_speech_to_text_model`	other	7/7	`(line 281) ValueError: Invalid input type. Must be a single audio or a list of audio`
seamless_m4t	single	`test_to_rus_speech`	other	7/7	`(line 281) ValueError: Invalid input type. Must be a single audio or a list of audio`
seamless_m4t_v2	multi	`test_speech_to_speech_model`	other	7/7	`(line 281) ValueError: Invalid input type. Must be a single audio or a list of audio`
seamless_m4t_v2	multi	`test_speech_to_text_model`	other	7/7	`(line 281) ValueError: Invalid input type. Must be a single audio or a list of audio`
seamless_m4t_v2	multi	`test_to_rus_speech`	other	7/7	`(line 281) ValueError: Invalid input type. Must be a single audio or a list of audio`
seamless_m4t_v2	single	`test_speech_to_speech_model`	other	7/7	`(line 281) ValueError: Invalid input type. Must be a single audio or a list of audio`
seamless_m4t_v2	single	`test_speech_to_text_model`	other	7/7	`(line 281) ValueError: Invalid input type. Must be a single audio or a list of audio`
seamless_m4t_v2	single	`test_to_rus_speech`	other	7/7	`(line 281) ValueError: Invalid input type. Must be a single audio or a list of audio`
superpoint	multi	`test_inference`	other	7/7	`(line 4194) UnboundLocalError: local variable 'output' referenced before assignment`
superpoint	single	`test_inference`	other	7/7	`(line 4194) UnboundLocalError: local variable 'output' referenced before assignment`
vision_encoder_decoder	multi	`test_forward_pass`	other	7/7	`(line 781) huggingface_hub.errors.RemoteEntryNotFoundError: 404 Client Error. (Request ID: Root=1-6a2ce203-064ca2350fd9eab00e952f84;8f588202-712b-49e0-9778-84672de20df7)`
vision_encoder_decoder	multi	`test_generation`	other	7/7	`(line 781) huggingface_hub.errors.RemoteEntryNotFoundError: 404 Client Error. (Request ID: Root=1-6a2ce205-134140366003d1700b15e4df;557daa74-b8c9-48ce-832c-22e2a0f869b5)`
vision_encoder_decoder	single	`test_forward_pass`	other	7/7	`(line 781) huggingface_hub.errors.RemoteEntryNotFoundError: 404 Client Error. (Request ID: Root=1-6a2ce1c5-0234cefe47ace15f2f3ce1c8;6d3423f2-eaa5-4917-84a6-32685f9735a6)`
vision_encoder_decoder	single	`test_generation`	other	7/7	`(line 781) huggingface_hub.errors.RemoteEntryNotFoundError: 404 Client Error. (Request ID: Root=1-6a2ce1c7-1b90e6b26cd0852c5e451d52;da762efd-bac2-4006-8263-d949916d16b8)`
whisper	multi	`test_distil_token_timestamp_generation`	other	7/7	`(line 370) RuntimeError: Input type (float) and bias type (c10::Half) should be the same`
whisper	multi	`test_large_batched_generation`	other	7/7	`(line 370) RuntimeError: Input type (float) and bias type (c10::Half) should be the same`
whisper	multi	`test_large_generation`	other	7/7	`(line 370) RuntimeError: Input type (float) and bias type (c10::Half) should be the same`
whisper	multi	`test_large_generation_multilingual`	other	7/7	`(line 370) RuntimeError: Input type (float) and bias type (c10::Half) should be the same`
whisper	multi	`test_large_timestamp_generation`	other	7/7	`(line 370) RuntimeError: Input type (float) and bias type (c10::Half) should be the same`
whisper	multi	`test_small_longform_timestamps_generation`	other	7/7	`(line 1882) KeyError: 0`
whisper	multi	`test_speculative_decoding_distil`	other	7/7	`(line 323) UnboundLocalError: local variable 'is_updated' referenced before assignment`
whisper	multi	`test_tiny_longform_timestamps_generation`	other	7/7	`(line 1698) KeyError: 0`
whisper	multi	`test_tiny_static_generation_long_form`	other	7/7	`(line 3098) RuntimeError: The size of tensor a (352) must match the size of tensor b (354) at non-singleton dimension 1`
whisper	multi	`test_tiny_timestamp_generation`	other	7/7	`(line 4176) IndexError: list index out of range`
whisper	multi	`test_whisper_longform_single_batch`	other	7/7	`(line 294) TypeError: '>=' not supported between instances of 'list' and 'int'`
whisper	single	`test_distil_token_timestamp_generation`	other	7/7	`(line 370) RuntimeError: Input type (float) and bias type (c10::Half) should be the same`
whisper	single	`test_large_batched_generation`	other	7/7	`(line 370) RuntimeError: Input type (float) and bias type (c10::Half) should be the same`
whisper	single	`test_large_generation`	other	7/7	`(line 370) RuntimeError: Input type (float) and bias type (c10::Half) should be the same`
whisper	single	`test_large_generation_multilingual`	other	7/7	`(line 370) RuntimeError: Input type (float) and bias type (c10::Half) should be the same`
whisper	single	`test_large_timestamp_generation`	other	7/7	`(line 370) RuntimeError: Input type (float) and bias type (c10::Half) should be the same`
whisper	single	`test_small_longform_timestamps_generation`	other	7/7	`(line 1882) KeyError: 0`
whisper	single	`test_speculative_decoding_distil`	other	7/7	`(line 323) UnboundLocalError: local variable 'is_updated' referenced before assignment`
whisper	single	`test_tiny_longform_timestamps_generation`	other	7/7	`(line 1698) KeyError: 0`
whisper	single	`test_tiny_static_generation_long_form`	other	7/7	`(line 3098) RuntimeError: The size of tensor a (352) must match the size of tensor b (354) at non-singleton dimension 1`
whisper	single	`test_tiny_timestamp_generation`	other	7/7	`(line 4176) IndexError: list index out of range`
whisper	single	`test_whisper_longform_single_batch`	other	7/7	`(line 294) TypeError: '>=' not supported between instances of 'list' and 'int'`
aya_vision	multi	`test_small_model_integration_generate_chat_template`	output_mismatch	7/7	`(line 355) AssertionError: 'The [29 chars] two cats resting on a bright pink blanket spread across a red' != 'The [29 chars] two cats resting on a bright pink blanket. The cats,'`
aya_vision	single	`test_small_model_integration_generate_chat_template`	output_mismatch	7/7	`(line 355) AssertionError: 'The [29 chars] two cats resting on a bright pink blanket spread across a red' != 'The [29 chars] two cats resting on a bright pink blanket. The cats,'`
big_bird	multi	`test_fill_mask`	output_mismatch	7/7	`(line 906) AssertionError: '' != 'happiness'`
big_bird	single	`test_fill_mask`	output_mismatch	7/7	`(line 906) AssertionError: '' != 'happiness'`
blip_2	multi	`test_inference_t5`	output_mismatch	7/7	`(line 1616) AssertionError: Lists differ: [0, 2335, 1556, 28, 1782, 30, 8, 2608, 1] != [0, 3, 9, 2335, 19, 1556, 28, 160, 1782, 30, 8, 2608, 1]`
blip_2	multi	`test_inference_t5_batched_beam_search`	output_mismatch	7/7	`(line 1671) AssertionError: Lists differ: [0, 3, 9, 2335, 19, 3823, 30, 8, 2608, 28, 160, 1782, 1] != [0, 3, 9, 2335, 19, 1556, 28, 160, 1782, 30, 8, 2608, 1]`
blip_2	multi	`test_inference_t5_multi_accelerator`	output_mismatch	7/7	`(line 1740) AssertionError: Lists differ: [0, 2335, 1556, 28, 1782, 30, 8, 2608, 1] != [0, 3, 9, 2335, 19, 1556, 28, 160, 1782, 30, 8, 2608, 1]`
blip_2	single	`test_inference_t5`	output_mismatch	6/7	`(line 1616) AssertionError: Lists differ: [0, 2335, 1556, 28, 1782, 30, 8, 2608, 1] != [0, 3, 9, 2335, 19, 1556, 28, 160, 1782, 30, 8, 2608, 1]`
blip_2	single	`test_inference_t5_batched_beam_search`	output_mismatch	6/7	`(line 1671) AssertionError: Lists differ: [0, 3, 9, 2335, 19, 3823, 30, 8, 2608, 28, 160, 1782, 1] != [0, 3, 9, 2335, 19, 1556, 28, 160, 1782, 30, 8, 2608, 1]`
bloom	multi	`test_batch_generated_text`	output_mismatch	7/7	`(line 621) AssertionError: Lists differ: ['Hello what is', 'Running a quick test with the followi[54 chars]the'] != ['Hello what is the best way to get the data from the se[127 chars]on2']`
bloom	multi	`test_batch_generation_padding`	output_mismatch	7/7	`(line 586) AssertionError: Lists differ: [5941[15 chars]632, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,[82 chars]0, 0] != [5941[15 chars]632, 419, 682, 15, 473, 912, 267, 40704, 15, 1[186 chars] 912]`
bloom	multi	`test_simple_generation`	output_mismatch	7/7	`(line 539) AssertionError: 'I en[58 chars] play. I am a very active person, and I am a v[75 chars]am a' != 'I en[58 chars] play with the kids. I am a very active person[86 chars]nd I'`
bloom	single	`test_batch_generated_text`	output_mismatch	7/7	`(line 621) AssertionError: Lists differ: ['Hello what is', 'Running a quick test with the followi[54 chars]the'] != ['Hello what is the best way to get the data from the se[127 chars]on2']`
bloom	single	`test_batch_generation_padding`	output_mismatch	7/7	`(line 586) AssertionError: Lists differ: [5941[15 chars]632, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,[82 chars]0, 0] != [5941[15 chars]632, 419, 682, 15, 473, 912, 267, 40704, 15, 1[186 chars] 912]`
bloom	single	`test_simple_generation`	output_mismatch	7/7	`(line 539) AssertionError: 'I en[58 chars] play. I am a very active person, and I am a v[75 chars]am a' != 'I en[58 chars] play with the kids. I am a very active person[86 chars]nd I'`
chameleon	multi	`test_model_7b`	output_mismatch	7/7	`(line 399) AssertionError: Lists differ: ['Des[115 chars] dot representing the position of the star Alp[92 chars]ted'] != ['Des[115 chars] dot in the center representing the North Star[99 chars]the']`
chameleon	multi	`test_model_7b_batched`	output_mismatch	7/7	`(line 445) AssertionError: Lists differ: ['Des[115 chars] dot representing the position of the star Alp[309 chars]on.'] != ['Des[115 chars] dot in the center representing the star Alpha[154 chars]The']`
chameleon	multi	`test_model_7b_multi_image`	output_mismatch	7/7	`(line 469) AssertionError: Lists differ: ['Wha[74 chars]een the night sky and the internet. The first [115 chars]The'] != ['Wha[74 chars]een two celestial objects, the stars and the c[113 chars]map']`
chameleon	single	`test_model_7b`	output_mismatch	6/7	`(line 399) AssertionError: Lists differ: ['Des[115 chars] dot representing the position of the star Alp[92 chars]ted'] != ['Des[115 chars] dot in the center representing the North Star[99 chars]the']`
chameleon	single	`test_model_7b_batched`	output_mismatch	6/7	`(line 445) AssertionError: Lists differ: ['Des[115 chars] dot representing the position of the star Alp[309 chars]on.'] != ['Des[115 chars] dot in the center representing the star Alpha[154 chars]The']`
chameleon	single	`test_model_7b_multi_image`	output_mismatch	6/7	`(line 469) AssertionError: Lists differ: ['Wha[74 chars]een the night sky and the internet. The first [115 chars]The'] != ['Wha[74 chars]een two celestial objects, the stars and the c[113 chars]map']`
clvp	multi	`test_conditional_encoder`	output_mismatch	7/7	`(line 552) AssertionError: Tensor-likes are not close!`
clvp	single	`test_conditional_encoder`	output_mismatch	7/7	`(line 552) AssertionError: Tensor-likes are not close!`
cohere2_vision	multi	`test_model_integration_forward`	output_mismatch	6/7	`(line 687) AssertionError: False is not true : Actual logits: tensor([2.3711, 1.6689, 1.8389, 1.9785, 1.9121], dtype=torch.float16)`
cohere2_vision	single	`test_model_integration_forward`	output_mismatch	6/7	`(line 687) AssertionError: False is not true : Actual logits: tensor([2.3711, 1.6689, 1.8389, 1.9785, 1.9121], dtype=torch.float16)`
colqwen2	multi	`test_model_integration_test_2`	output_mismatch	7/7	`(line 400) AssertionError: Expected scores tensor([[16.3750, 10.9375, 14.7500],`
colqwen2	single	`test_model_integration_test_2`	output_mismatch	7/7	`(line 400) AssertionError: Expected scores tensor([[16.3750, 10.9375, 14.7500],`
convnextv2	multi	`test_inference_image_classification_head`	output_mismatch	7/7	`(line 308) AssertionError: Tensor-likes are not close!`
convnextv2	single	`test_inference_image_classification_head`	output_mismatch	7/7	`(line 308) AssertionError: Tensor-likes are not close!`
cvt	multi	`test_inference_image_classification_head`	output_mismatch	7/7	`(line 271) AssertionError: Tensor-likes are not close!`
cvt	single	`test_inference_image_classification_head`	output_mismatch	7/7	`(line 271) AssertionError: Tensor-likes are not close!`
cwm	single	`test_cwm_sliding_window_long_sequence`	output_mismatch	6/7	`(line 182) AssertionError: Tensor-likes are not close!`
dab_detr	multi	`test_inference_object_detection_head`	output_mismatch	7/7	`(line 805) AssertionError: Tensor-likes are not close!`
dab_detr	single	`test_inference_object_detection_head`	output_mismatch	7/7	`(line 805) AssertionError: Tensor-likes are not close!`
dac	multi	`test_integration_0_dac_16khz`	output_mismatch	7/7	`(line 819) AssertionError: Tensor-likes are not close!`
dac	multi	`test_integration_1_dac_24khz`	output_mismatch	7/7	`(line 813) AssertionError: Scalars are not close!`
dac	multi	`test_integration_2_dac_44khz`	output_mismatch	7/7	`(line 825) AssertionError: Scalars are not close!`
dac	multi	`test_integration_batch_0_dac_16khz`	output_mismatch	7/7	`(line 870) AssertionError: Scalars are not close!`
dac	multi	`test_integration_batch_1_dac_24khz`	output_mismatch	7/7	`(line 876) AssertionError: Tensor-likes are not close!`
dac	multi	`test_integration_batch_2_dac_44khz`	output_mismatch	7/7	`(line 876) AssertionError: Tensor-likes are not close!`
dac	single	`test_integration_0_dac_16khz`	output_mismatch	6/7	`(line 819) AssertionError: Tensor-likes are not close!`
dac	single	`test_integration_1_dac_24khz`	output_mismatch	6/7	`(line 813) AssertionError: Scalars are not close!`
dac	single	`test_integration_2_dac_44khz`	output_mismatch	6/7	`(line 825) AssertionError: Scalars are not close!`
dac	single	`test_integration_batch_0_dac_16khz`	output_mismatch	6/7	`(line 870) AssertionError: Scalars are not close!`
dac	single	`test_integration_batch_1_dac_24khz`	output_mismatch	6/7	`(line 876) AssertionError: Tensor-likes are not close!`
dac	single	`test_integration_batch_2_dac_44khz`	output_mismatch	6/7	`(line 876) AssertionError: Tensor-likes are not close!`
deepseek_v3	multi	`test_compile_static_cache`	output_mismatch	7/7	`(line 424) AssertionError: Lists differ: ['Sim[41 chars]that Frojekecdytesాలు sicʰtinaccianntuala bre[327 chars]rew'] != ['Sim[41 chars]that aportersh455elike injection tactics-altit[355 chars]ick']`
deepseek_vl	single	`test_model_text_generation_batched`	output_mismatch	7/7	`(line 147) AssertionError: Lists differ: ['You[222 chars]tant:The image depicts a snowy landscape with [367 chars]the'] != ['You[222 chars]tant:What is a cat, a cat, a cat, a cat, a cat[329 chars]the']`
deepseek_vl_hybrid	single	`test_model_text_generation_batched`	output_mismatch	7/7	`(line 370) AssertionError: Lists differ: ['You[224 chars]nt:\nThe image depicts a fluffy, light brown a[371 chars]he '] != ['You[224 chars]nt:\nA fluffy animal in a fluffyThe image,The [329 chars]he ']`
depth_anything	multi	`test_inference`	output_mismatch	7/7	`(line 259) AssertionError: Tensor-likes are not close!`
depth_anything	single	`test_inference`	output_mismatch	7/7	`(line 259) AssertionError: Tensor-likes are not close!`
dia	multi	`test_dia_model_integration_generate_audio_context`	output_mismatch	7/7	`(line 732) AssertionError: Tensor-likes are not equal!`
dia	single	`test_dia_model_integration_generate_audio_context`	output_mismatch	7/7	`(line 732) AssertionError: Tensor-likes are not equal!`
diffllama	multi	`test_compile_static_cache`	output_mismatch	7/7	`(line 484) AssertionError: Lists differ: ['Sim[41 chars]that 1) the speed of light is constant in all [301 chars]y p'] != ['Sim[41 chars]that 2.5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 '[133 chars]a a']`
diffllama	single	`test_compile_static_cache`	output_mismatch	7/7	`(line 484) AssertionError: Lists differ: ['Sim[41 chars]that 1) the speed of light is constant in all [301 chars]y p'] != ['Sim[41 chars]that 2.5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 '[133 chars]a a']`
efficientnet	multi	`test_inference_image_classification_head`	output_mismatch	7/7	`(line 259) AssertionError: Tensor-likes are not close!`
efficientnet	single	`test_inference_image_classification_head`	output_mismatch	7/7	`(line 259) AssertionError: Tensor-likes are not close!`
emu3	single	`test_model_generation`	output_mismatch	7/7	`(line 363) AssertionError: Lists differ: ['USE[85 chars]ANT: The image captures a moment of tranquilit[145 chars] in'] != ['USE[85 chars]ANT: 1. The image is a 1.你好!']`
eomt_dinov3	multi	`test_inference_bf16`	output_mismatch	7/7	`(line 310) AssertionError: Tensor-likes are not close!`
eomt_dinov3	single	`test_inference_bf16`	output_mismatch	7/7	`(line 310) AssertionError: Tensor-likes are not close!`
evolla	multi	`test_inference_natural_language_protein_reasoning`	output_mismatch	7/7	`(line 364) AssertionError: 'This protein' not found in 'systemYouareanAIexpertthatcanansweranyquestionsaboutprotein.userWhatisthefunctionofthisprotein?assistantĊThisĠproteinĠisĠaĠcriticalĠenzymeĠinvolvedĠinĠtheĠmetabol…`
evolla	single	`test_inference_natural_language_protein_reasoning`	output_mismatch	7/7	`(line 364) AssertionError: 'This protein' not found in 'systemYouareanAIexpertthatcanansweranyquestionsaboutprotein.userWhatisthefunctionofthisprotein?assistantĊThisĠproteinĠisĠaĠcriticalĠenzymeĠinvolvedĠinĠtheĠmetabol…`
exaone4	single	`test_model_generation_beyond_sliding_window`	output_mismatch	7/7	`(line 160) AssertionError: " Thi[46 chars] and the atmosphere is so relaxing. I'm gratef[47 chars]. It" != " Thi[46 chars] and I'm grateful for the opportunity to exper[26 chars]reak"`
exaone4	single	`test_model_logits`	output_mismatch	7/7	`(line 99) AssertionError: Tensor-likes are not close!`
exaone_moe	multi	`test_model_logits`	output_mismatch	7/7	`(line 120) AssertionError: Tensor-likes are not close!`
exaone_moe	single	`test_model_logits`	output_mismatch	7/7	`(line 120) AssertionError: Tensor-likes are not close!`
falcon_h1	multi	`test_falcon_h1_hard`	output_mismatch	7/7	`(line 470) AssertionError: 'user\nTell me about the french revolutio[1920 chars]ct**' != "user\nTell me about the french revolutio[1929 chars]n6. "`
falcon_h1	single	`test_falcon_h1_hard`	output_mismatch	7/7	`(line 470) AssertionError: 'user\nTell me about the french revolutio[1920 chars]ct**' != "user\nTell me about the french revolutio[1929 chars]n6. "`
falcon_mamba	multi	`test_batched_generation`	output_mismatch	7/7	`(line 488) AssertionError: Lists differ: ['Hello today I will be talking about the “Theory of Rela[161 chars]bal'] != ['Hello today I am going to talk about the “Theory of Rel[159 chars]bal']`
falcon_mamba	multi	`test_generation_4bit`	output_mismatch	7/7	`(line 438) AssertionError: 'Hello today I\'m going to be talking about the "A" in the "A-B' != "Hello today Iava,\n\nI'm sorry to hear that you're having trouble with the "`
falcon_mamba	multi	`test_generation_fp16`	output_mismatch	7/7	`(line 423) AssertionError: 'Hello today I am going to talk about the “Theory of Re[27 chars]n.\n' != 'Hello today Iava,\n\nI am writing to you today to disc[49 chars]tyle'`
falcon_mamba	multi	`test_generation_torch_compile`	output_mismatch	7/7	`(line 451) AssertionError: 'Hello today I am going to talk about the “Theory of Re[27 chars]n.\n' != 'Hello today Iava,\n\nI am writing to you today to disc[49 chars]tyle'`
falcon_mamba	single	`test_batched_generation`	output_mismatch	7/7	`(line 488) AssertionError: Lists differ: ['Hello today I will be talking about the “Theory of Rela[161 chars]bal'] != ['Hello today I am going to talk about the “Theory of Rel[159 chars]bal']`
falcon_mamba	single	`test_generation_4bit`	output_mismatch	7/7	`(line 438) AssertionError: 'Hello today I\'m going to be talking about the "A" in the "A-B' != "Hello today Iava,\n\nI'm sorry to hear that you're having trouble with the "`
falcon_mamba	single	`test_generation_fp16`	output_mismatch	7/7	`(line 423) AssertionError: 'Hello today I am going to talk about the “Theory of Re[27 chars]n.\n' != 'Hello today Iava,\n\nI am writing to you today to disc[49 chars]tyle'`
falcon_mamba	single	`test_generation_torch_compile`	output_mismatch	7/7	`(line 451) AssertionError: 'Hello today I am going to talk about the “Theory of Re[27 chars]n.\n' != 'Hello today Iava,\n\nI am writing to you today to disc[49 chars]tyle'`
fastspeech2_conformer	multi	`test_training_integration`	output_mismatch	7/7	`(line 453) AssertionError: Tensor-likes are not close!`
fastspeech2_conformer	single	`test_training_integration`	output_mismatch	7/7	`(line 453) AssertionError: Tensor-likes are not close!`
flava	multi	`test_inference_with_itm_labels`	output_mismatch	7/7	`(line 1223) AssertionError: The values for attribute 'shape' do not match: torch.Size([1, 2]) != torch.Size([2, 2]).`
flava	multi	`test_inference`	output_mismatch	7/7	`(line 899) AssertionError: -1352.535400390625 != -1352.4685 within 4 places (0.06690039062505093 difference)`
flava	single	`test_inference_with_itm_labels`	output_mismatch	7/7	`(line 1223) AssertionError: The values for attribute 'shape' do not match: torch.Size([1, 2]) != torch.Size([2, 2]).`
flava	single	`test_inference`	output_mismatch	7/7	`(line 899) AssertionError: -1352.535400390625 != -1352.4685 within 4 places (0.06690039062505093 difference)`
flex_olmo	multi	`test_model_7b_logits`	output_mismatch	7/7	`(line 87) AssertionError: Tensor-likes are not close!`
flex_olmo	single	`test_model_7b_logits`	output_mismatch	7/7	`(line 87) AssertionError: Tensor-likes are not close!`
florence2	multi	`test_large_model_inference_eager`	output_mismatch	7/7	`(line 470) AssertionError: Lists differ: [[2, [144 chars], 5, 2014, 6, 8, 11, 5, 3618, 6, 89, 32, 3980,[51 chars], 2]] != [[2, [144 chars], 5, 921, 6, 8, 11, 5, 3618, 6, 89, 32, 1104, [44 chars], 2]]`
florence2	single	`test_large_model_inference_eager`	output_mismatch	7/7	`(line 470) AssertionError: Lists differ: [[2, [144 chars], 5, 2014, 6, 8, 11, 5, 3618, 6, 89, 32, 3980,[51 chars], 2]] != [[2, [144 chars], 5, 921, 6, 8, 11, 5, 3618, 6, 89, 32, 1104, [44 chars], 2]]`
fsmt	multi	`test_inference_no_head`	output_mismatch	7/7	`(line 484) AssertionError: Tensor-likes are not close!`
fsmt	multi	`test_translation_direct_0_en_ru`	output_mismatch	7/7	`(line 517) AssertionError:`
fsmt	multi	`test_translation_direct_1_ru_en`	output_mismatch	7/7	`(line 517) AssertionError:`
fsmt	single	`test_inference_no_head`	output_mismatch	7/7	`(line 484) AssertionError: Tensor-likes are not close!`
fsmt	single	`test_translation_direct_0_en_ru`	output_mismatch	7/7	`(line 517) AssertionError:`
fsmt	single	`test_translation_direct_1_ru_en`	output_mismatch	7/7	`(line 517) AssertionError:`
fuyu	multi	`test_greedy_generation`	output_mismatch	7/7	`(line 295) AssertionError: '\x04 A bus parked on the side of a road.' != 'A blue bus parked on the side of a road.'`
fuyu	single	`test_greedy_generation`	output_mismatch	7/7	`(line 295) AssertionError: '\x04 A bus parked on the side of a road.' != 'A blue bus parked on the side of a road.'`
gemma	multi	`test_compile_static_cache`	output_mismatch	7/7	`(line 337) AssertionError: Lists differ: ['Hel[196 chars]tdi 105bhp.\nI have a problem with the engine [37 chars]the'] != ['Hel[196 chars]tdi 110bhp.\nI have a problem with the engine [49 chars]ugh']`
gemma	multi	`test_export_static_cache`	output_mismatch	7/7	`(line 414) AssertionError: Lists differ: ['Hel[87 chars] in the 1990s. I have been looking on the internet and I have'] != ['Hel[87 chars] in the 1990s. I have looked on the internet and I have found']`
gemma	multi	`test_model_2b_4bit`	output_mismatch	7/7	`(line 190) AssertionError: Lists differ: ['Hel[118 chars] you a few of my favorite and most used brushes.\n\nI"] != ['Hel[118 chars] you my experience with the new wattpad wattpa[38 chars]pad"]`
gemma	multi	`test_model_7b_4bit`	output_mismatch	7/7	`(line 317) AssertionError: Lists differ: ['Hel[59 chars]ke a "self balancing" robot. I have', 'Hi toda[76 chars] of'] != ['Hel[59 chars]ke a program that will take a number and then'[93 chars]!:)']`
gemma	multi	`test_model_7b_bf16`	output_mismatch	7/7	`(line 258) AssertionError: Lists differ: ['Hel[59 chars]ke a small game. I have a few questions', 'Hi [86 chars]and'] != ['Hel[59 chars]ke a game in which you have to get a', 'Hi tod[83 chars]and']`
gemma	multi	`test_model_7b_fp16`	output_mismatch	7/7	`(line 228) AssertionError: Lists differ: ['Hel[27 chars]a 1995 4.0L 4x4. I', 'Hi today I am going to s[51 chars] 3D'] != ['Hel[27 chars]a 1999 4.0L 4x4. I', 'Hi today I am going to s[51 chars] 3D']`
gemma	multi	`test_model_7b_fp16_static_cache`	output_mismatch	7/7	`(line 288) AssertionError: Lists differ: ['Hel[29 chars]1995 4.0L 4x4. I', 'Hi today I am going to sho[49 chars] 3D'] != ['Hel[29 chars]1995 3000gt SL. I have a', 'Hi today I am goin[57 chars] 3D']`
gemma	single	`test_compile_static_cache`	output_mismatch	7/7	`(line 337) AssertionError: Lists differ: ['Hel[196 chars]tdi 105bhp.\nI have a problem with the engine [37 chars]the'] != ['Hel[196 chars]tdi 110bhp.\nI have a problem with the engine [49 chars]ugh']`
gemma	single	`test_export_static_cache`	output_mismatch	7/7	`(line 414) AssertionError: Lists differ: ['Hel[87 chars] in the 1990s. I have been looking on the internet and I have'] != ['Hel[87 chars] in the 1990s. I have looked on the internet and I have found']`
gemma	single	`test_model_2b_4bit`	output_mismatch	7/7	`(line 190) AssertionError: Lists differ: ['Hel[118 chars] you a few of my favorite and most used brushes.\n\nI"] != ['Hel[118 chars] you my experience with the new wattpad wattpa[38 chars]pad"]`
gemma	single	`test_model_7b_4bit`	output_mismatch	7/7	`(line 317) AssertionError: Lists differ: ['Hel[59 chars]ke a "self balancing" robot. I have', 'Hi toda[76 chars] of'] != ['Hel[59 chars]ke a program that will take a number and then'[93 chars]!:)']`
gemma	single	`test_model_7b_bf16`	output_mismatch	7/7	`(line 258) AssertionError: Lists differ: ['Hel[59 chars]ke a small game. I have a few questions', 'Hi [86 chars]and'] != ['Hel[59 chars]ke a game in which you have to get a', 'Hi tod[83 chars]and']`
gemma	single	`test_model_7b_fp16`	output_mismatch	7/7	`(line 228) AssertionError: Lists differ: ['Hel[27 chars]a 1995 4.0L 4x4. I', 'Hi today I am going to s[51 chars] 3D'] != ['Hel[27 chars]a 1999 4.0L 4x4. I', 'Hi today I am going to s[51 chars] 3D']`
gemma	single	`test_model_7b_fp16_static_cache`	output_mismatch	7/7	`(line 288) AssertionError: Lists differ: ['Hel[27 chars]a 1999 4.0L 4x4. I', 'Hi today I am going to s[51 chars] 3D'] != ['Hel[27 chars]a 1995 3000gt SL. I have a', 'Hi today I am go[59 chars] 3D']`
gemma2	multi	`test_model_2b_pipeline_bf16_flex_attention`	output_mismatch	7/7	`(line 2876) Failed: (subprocess) AssertionError: "Hi t[26 chars]ng about the 10 best anime of all time.\n\n1" != "Hi t[26 chars]ng about the 10 most powerful characters in the Naruto series."`
gemma2	single	`test_model_2b_pipeline_bf16_flex_attention`	output_mismatch	7/7	`(line 2876) Failed: (subprocess) AssertionError: "Hi t[26 chars]ng about the 10 best anime of all time.\n\n1" != "Hi t[26 chars]ng about the 10 most powerful characters in the Naruto series."`
gemma3	multi	`test_dynamic_sliding_window_is_default`	output_mismatch	7/7	`(line 874) AssertionError: 'DynamicSlidingWindowLayer' unexpectedly found in 'DynamicCache(layers=[DynamicSlidingWindowLayer, DynamicSlidingWindowLayer, DynamicSlidingWindowLayer, DynamicSlidingWindowLayer, DynamicSlid…`
gemma3	multi	`test_model_1b_text_only`	output_mismatch	7/7	`(line 728) AssertionError: Lists differ: ['Wri[48 chars]data streams, a boundless flow,\nA silent worl[63 chars]ing'] != ['Wri[48 chars]data flows, a silent stream,\nInto the neural [51 chars],\n']`
gemma3	multi	`test_model_4b_batch`	output_mismatch	7/7	`(line 548) AssertionError: Lists differ: ['use[149 chars]with turquoise water and a blue sky in the bac[227 chars]own"] != ['use[149 chars]with clear turquoise water and a blue sky in t[231 chars]own"]`
gemma3	multi	`test_model_4b_batch_crops`	output_mismatch	7/7	`(line 663) AssertionError: Lists differ: ['user\nYou are a helpful assistant.\n\nHe[674 chars]h a'] != ["user\nYou are a helpful assistant.\n\nHe[674 chars]h a']`
gemma3	multi	`test_model_4b_crops`	output_mismatch	7/7	`(line 590) AssertionError: Lists differ: ["user\nYou are a helpful assistant.\n\nHe[268 chars]the"] != ['user\nYou are a helpful assistant.\n\nHe[268 chars]the']`
gemma3	single	`test_dynamic_sliding_window_is_default`	output_mismatch	7/7	`(line 874) AssertionError: 'DynamicSlidingWindowLayer' unexpectedly found in 'DynamicCache(layers=[DynamicSlidingWindowLayer, DynamicSlidingWindowLayer, DynamicSlidingWindowLayer, DynamicSlidingWindowLayer, DynamicSlid…`
gemma3	single	`test_model_1b_text_only`	output_mismatch	7/7	`(line 728) AssertionError: Lists differ: ['Wri[48 chars]data streams, a boundless flow,\nA silent worl[63 chars]ing'] != ['Wri[48 chars]data flows, a silent stream,\nInto the neural [51 chars],\n']`
gemma3	single	`test_model_4b_batch`	output_mismatch	7/7	`(line 548) AssertionError: Lists differ: ['use[149 chars]with turquoise water and a blue sky in the bac[227 chars]own"] != ['use[149 chars]with clear turquoise water and a blue sky in t[231 chars]own"]`
gemma3	single	`test_model_4b_batch_crops`	output_mismatch	7/7	`(line 663) AssertionError: Lists differ: ['user\nYou are a helpful assistant.\n\nHe[674 chars]h a'] != ["user\nYou are a helpful assistant.\n\nHe[674 chars]h a']`
gemma3	single	`test_model_4b_crops`	output_mismatch	7/7	`(line 590) AssertionError: Lists differ: ["user\nYou are a helpful assistant.\n\nHe[268 chars]the"] != ['user\nYou are a helpful assistant.\n\nHe[268 chars]the']`
gemma3n	multi	`test_generation_beyond_sliding_window`	output_mismatch	7/7	`(line 1196) AssertionError: Lists differ: [' and I find it very relaxing. I also lik[112 chars]re'"] != [" and the people are so friendly. I'm so [93 chars]re'"]`
gemma3n	multi	`test_model_4b_batch`	output_mismatch	7/7	`(line 1083) AssertionError: Lists differ: ['use[196 chars]ewer and has its tongue', "user\nYou are a hel[193 chars]cow"] != ['use[196 chars]ewer with its head slightly', "user\nYou are a[197 chars]cow"]`
gemma3n	multi	`test_model_4b_bf16`	output_mismatch	7/7	`(line 998) AssertionError: Lists differ: ['use[149 chars]to a turquoise ocean. The cow is facing the vi[31 chars]ned'] != ['use[149 chars]to a clear blue ocean. The cow is facing the v[25 chars]tly']`
gemma3n	multi	`test_model_4b_image`	output_mismatch	7/7	`(line 1110) AssertionError: Lists differ: ['use[149 chars]to a turquoise ocean. The cow is facing the vi[31 chars]ned'] != ['use[149 chars]to a clear blue ocean. The cow is facing the v[25 chars]tly']`
gemma3n	multi	`test_model_4b_multiimage`	output_mismatch	7/7	`(line 1151) AssertionError: Lists differ: ['use[140 chars]n district. Here are the key elements:\n\n* *A prominent red'] != ['use[140 chars]n district. Here are some of the key elements:\n\n **A']`
gemma3n	single	`test_generation_beyond_sliding_window`	output_mismatch	7/7	`(line 1196) AssertionError: Lists differ: [' and I find it very relaxing. I also lik[112 chars]re'"] != [" and the people are so friendly. I'm so [93 chars]re'"]`
gemma3n	single	`test_model_4b_batch`	output_mismatch	7/7	`(line 1083) AssertionError: Lists differ: ['use[196 chars]ewer and has its tongue', "user\nYou are a hel[193 chars]cow"] != ['use[196 chars]ewer with its head slightly', "user\nYou are a[197 chars]cow"]`
gemma3n	single	`test_model_4b_bf16`	output_mismatch	7/7	`(line 998) AssertionError: Lists differ: ['use[149 chars]to a turquoise ocean. The cow is facing the vi[31 chars]ned'] != ['use[149 chars]to a clear blue ocean. The cow is facing the v[25 chars]tly']`
gemma3n	single	`test_model_4b_image`	output_mismatch	7/7	`(line 1110) AssertionError: Lists differ: ['use[149 chars]to a turquoise ocean. The cow is facing the vi[31 chars]ned'] != ['use[149 chars]to a clear blue ocean. The cow is facing the v[25 chars]tly']`
gemma3n	single	`test_model_4b_multiimage`	output_mismatch	7/7	`(line 1151) AssertionError: Lists differ: ['use[140 chars]n district. Here are the key elements:\n\n* *A prominent red'] != ['use[140 chars]n district. Here are some of the key elements:\n\n **A']`
gemma4	multi	`test_model_multiimage`	output_mismatch	7/7	`(line 742) AssertionError: Lists differ: ['Bas[66 chars]und & Street Scene:*\n Roadway: There is an'] != ['Bas[66 chars]und & Street Scene:*\n Traffic Sign: The most prominent']`
gemma4	multi	`test_model_with_image`	output_mismatch	7/7	`(line 655) AssertionError: Lists differ: ['Thi[61 chars] beach with the ocean in the background under a clear'] != ['Thi[61 chars] beach with the ocean and a blue sky** in the background']`
gemma4	multi	`test_model_with_image_batch`	output_mismatch	7/7	`(line 706) AssertionError: Lists differ: ['Thi[81 chars]ocean in the background under a clear', "N[102 chars] on"] != ['Thi[81 chars]ocean and a blue sky** in the background', 'No[127 chars]lue']`
gemma4	single	`test_model_multiimage`	output_mismatch	7/7	`(line 742) AssertionError: Lists differ: ['Bas[66 chars]und & Street Scene:*\n Roadway: There is an'] != ['Bas[66 chars]und & Street Scene:*\n Traffic Sign: The most prominent']`
gemma4	single	`test_model_with_image`	output_mismatch	7/7	`(line 655) AssertionError: Lists differ: ['Thi[61 chars] beach with the ocean in the background under a clear'] != ['Thi[61 chars] beach with the ocean and a blue sky** in the background']`
gemma4	single	`test_model_with_image_batch`	output_mismatch	7/7	`(line 706) AssertionError: Lists differ: ['Thi[81 chars]ocean in the background under a clear', "N[102 chars] on"] != ['Thi[81 chars]ocean and a blue sky** in the background', 'No[127 chars]lue']`
generation	multi	`test_TopH_example_integration`	output_mismatch	7/7	`(line 3215) AssertionError: Lists differ: ['Tel[23 chars]key. Sure, here\'s one for you:\n\nWhy did the[67 chars]s"!'] != ['Tel[23 chars]key. Why did the monkey go to the doctor? Beca[34 chars]c"!']`
generation	multi	`test_assisted_generation_early_exit`	output_mismatch	7/7	`(line 4077) AssertionError: Lists differ: ['Ali[20 chars]ng a game of poker. Alice has a pair of 7s and Bob has a pair'] != ['Ali[20 chars]ng a game of poker. Alice has a pair of 8s and Bob has a pair']`
generation	multi	`test_beam_search_advanced_stopping_criteria`	output_mismatch	7/7	`(line 681) AssertionError: True is not false`
generation	multi	`test_beam_search_early_stop_heuristic`	output_mismatch	7/7	`(line 2965) AssertionError: "<\|us[317 chars]}\\).\nThe sum of 3 and 5 is \\(3 + 5 = 8\\).\[40 chars]\\)." != "<\|us[317 chars]}\\)."`
generation	single	`test_TopH_example_integration`	output_mismatch	7/7	`(line 3215) AssertionError: Lists differ: ['Tel[23 chars]key. Sure, here\'s one for you:\n\nWhy did the[67 chars]s"!'] != ['Tel[23 chars]key. Why did the monkey go to the doctor? Beca[34 chars]c"!']`
generation	single	`test_assisted_generation_early_exit`	output_mismatch	7/7	`(line 4077) AssertionError: Lists differ: ['Ali[20 chars]ng a game of poker. Alice has a pair of 7s and Bob has a pair'] != ['Ali[20 chars]ng a game of poker. Alice has a pair of 8s and Bob has a pair']`
generation	single	`test_beam_search_advanced_stopping_criteria`	output_mismatch	7/7	`(line 681) AssertionError: True is not false`
generation	single	`test_beam_search_early_stop_heuristic`	output_mismatch	7/7	`(line 2965) AssertionError: "<\|us[317 chars]}\\).\nThe sum of 3 and 5 is \\(3 + 5 = 8\\).\[40 chars]\\)." != "<\|us[317 chars]}\\)."`
glm	multi	`test_model_9b_eager`	output_mismatch	7/7	`(line 133) AssertionError: Lists differ: ['Hel[140 chars]ou how to make a simple and easy to make a DIY paper flower.'] != ['Hel[140 chars]ou how to make a simple and easy to make a DIY paper lantern.']`
glm	single	`test_model_9b_eager`	output_mismatch	7/7	`(line 133) AssertionError: Lists differ: ['Hel[140 chars]ou how to make a simple and easy to make a DIY paper flower.'] != ['Hel[140 chars]ou how to make a simple and easy to make a DIY paper lantern.']`
glm_image	multi	`test_image_to_image_generation`	output_mismatch	7/7	`(line 687) AssertionError: False is not true : Expected first 30 tokens:`
glm_image	single	`test_image_to_image_generation`	output_mismatch	7/7	`(line 687) AssertionError: False is not true : Expected first 30 tokens:`
glm_ocr	multi	`test_small_model_integration_test_batch`	output_mismatch	7/7	`(line 503) AssertionError: Lists differ: ['\n<\|image\|><\|image\|><\|image\|><\|image\|><\|[14885 chars]ia.'] != ["\nWhat kind of dog is this?\n<think>Got [256 chars]t's"]`
glm_ocr	multi	`test_small_model_integration_test_batch_different_resolutions`	output_mismatch	7/7	`(line 631) AssertionError: Lists differ: ['\n<\|image\|><\|image\|><\|image\|><\|image\|><\|[10983 chars]at.'] != ["\nWhat kind of dog is this?\n<think>Got [258 chars]but"]`
glm_ocr	multi	`test_small_model_integration_test_batch_wo_image`	output_mismatch	7/7	`(line 603) AssertionError: Lists differ: ['\n<\|image\|><\|image\|><\|image\|><\|image\|><\|[7469 chars]Ai."] != ["\nWhat kind of dog is this?\n<think>Got [267 chars]ion']`
glm_ocr	multi	`test_small_model_integration_test_expand`	output_mismatch	7/7	`(line 575) AssertionError: Lists differ: ['\n<\|image\|><\|image\|><\|image\|><\|image\|><\|[14840 chars]d a'] != ["\nWhat kind of dog is this?\n<think>Got [267 chars]lly"]`
glm_ocr	multi	`test_small_model_integration_test_with_video`	output_mismatch	7/7	`(line 541) AssertionError: Lists differ: ['\n<\|begin_of_video\|><\|image\|><\|image\|><\|[50804 chars]rt.'] != ["\n012345Describe this video.\n<think>Got[114 chars]irt"]`
glm_ocr	single	`test_small_model_integration_test_batch`	output_mismatch	7/7	`(line 503) AssertionError: Lists differ: ['\n<\|image\|><\|image\|><\|image\|><\|image\|><\|[14885 chars]ia.'] != ["\nWhat kind of dog is this?\n<think>Got [256 chars]t's"]`
glm_ocr	single	`test_small_model_integration_test_batch_different_resolutions`	output_mismatch	7/7	`(line 631) AssertionError: Lists differ: ['\n<\|image\|><\|image\|><\|image\|><\|image\|><\|[10983 chars]at.'] != ["\nWhat kind of dog is this?\n<think>Got [258 chars]but"]`
glm_ocr	single	`test_small_model_integration_test_batch_wo_image`	output_mismatch	7/7	`(line 603) AssertionError: Lists differ: ['\n<\|image\|><\|image\|><\|image\|><\|image\|><\|[7469 chars]Ai."] != ["\nWhat kind of dog is this?\n<think>Got [267 chars]ion']`
glm_ocr	single	`test_small_model_integration_test_expand`	output_mismatch	7/7	`(line 575) AssertionError: Lists differ: ['\n<\|image\|><\|image\|><\|image\|><\|image\|><\|[14840 chars]d a'] != ["\nWhat kind of dog is this?\n<think>Got [267 chars]lly"]`
glm_ocr	single	`test_small_model_integration_test_with_video`	output_mismatch	7/7	`(line 541) AssertionError: Lists differ: ['\n<\|begin_of_video\|><\|image\|><\|image\|><\|[50804 chars]rt.'] != ["\n012345Describe this video.\n<think>Got[114 chars]irt"]`
got_ocr2	multi	`test_small_model_integration_test_got_ocr_format`	output_mismatch	7/7	`(line 210) AssertionError: 'R\\&D' != '\\title{\nR'`
got_ocr2	single	`test_small_model_integration_test_got_ocr_format`	output_mismatch	7/7	`(line 210) AssertionError: 'R\\&D' != '\\title{\nR'`
granite	multi	`test_model_3b_logits_bf16`	output_mismatch	7/7	`(line 687) AssertionError: False is not true`
granite	single	`test_model_3b_logits_bf16`	output_mismatch	7/7	`(line 687) AssertionError: False is not true`
grounding_dino	multi	`test_cross_attention_mask`	output_mismatch	7/7	`(line 787) AssertionError: Tensor-likes are not close!`
grounding_dino	multi	`test_grounding_dino_loss`	output_mismatch	7/7	`(line 869) AssertionError: Scalars are not close!`
grounding_dino	multi	`test_inference_object_detection_head`	output_mismatch	7/7	`(line 678) AssertionError: Tensor-likes are not close!`
grounding_dino	multi	`test_inference_object_detection_head_equivalence_cpu_accelerator`	output_mismatch	7/7	`(line 745) AssertionError: Tensor-likes are not close!`
grounding_dino	single	`test_cross_attention_mask`	output_mismatch	7/7	`(line 787) AssertionError: Tensor-likes are not close!`
grounding_dino	single	`test_grounding_dino_loss`	output_mismatch	7/7	`(line 869) AssertionError: Scalars are not close!`
grounding_dino	single	`test_inference_object_detection_head`	output_mismatch	7/7	`(line 678) AssertionError: Tensor-likes are not close!`
grounding_dino	single	`test_inference_object_detection_head_equivalence_cpu_accelerator`	output_mismatch	7/7	`(line 745) AssertionError: Tensor-likes are not close!`
helium	multi	`test_model_2b`	output_mismatch	7/7	`(line 73) AssertionError: Lists differ: ['Hel[51 chars]have been working on a new project for a while now and I have'] != ['Hel[51 chars]have been working on a new project for a while now, and I']`
helium	single	`test_model_2b`	output_mismatch	7/7	`(line 73) AssertionError: Lists differ: ['Hel[51 chars]have been working on a new project for a while now and I have'] != ['Hel[51 chars]have been working on a new project for a while now, and I']`
hiera	multi	`test_inference_image_classification_head`	output_mismatch	7/7	`(line 560) AssertionError: Tensor-likes are not close!`
hiera	single	`test_inference_image_classification_head`	output_mismatch	7/7	`(line 560) AssertionError: Tensor-likes are not close!`
higgs_audio_v2	multi	`test_batched_inference`	output_mismatch	7/7	`(line 1399) AssertionError: Tensor-likes are not equal!`
higgs_audio_v2	multi	`test_multi_speaker_smart_voice`	output_mismatch	7/7	`(line 758) AssertionError: Tensor-likes are not equal!`
higgs_audio_v2	multi	`test_multi_speaker_voice_cloning`	output_mismatch	7/7	`(line 1098) AssertionError: Tensor-likes are not equal!`
higgs_audio_v2	multi	`test_zero_shot_voice_cloning`	output_mismatch	7/7	`(line 931) AssertionError: Tensor-likes are not equal!`
higgs_audio_v2	single	`test_batched_inference`	output_mismatch	7/7	`(line 1399) AssertionError: Tensor-likes are not equal!`
higgs_audio_v2	single	`test_multi_speaker_smart_voice`	output_mismatch	7/7	`(line 758) AssertionError: Tensor-likes are not equal!`
higgs_audio_v2	single	`test_multi_speaker_voice_cloning`	output_mismatch	7/7	`(line 1098) AssertionError: Tensor-likes are not equal!`
higgs_audio_v2	single	`test_zero_shot_voice_cloning`	output_mismatch	7/7	`(line 931) AssertionError: Tensor-likes are not equal!`
instructblip	multi	`test_inference_flant5_xl`	output_mismatch	7/7	`(line 718) AssertionError: Lists differ: [0, 3[68 chars]459, 9256, 16, 8, 2214, 13, 3, 9, 3164, 690, 2[500 chars]5, 1] != [0, 3[68 chars]459, 4049, 16, 8, 2214, 13, 3, 9, 3164, 690, 2[295 chars]5, 1]`
instructblip	single	`test_inference_flant5_xl`	output_mismatch	7/7	`(line 718) AssertionError: Lists differ: [0, 3[68 chars]459, 9256, 16, 8, 2214, 13, 3, 9, 3164, 690, 2[500 chars]5, 1] != [0, 3[68 chars]459, 4049, 16, 8, 2214, 13, 3, 9, 3164, 690, 2[295 chars]5, 1]`
instructblipvideo	multi	`test_inference_vicuna_7b`	output_mismatch	7/7	`(line 671) AssertionError: 'Expl[43 chars]a baby girl wearing glasses is reading a book on the bed 1' != 'Expl[43 chars]a baby girl wearing glasses is reading a book on the bed 1080p'`
instructblipvideo	single	`test_inference_vicuna_7b`	output_mismatch	7/7	`(line 671) AssertionError: 'Expl[43 chars]a baby girl wearing glasses is reading a book on the bed 1' != 'Expl[43 chars]a baby girl wearing glasses is reading a book on the bed 1080p'`
internvl	multi	`test_llama_small_model_integration_forward`	output_mismatch	7/7	`(line 687) AssertionError: False is not true : Actual logits: tensor([ -9.8750, -0.4954, 1.4580, -10.3281, -10.3359], dtype=torch.float16)`
internvl	multi	`test_llama_small_model_integration_generate_text_only`	output_mismatch	7/7	`(line 714) AssertionError: "Autu[14 chars],\nNature's breath, a season's sigh,\nSilent woods awake." != "Autu[14 chars],\nNature's breath, a silent sigh,\nWinter's chill approaches."`
internvl	single	`test_llama_small_model_integration_forward`	output_mismatch	7/7	`(line 687) AssertionError: False is not true : Actual logits: tensor([ -9.8750, -0.4954, 1.4580, -10.3281, -10.3359], dtype=torch.float16)`
internvl	single	`test_llama_small_model_integration_generate_text_only`	output_mismatch	7/7	`(line 714) AssertionError: "Autu[14 chars],\nNature's breath, a season's sigh,\nSilent woods awake." != "Autu[14 chars],\nNature's breath, a silent sigh,\nWinter's chill approaches."`
jamba	multi	`test_simple_batched_generate_with_padding`	output_mismatch	7/7	`(line 576) AssertionError: "<\|startoftext\|>Tell me a story<\|pad\|><\|p[50 chars]t I'" != '<\|pad\|><\|pad\|><\|pad\|><\|pad\|><\|pad\|><\|pad[76 chars]ates'`
jamba	single	`test_simple_batched_generate_with_padding`	output_mismatch	7/7	`(line 576) AssertionError: "<\|startoftext\|>Tell me a story<\|pad\|><\|p[50 chars]t I'" != '<\|pad\|><\|pad\|><\|pad\|><\|pad\|><\|pad\|><\|pad[76 chars]ates'`
kosmos2	multi	`test_snowman_image_captioning`	output_mismatch	7/7	`(line 79) AssertionError:`
kosmos2	multi	`test_snowman_image_captioning_batch`	output_mismatch	7/7	`(line 712) AssertionError: Lists differ: ['<gr[35 chars]ail: A snowman is sitting in front of a fire, [575 chars]t>.'] != ['<gr[35 chars]ail: The image features a snowman sitting by<p[836 chars]t>.']`
kosmos2	single	`test_snowman_image_captioning`	output_mismatch	7/7	`(line 79) AssertionError:`
kosmos2	single	`test_snowman_image_captioning_batch`	output_mismatch	7/7	`(line 712) AssertionError: Lists differ: ['<gr[35 chars]ail: A snowman is sitting in front of a fire, [575 chars]t>.'] != ['<gr[35 chars]ail: The image features a snowman sitting by<p[836 chars]t>.']`
kosmos2_5	multi	`test_eager`	output_mismatch	7/7	`(line 578) AssertionError: Lists differ: ['<bb[216 chars]<y_650></bbox>COOKIE DOH SAUCES\n<bbox><x_788>[452 chars]0\n'] != ['<bb[216 chars]<y_651></bbox>COOKIE DOH SAUCES\n<bbox><x_788>[452 chars]0\n']`
kosmos2_5	single	`test_eager`	output_mismatch	7/7	`(line 578) AssertionError: Lists differ: ['<bb[216 chars]<y_650></bbox>COOKIE DOH SAUCES\n<bbox><x_788>[452 chars]0\n'] != ['<bb[216 chars]<y_651></bbox>COOKIE DOH SAUCES\n<bbox><x_788>[452 chars]0\n']`
layoutlmv2	multi	`test_processor_case_1`	output_mismatch	7/7	`(line 675) AssertionError: Sequences differ: "[CLS[522 chars]t itc ' s new fmcg businesses are the fastest [829 chars]PAD]" != "[CLS[522 chars]t itc's new fmcg businesses are the fastest gr[827 chars]PAD]"`
layoutlmv2	multi	`test_processor_case_4`	output_mismatch	7/7	`(line 675) AssertionError: Sequences differ: "[CLS] what ' s his name? [SEP] 11 : 14 to 11 : 39 a[1108 chars]SEP]" != "[CLS] what's his name? [SEP] 11 : 14 to 11 : 39 a. [1106 chars]SEP]"`
layoutlmv2	multi	`test_processor_case_5`	output_mismatch	7/7	`(line 675) AssertionError: Sequences differ: "[CLS] what ' s his name? [SEP] hello world [SEP]" != "[CLS] what's his name? [SEP] hello world [SEP]"`
layoutlmv2	single	`test_processor_case_1`	output_mismatch	7/7	`(line 675) AssertionError: Sequences differ: "[CLS[522 chars]t itc ' s new fmcg businesses are the fastest [829 chars]PAD]" != "[CLS[522 chars]t itc's new fmcg businesses are the fastest gr[827 chars]PAD]"`
layoutlmv2	single	`test_processor_case_4`	output_mismatch	7/7	`(line 675) AssertionError: Sequences differ: "[CLS] what ' s his name? [SEP] 11 : 14 to 11 : 39 a[1108 chars]SEP]" != "[CLS] what's his name? [SEP] 11 : 14 to 11 : 39 a. [1106 chars]SEP]"`
layoutlmv2	single	`test_processor_case_5`	output_mismatch	7/7	`(line 675) AssertionError: Sequences differ: "[CLS] what ' s his name? [SEP] hello world [SEP]" != "[CLS] what's his name? [SEP] hello world [SEP]"`
lfm2_moe	multi	`test_model_1a8b_batched_chat_generation`	output_mismatch	7/7	`(line 223) AssertionError: Lists differ: ['Who are you? (AI) designed to assist? \nI am an AI ass[192 chars]ial'] != ['Who are you? (as AI) created by? \nI am an artificial [200 chars]ish']`
lfm2_moe	single	`test_model_1a8b_batched_chat_generation`	output_mismatch	7/7	`(line 223) AssertionError: Lists differ: ['Who are you? (AI) designed to assist? \nI am an AI ass[192 chars]ial'] != ['Who are you? (as AI) created by? \nI am an artificial [200 chars]ish']`
lfm2_vl	multi	`test_integration_test`	output_mismatch	7/7	`(line 246) AssertionError: 'In t[53 chars]. They are both very relaxed and comfortable. [14 chars]grey' != 'In t[53 chars]. There are also two remote controls on the blanket.\n\n\n\n'`
lfm2_vl	multi	`test_integration_test_high_resolution`	output_mismatch	7/7	`(line 354) AssertionError: 'In t[52 chars]ymbol of freedom and democracy. It stands tall on a small' != 'In t[52 chars]ymbol of freedom and democracy. It stands on Liberty Island in'`
lfm2_vl	single	`test_integration_test`	output_mismatch	7/7	`(line 246) AssertionError: 'In t[53 chars]. They are both very relaxed and comfortable. [14 chars]grey' != 'In t[53 chars]. There are also two remote controls on the blanket.\n\n\n\n'`
lfm2_vl	single	`test_integration_test_high_resolution`	output_mismatch	7/7	`(line 354) AssertionError: 'In t[52 chars]ymbol of freedom and democracy. It stands tall on a small' != 'In t[52 chars]ymbol of freedom and democracy. It stands on Liberty Island in'`
llama	multi	`test_llama_3_1_hard`	output_mismatch	7/7	`(line 96) AssertionError: 'Tell[74 chars]ical social and political upheaval in France t[552 chars] the' != 'Tell[74 chars]ical political and social upheaval in France t[558 chars]nshr'`
llama	single	`test_llama_3_1_hard`	output_mismatch	7/7	`(line 96) AssertionError: 'Tell[74 chars]ical social and political upheaval in France t[552 chars] the' != 'Tell[74 chars]ical political and social upheaval in France t[558 chars]nshr'`
llava	multi	`test_batched_generation`	output_mismatch	7/7	`(line 566) AssertionError: Lists differ: ["\n [134 chars] one image and a", '\nUSER: Describe the image[210 chars]ama'] != ["\n [134 chars] one and a yellow", '\nUSER: Describe the imag[211 chars]ama']`
llava	multi	`test_pixtral_batched`	output_mismatch	7/7	`(line 724) AssertionError: Lists differ: ['Wha[97 chars]mage?A narrow dirt path is surrounded by grass[74 chars]ue.'] != ['Wha[97 chars]mage?The image depicts a narrow, winding dirt [175 chars]ere']`
llava	single	`test_batched_generation`	output_mismatch	7/7	`(line 566) AssertionError: Lists differ: ["\n [134 chars] one image and a", '\nUSER: Describe the image[210 chars]ama'] != ["\n [134 chars] one and a yellow", '\nUSER: Describe the imag[211 chars]ama']`
llava	single	`test_pixtral_batched`	output_mismatch	7/7	`(line 724) AssertionError: Lists differ: ['Wha[97 chars]mage?A narrow dirt path is surrounded by grass[74 chars]ue.'] != ['Wha[97 chars]mage?The image depicts a narrow, winding dirt [175 chars]ere']`
llava_next	multi	`test_small_model_integration_test`	output_mismatch	7/7	`(line 172) AssertionError: assert False`
llava_next	single	`test_small_model_integration_test`	output_mismatch	7/7	`(line 172) AssertionError: assert False`
llava_next_video	multi	`test_small_model_integration_test`	output_mismatch	7/7	`(line 388) AssertionError: 'USER[154 chars]hile wearing a pair of glasses that are too la[24 chars] are' != 'USER[154 chars]hile another child is attempting to read the s[45 chars]eems'`
llava_next_video	multi	`test_small_model_integration_test_batch_matches_single`	output_mismatch	7/7	`(line 480) AssertionError: 'USER[154 chars]hile another child is attempting to read the s[96 chars]e to' != 'USER[154 chars]hile wearing a pair of glasses that are too la[69 chars]g it'`
llava_next_video	single	`test_small_model_integration_test`	output_mismatch	7/7	`(line 388) AssertionError: 'USER[154 chars]hile wearing a pair of glasses that are too la[24 chars] are' != 'USER[154 chars]hile another child is attempting to read the s[45 chars]eems'`
llava_next_video	single	`test_small_model_integration_test_batch_matches_single`	output_mismatch	7/7	`(line 480) AssertionError: 'USER[154 chars]hile another child is attempting to read the s[96 chars]e to' != 'USER[154 chars]hile wearing a pair of glasses that are too la[69 chars]g it'`
longt5	multi	`test_inference_hidden_states`	output_mismatch	7/7	`(line 1225) AssertionError: Tensor-likes are not close!`
longt5	multi	`test_summarization`	output_mismatch	7/7	`(line 1194) AssertionError: Lists differ: ['background : coronary artery disease ( ca[601 chars]red'] != ['sss thessass:ss andss toss ofss fillssess[171 chars]se,']`
longt5	single	`test_inference_hidden_states`	output_mismatch	7/7	`(line 1225) AssertionError: Tensor-likes are not close!`
longt5	single	`test_summarization`	output_mismatch	7/7	`(line 1194) AssertionError: Lists differ: ['background : coronary artery disease ( ca[601 chars]red'] != ['sss thessass:ss andss toss ofss fillssess[171 chars]se,']`
luke	multi	`test_inference_base_model`	output_mismatch	7/7	`(line 905) AssertionError: Tensor-likes are not close!`
luke	multi	`test_inference_large_model`	output_mismatch	7/7	`(line 940) AssertionError: Tensor-likes are not close!`
luke	single	`test_inference_base_model`	output_mismatch	7/7	`(line 905) AssertionError: Tensor-likes are not close!`
luke	single	`test_inference_large_model`	output_mismatch	7/7	`(line 940) AssertionError: Tensor-likes are not close!`
lw_detr	multi	`test_inference_object_detection_head_tiny`	output_mismatch	7/7	`(line 690) AssertionError: Tensor-likes are not close!`
lw_detr	multi	`test_inference_object_detection_head_xlarge`	output_mismatch	7/7	`(line 766) AssertionError: Tensor-likes are not close!`
lw_detr	single	`test_inference_object_detection_head_tiny`	output_mismatch	7/7	`(line 690) AssertionError: Tensor-likes are not close!`
lw_detr	single	`test_inference_object_detection_head_xlarge`	output_mismatch	7/7	`(line 766) AssertionError: Tensor-likes are not close!`
m2m_100	multi	`test_seq_to_seq_generation`	output_mismatch	7/7	`(line 397) AssertionError: assert ['</s>__en__T... France.</s>'] == ['</s> __en__... France.</s>']`
m2m_100	single	`test_seq_to_seq_generation`	output_mismatch	7/7	`(line 397) AssertionError: assert ['</s>__en__T... France.</s>'] == ['</s> __en__... France.</s>']`
mimi	multi	`test_integration`	output_mismatch	7/7	`(line 687) AssertionError: False is not true`
mimi	multi	`test_integration_longform`	output_mismatch	7/7	`(line 687) AssertionError: False is not true`
mimi	single	`test_integration`	output_mismatch	7/7	`(line 687) AssertionError: False is not true`
mimi	single	`test_integration_longform`	output_mismatch	7/7	`(line 687) AssertionError: False is not true`
minimax	multi	`test_small_model_logits`	output_mismatch	7/7	`(line 233) AssertionError: Tensor-likes are not close!`
minimax	single	`test_small_model_logits`	output_mismatch	7/7	`(line 233) AssertionError: Tensor-likes are not close!`
ministral	multi	`test_model_8b_generation`	output_mismatch	7/7	`(line 116) AssertionError: 'My favourite condiment is 100% natural, 100% organic, 100% free of' != 'MyfavouritecondimentisĊĠĠĠĠJoined:Ġ2018-01-01,Ġ12'`
ministral	multi	`test_model_8b_logits`	output_mismatch	7/7	`(line 93) AssertionError: Tensor-likes are not close!`
ministral	single	`test_model_8b_generation`	output_mismatch	7/7	`(line 116) AssertionError: 'My favourite condiment is 100% natural, 100% organic, 100% free of' != 'MyfavouritecondimentisĊĠĠĠĠJoined:Ġ2018-01-01,Ġ12'`
ministral	single	`test_model_8b_logits`	output_mismatch	7/7	`(line 93) AssertionError: Tensor-likes are not close!`
ministral3	multi	`test_model_3b_generation`	output_mismatch	7/7	`(line 130) AssertionError: 'My favourite condiment is icing sugar. I[47 chars]fles' != "My favourite condiment is 100% pure oliv[46 chars]t in"`
ministral3	multi	`test_model_3b_logits`	output_mismatch	7/7	`(line 102) AssertionError: Tensor-likes are not close!`
ministral3	single	`test_model_3b_generation`	output_mismatch	7/7	`(line 130) AssertionError: 'My favourite condiment is icing sugar. I[47 chars]fles' != "My favourite condiment is 100% pure oliv[46 chars]t in"`
ministral3	single	`test_model_3b_logits`	output_mismatch	7/7	`(line 102) AssertionError: Tensor-likes are not close!`
mistral	multi	`test_model_7b_logits`	output_mismatch	7/7	`(line 112) AssertionError: Tensor-likes are not close!`
mistral	multi	`test_speculative_generation`	output_mismatch	7/7	`(line 207) AssertionError: 'My f[18 chars] is 100% ketchup. I’m not a fan of mustard, relish' != 'My f[18 chars] is 100% mayonnaise. I’m not a fan of the fancy stuff with all'`
mistral	single	`test_model_7b_logits`	output_mismatch	7/7	`(line 112) AssertionError: Tensor-likes are not close!`
mistral	single	`test_speculative_generation`	output_mismatch	7/7	`(line 207) AssertionError: 'My f[18 chars] is 100% ketchup. I’m not a fan of mustard, relish' != 'My f[18 chars] is 100% mayonnaise. I’m not a fan of the fancy stuff with all'`
mistral3	multi	`test_mistral3_integration_batched_generate`	output_mismatch	7/7	`(line 362) AssertionError: ' to write a short story based on this ima[70 chars]e pl' != 'Calm waters reflect\nWooden path to dista[26 chars]oods'`
mistral3	multi	`test_mistral3_integration_batched_generate_multi_image`	output_mismatch	7/7	`(line 438) AssertionError: ' to write a short story based on this im[81 chars]ched' != "Calm waters reflect\nWooden path to dist[29 chars]hold"`
mistral3	multi	`test_mistral3_integration_generate`	output_mismatch	7/7	`(line 309) AssertionError: 'The [14 chars] two tabby cats lying on a pink surface, which[21 chars]h or' != 'The [14 chars] two cats lying on a pink surface, which appea[21 chars] bed'`
mistral3	single	`test_mistral3_integration_batched_generate`	output_mismatch	7/7	`(line 362) AssertionError: ' to write a short story based on this ima[70 chars]e pl' != 'Calm waters reflect\nWooden path to dista[26 chars]oods'`
mistral3	single	`test_mistral3_integration_batched_generate_multi_image`	output_mismatch	7/7	`(line 438) AssertionError: ' to write a short story based on this im[81 chars]ched' != "Calm waters reflect\nWooden path to dist[29 chars]hold"`
mistral3	single	`test_mistral3_integration_generate`	output_mismatch	7/7	`(line 309) AssertionError: 'The [14 chars] two tabby cats lying on a pink surface, which[21 chars]h or' != 'The [14 chars] two cats lying on a pink surface, which appea[21 chars] bed'`
mixtral	multi	`test_small_model_logits`	output_mismatch	7/7	`(line 143) AssertionError: Tensor-likes are not close!`
mixtral	multi	`test_small_model_logits_batched`	output_mismatch	7/7	`(line 188) AssertionError: Tensor-likes are not close!`
mixtral	single	`test_small_model_logits`	output_mismatch	7/7	`(line 143) AssertionError: Tensor-likes are not close!`
mixtral	single	`test_small_model_logits_batched`	output_mismatch	7/7	`(line 188) AssertionError: Tensor-likes are not close!`
mllama	multi	`test_11b_model_integration_batched_generate`	output_mismatch	7/7	`(line 643) AssertionError: 'If I[43 chars]d be: "I\'m not a fan of long exposure, but I\[21 chars]".\\' != 'If I[43 chars]d be:.\\nA dock in the lake.\\nA mountain in t[27 chars]ure.'`
mllama	multi	`test_11b_model_integration_forward`	output_mismatch	7/7	`(line 687) AssertionError: False is not true : Actual logits: tensor([ 6.5938, 4.4062, 3.0938, -0.3105, 1.8906], dtype=torch.bfloat16)`
mllama	multi	`test_11b_model_integration_generate`	output_mismatch	7/7	`(line 510) AssertionError: 'If I[43 chars]d be: "I\'m not a fan of long exposure, but I\[21 chars]".\\' != 'If I[43 chars]d be:.\\nA dock in the lake.\\nA mountain in t[27 chars]ure.'`
mllama	multi	`test_11b_model_integration_multi_image_generate`	output_mismatch	7/7	`(line 724) AssertionError: 'The image shows a red octagonal stop sign w[59 chars]to a' != 'This image shows a long wooden dock extendi[67 chars]ling'`
mllama	single	`test_11b_model_integration_batched_generate`	output_mismatch	7/7	`(line 643) AssertionError: 'If I[43 chars]d be: "I\'m not a fan of long exposure, but I\[21 chars]".\\' != 'If I[43 chars]d be:.\\nA dock in the lake.\\nA mountain in t[27 chars]ure.'`
mllama	single	`test_11b_model_integration_forward`	output_mismatch	7/7	`(line 687) AssertionError: False is not true : Actual logits: tensor([ 6.5938, 4.4062, 3.0938, -0.3105, 1.8906], dtype=torch.bfloat16)`
mllama	single	`test_11b_model_integration_generate`	output_mismatch	7/7	`(line 510) AssertionError: 'If I[43 chars]d be: "I\'m not a fan of long exposure, but I\[21 chars]".\\' != 'If I[43 chars]d be:.\\nA dock in the lake.\\nA mountain in t[27 chars]ure.'`
mllama	single	`test_11b_model_integration_multi_image_generate`	output_mismatch	7/7	`(line 724) AssertionError: 'The image shows a red octagonal stop sign w[59 chars]to a' != 'This image shows a long wooden dock extendi[67 chars]ling'`
mluke	multi	`test_entity_classification_no_padding_or_truncation`	output_mismatch	7/7	`(line 453) AssertionError: '<s> Japanese is an<s> East Asian language<s> spoken by about[40 chars]</s>' != '<s> Japanese is an<ent>East Asian language<ent>spoken by abo[42 chars]</s>'`
mluke	multi	`test_entity_pair_classification_no_padding_or_truncation`	output_mismatch	7/7	`(line 507) AssertionError: '<s><s> Japanese<s> is an East Asian language [64 chars]</s>' != '<s><ent>Japanese<ent>is an East Asian languag[68 chars]</s>'`
mluke	multi	`test_entity_span_classification_no_padding_or_truncation`	output_mismatch	7/7	`(line 572) AssertionError: '<s> [33 chars]e spoken by about 128 million people, primarily in Japan .</s>' != '<s> [33 chars]e spoken by about 128 million people, primarily in Japan.</s>'`
mluke	single	`test_entity_classification_no_padding_or_truncation`	output_mismatch	7/7	`(line 453) AssertionError: '<s> Japanese is an<s> East Asian language<s> spoken by about[40 chars]</s>' != '<s> Japanese is an<ent>East Asian language<ent>spoken by abo[42 chars]</s>'`
mluke	single	`test_entity_pair_classification_no_padding_or_truncation`	output_mismatch	7/7	`(line 507) AssertionError: '<s><s> Japanese<s> is an East Asian language [64 chars]</s>' != '<s><ent>Japanese<ent>is an East Asian languag[68 chars]</s>'`
mluke	single	`test_entity_span_classification_no_padding_or_truncation`	output_mismatch	7/7	`(line 572) AssertionError: '<s> [33 chars]e spoken by about 128 million people, primarily in Japan .</s>' != '<s> [33 chars]e spoken by about 128 million people, primarily in Japan.</s>'`
mm_grounding_dino	multi	`test_inference_object_detection_head`	output_mismatch	7/7	`(line 672) AssertionError: Tensor-likes are not close!`
mm_grounding_dino	multi	`test_inference_object_detection_head_equivalence_cpu_gpu`	output_mismatch	7/7	`(line 738) AssertionError: Tensor-likes are not close!`
mm_grounding_dino	multi	`test_mm_grounding_dino_loss`	output_mismatch	7/7	`(line 687) AssertionError: False is not true`
mm_grounding_dino	single	`test_inference_object_detection_head`	output_mismatch	7/7	`(line 672) AssertionError: Tensor-likes are not close!`
mm_grounding_dino	single	`test_inference_object_detection_head_equivalence_cpu_gpu`	output_mismatch	7/7	`(line 738) AssertionError: Tensor-likes are not close!`
mm_grounding_dino	single	`test_mm_grounding_dino_loss`	output_mismatch	7/7	`(line 687) AssertionError: False is not true`
moonshine_streaming	multi	`test_medium_logits_batch`	output_mismatch	7/7	`(line 605) AssertionError: Tensor-likes are not close!`
moonshine_streaming	multi	`test_small_logits_batch`	output_mismatch	7/7	`(line 572) AssertionError: Tensor-likes are not close!`
moonshine_streaming	single	`test_medium_logits_batch`	output_mismatch	7/7	`(line 605) AssertionError: Tensor-likes are not close!`
moonshine_streaming	single	`test_small_logits_batch`	output_mismatch	7/7	`(line 572) AssertionError: Tensor-likes are not close!`
moshi	multi	`test_moshika_greedy_unconditional_fp16`	output_mismatch	7/7	`(line 687) AssertionError: False is not true`
moshi	multi	`test_moshiko_greedy_unconditional_fp16`	output_mismatch	7/7	`(line 687) AssertionError: False is not true`
moshi	multi	`test_moshiko_greedy_unconditional_fp16_eager`	output_mismatch	7/7	`(line 687) AssertionError: False is not true`
moshi	single	`test_moshika_greedy_unconditional_fp16`	output_mismatch	7/7	`(line 687) AssertionError: False is not true`
moshi	single	`test_moshiko_greedy_unconditional_fp16`	output_mismatch	7/7	`(line 687) AssertionError: False is not true`
moshi	single	`test_moshiko_greedy_unconditional_fp16_eager`	output_mismatch	7/7	`(line 687) AssertionError: False is not true`
moshi	single	`test_moshiko_greedy_unconditional_fp32`	output_mismatch	7/7	`(line 687) AssertionError: False is not true`
musicgen	multi	`test_generate_text_prompt_sampling`	output_mismatch	7/7	`(line 1262) AssertionError: Tensor-likes are not close!`
musicgen	multi	`test_generate_unconditional_sampling`	output_mismatch	7/7	`(line 1179) AssertionError: Tensor-likes are not close!`
musicgen	single	`test_generate_text_prompt_sampling`	output_mismatch	7/7	`(line 1262) AssertionError: Tensor-likes are not close!`
musicgen	single	`test_generate_unconditional_sampling`	output_mismatch	7/7	`(line 1179) AssertionError: Tensor-likes are not close!`
musicgen_melody	multi	`test_generate_text_audio_prompt`	output_mismatch	6/7	`(line 1307) AssertionError: Tensor-likes are not close!`
musicgen_melody	multi	`test_generate_text_prompt_greedy`	output_mismatch	6/7	`(line 1219) AssertionError: Tensor-likes are not close!`
musicgen_melody	multi	`test_generate_text_prompt_greedy_with_classifier_free_guidance`	output_mismatch	6/7	`(line 1247) AssertionError: Tensor-likes are not close!`
musicgen_melody	multi	`test_generate_text_prompt_sampling`	output_mismatch	6/7	`(line 1282) AssertionError: Tensor-likes are not close!`
musicgen_melody	multi	`test_generate_unconditional_greedy`	output_mismatch	6/7	`(line 1167) AssertionError: Tensor-likes are not close!`
musicgen_melody	multi	`test_generate_unconditional_sampling`	output_mismatch	6/7	`(line 1192) AssertionError: Tensor-likes are not close!`
musicgen_melody	multi	`test_generate_text_audio_prompt`	output_mismatch	6/7	`(line 1376) AssertionError: Tensor-likes are not close!`
musicgen_melody	multi	`test_generate_unconditional_greedy`	output_mismatch	6/7	`(line 1344) AssertionError: Tensor-likes are not close!`
musicgen_melody	single	`test_generate_text_audio_prompt`	output_mismatch	7/7	`(line 1307) AssertionError: Tensor-likes are not close!`
musicgen_melody	single	`test_generate_text_prompt_greedy`	output_mismatch	7/7	`(line 1219) AssertionError: Tensor-likes are not close!`
musicgen_melody	single	`test_generate_text_prompt_greedy_with_classifier_free_guidance`	output_mismatch	7/7	`(line 1247) AssertionError: Tensor-likes are not close!`
musicgen_melody	single	`test_generate_text_prompt_sampling`	output_mismatch	7/7	`(line 1282) AssertionError: Tensor-likes are not close!`
musicgen_melody	single	`test_generate_unconditional_greedy`	output_mismatch	7/7	`(line 1167) AssertionError: Tensor-likes are not close!`
musicgen_melody	single	`test_generate_unconditional_sampling`	output_mismatch	7/7	`(line 1192) AssertionError: Tensor-likes are not close!`
musicgen_melody	single	`test_generate_text_audio_prompt`	output_mismatch	7/7	`(line 1376) AssertionError: Tensor-likes are not close!`
musicgen_melody	single	`test_generate_unconditional_greedy`	output_mismatch	7/7	`(line 1344) AssertionError: Tensor-likes are not close!`
nemotron	multi	`test_nemotron_8b_generation_eager`	output_mismatch	6/7	`(line 103) AssertionError: Lists differ: ['Wha[46 chars]er: Jupiter\n\nWhat is the answer'] != ['Wha[46 chars]er: Jupiter\n\nWhat is the answer: What is the name of the 19']`
nemotron	single	`test_nemotron_8b_generation_eager`	output_mismatch	7/7	`(line 103) AssertionError: Lists differ: ['Wha[46 chars]er: Jupiter\n\nWhat is the answer'] != ['Wha[46 chars]er: Jupiter\n\nWhat is the answer: What is the name of the 19']`
nllb_moe	multi	`test_inference_logits`	output_mismatch	7/7	`(line 399) AssertionError: Tensor-likes are not close!`
nllb_moe	single	`test_inference_logits`	output_mismatch	7/7	`(line 399) AssertionError: Tensor-likes are not close!`
olmo	multi	`test_export_static_cache`	output_mismatch	6/7	`(line 338) AssertionError: Lists differ: ['Sim[41 chars]that \nthe speed of light is the same in all r[35 chars]ght'] != ['Sim[41 chars]that .1.\nThe theory of relativity states tha[18 chars] of']`
olmo	multi	`test_model_7b_greedy_generation`	output_mismatch	6/7	`(line 242) AssertionError: 'Simp[40 chars]that \nthe speed of light is the same for all [232 chars]\n\n' != 'Simp[40 chars]that .1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1[20 chars].1.1'`
olmo	single	`test_export_static_cache`	output_mismatch	6/7	`(line 338) AssertionError: Lists differ: ['Sim[41 chars]that \nthe speed of light is the same in all r[35 chars]ght'] != ['Sim[41 chars]that .1.\nThe theory of relativity states tha[18 chars] of']`
olmo	single	`test_model_7b_greedy_generation`	output_mismatch	6/7	`(line 242) AssertionError: 'Simp[40 chars]that \nthe speed of light is the same for all [232 chars]\n\n' != 'Simp[40 chars]that .1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1[20 chars].1.1'`
olmo2	multi	`test_model_1b_logits_bfloat16`	output_mismatch	7/7	`(line 214) AssertionError: Tensor-likes are not close!`
olmo2	single	`test_model_1b_logits_bfloat16`	output_mismatch	7/7	`(line 214) AssertionError: Tensor-likes are not close!`
olmo3	multi	`test_model_7b_logits`	output_mismatch	7/7	`(line 196) AssertionError: Tensor-likes are not close!`
olmo3	single	`test_model_7b_logits`	output_mismatch	7/7	`(line 196) AssertionError: Tensor-likes are not close!`
olmoe	multi	`test_model_7b_logits`	output_mismatch	7/7	`(line 217) AssertionError: Tensor-likes are not close!`
olmoe	single	`test_model_7b_logits`	output_mismatch	7/7	`(line 217) AssertionError: Tensor-likes are not close!`
oneformer	multi	`test_inference_no_head`	output_mismatch	6/7	`(line 507) AssertionError: Tensor-likes are not close!`
oneformer	multi	`test_inference_universal_segmentation_head`	output_mismatch	6/7	`(line 549) AssertionError: Tensor-likes are not close!`
oneformer	single	`test_inference_no_head`	output_mismatch	7/7	`(line 507) AssertionError: Tensor-likes are not close!`
oneformer	single	`test_inference_universal_segmentation_head`	output_mismatch	7/7	`(line 549) AssertionError: Tensor-likes are not close!`
opt	multi	`test_inference_no_head`	output_mismatch	7/7	`(line 357) AssertionError: tensor([[-0.2883, -1.9219, -0.3079],`
opt	single	`test_inference_no_head`	output_mismatch	7/7	`(line 357) AssertionError: tensor([[-0.2883, -1.9219, -0.3079],`
ovis2	multi	`test_small_model_integration_test_batch_different_resolutions`	output_mismatch	7/7	`(line 355) AssertionError: Lists differ: ['sys[81 chars]ant\n', 'system\nYou are a helpful assistant.\[139 chars]et.'] != ['sys[81 chars]ant\nAnswer: I see a brown dog standing on a w[224 chars]et.']`
ovis2	single	`test_small_model_integration_test_batch_different_resolutions`	output_mismatch	7/7	`(line 355) AssertionError: Lists differ: ['sys[81 chars]ant\n', 'system\nYou are a helpful assistant.\[139 chars]et.'] != ['sys[81 chars]ant\nAnswer: I see a brown dog standing on a w[224 chars]et.']`
owlvit	multi	`test_inference_interpolate_pos_encoding`	output_mismatch	7/7	`(line 683) AssertionError: Tensor-likes are not close!`
owlvit	multi	`test_inference_object_detection`	output_mismatch	7/7	`(line 800) AssertionError: Tensor-likes are not close!`
owlvit	multi	`test_inference_one_shot_object_detection`	output_mismatch	7/7	`(line 843) AssertionError: Tensor-likes are not close!`
owlvit	single	`test_inference_interpolate_pos_encoding`	output_mismatch	7/7	`(line 683) AssertionError: Tensor-likes are not close!`
owlvit	single	`test_inference_object_detection`	output_mismatch	7/7	`(line 800) AssertionError: Tensor-likes are not close!`
owlvit	single	`test_inference_one_shot_object_detection`	output_mismatch	7/7	`(line 843) AssertionError: Tensor-likes are not close!`
persimmon	multi	`test_model_8b_chat_greedy_generation`	output_mismatch	6/7	`(line 131) AssertionError: 'huma[58 chars]ept: The theory of relativity states that the [80 chars]ion.' != 'huma[58 chars]ept: the speed of light in a vacuum is the sam[33 chars]ence'`
persimmon	single	`test_model_8b_chat_greedy_generation`	output_mismatch	6/7	`(line 131) AssertionError: 'huma[58 chars]ept: The theory of relativity states that the [80 chars]ion.' != 'huma[58 chars]ept: the speed of light in a vacuum is the sam[33 chars]ence'`
persimmon	single	`test_model_8b_chat_logits`	output_mismatch	6/7	`(line 99) AssertionError: Tensor-likes are not close!`
pixio	multi	`test_inference_no_head`	output_mismatch	7/7	`(line 277) AssertionError: Tensor-likes are not close!`
pixio	single	`test_inference_no_head`	output_mismatch	7/7	`(line 277) AssertionError: Tensor-likes are not close!`
plbart	multi	`test_fill_mask`	output_mismatch	7/7	`(line 444) AssertionError: '0 0 the 0 the 0 the 0 the 0 the 0 the 0 the 0 the 0' != '0 0 the 0 the 0 the 0 the 0 the 0 the 0 the 0 the'`
plbart	multi	`test_java_cs_generate_batch`	output_mismatch	7/7	`(line 379) AssertionError: assert ['public int ...turn a * b '] == ['public int ...rn a b * c']`
plbart	multi	`test_java_cs_generate_one`	output_mismatch	7/7	`(line 370) AssertionError: 'public int maximum(int a, int b, int c){return Math.Max(' != 'public int maximum(int a, int b, int c){return Math.Max(a'`
plbart	single	`test_fill_mask`	output_mismatch	7/7	`(line 444) AssertionError: '0 0 the 0 the 0 the 0 the 0 the 0 the 0 the 0 the 0' != '0 0 the 0 the 0 the 0 the 0 the 0 the 0 the 0 the'`
plbart	single	`test_java_cs_generate_batch`	output_mismatch	7/7	`(line 379) AssertionError: assert ['public int ...turn a * b '] == ['public int ...rn a b * c']`
plbart	single	`test_java_cs_generate_one`	output_mismatch	7/7	`(line 370) AssertionError: 'public int maximum(int a, int b, int c){return Math.Max(' != 'public int maximum(int a, int b, int c){return Math.Max(a'`
pvt	multi	`test_inference_image_classification`	output_mismatch	7/7	`(line 257) AssertionError: Tensor-likes are not close!`
pvt	multi	`test_inference_model`	output_mismatch	7/7	`(line 284) AssertionError: Tensor-likes are not close!`
pvt	single	`test_inference_image_classification`	output_mismatch	7/7	`(line 257) AssertionError: Tensor-likes are not close!`
pvt	single	`test_inference_model`	output_mismatch	7/7	`(line 284) AssertionError: Tensor-likes are not close!`
pvt_v2	multi	`test_inference_image_classification`	output_mismatch	7/7	`(line 275) AssertionError: Tensor-likes are not close!`
pvt_v2	single	`test_inference_image_classification`	output_mismatch	7/7	`(line 275) AssertionError: Tensor-likes are not close!`
qwen2_5_omni	multi	`test_small_model_integration_test`	output_mismatch	7/7	`(line 692) AssertionError: "syst[108 chars]d is glass shattering, and the dog is a Labrador Retriever." != "syst[108 chars]d is a glass shattering. The dog in the pictur[22 chars]ver."`
qwen2_5_omni	multi	`test_small_model_integration_test_batch`	output_mismatch	7/7	`(line 734) AssertionError: Lists differ: ["sys[109 chars]d is glass shattering, and the dog is a Labrad[185 chars]er."] != ["sys[109 chars]d is a glass shattering. The dog in the pictur[211 chars]er."]`
qwen2_5_omni	single	`test_small_model_integration_test`	output_mismatch	7/7	`(line 692) AssertionError: "syst[108 chars]d is glass shattering, and the dog is a Labrador Retriever." != "syst[108 chars]d is a glass shattering. The dog in the pictur[22 chars]ver."`
qwen2_5_omni	single	`test_small_model_integration_test_batch`	output_mismatch	7/7	`(line 734) AssertionError: Lists differ: ["sys[109 chars]d is glass shattering, and the dog is a Labrad[185 chars]er."] != ["sys[109 chars]d is a glass shattering. The dog in the pictur[211 chars]er."]`
qwen2_5_vl	multi	`test_small_model_integration_test_batch_wo_image`	output_mismatch	7/7	`(line 611) AssertionError: Lists differ: ['sys[298 chars]en, a large language model created by Alibaba [84 chars]and'] != ['sys[298 chars]en, an AI language model created by Alibaba Cl[96 chars]on,']`
qwen2_5_vl	single	`test_small_model_integration_test_batch_wo_image`	output_mismatch	7/7	`(line 611) AssertionError: Lists differ: ['sys[298 chars]en, a large language model created by Alibaba [84 chars]and'] != ['sys[298 chars]en, an AI language model created by Alibaba Cl[96 chars]on,']`
qwen2_moe	multi	`test_model_a2_7b_logits`	output_mismatch	7/7	`(line 147) AssertionError: Tensor-likes are not close!`
qwen2_moe	single	`test_model_a2_7b_logits`	output_mismatch	7/7	`(line 147) AssertionError: Tensor-likes are not close!`
qwen3	multi	`test_model_600m_logits`	output_mismatch	7/7	`(line 92) AssertionError: Tensor-likes are not close!`
qwen3	multi	`test_speculative_generation`	output_mismatch	7/7	`(line 198) AssertionError: 'My f[22 chars]100% beef, 100% beef, 100% beef.' != 'My f[22 chars]100% vegetable oil. It has a rich, creamy text[19 chars]utty'`
qwen3	single	`test_model_600m_logits`	output_mismatch	7/7	`(line 92) AssertionError: Tensor-likes are not close!`
qwen3	single	`test_speculative_generation`	output_mismatch	7/7	`(line 198) AssertionError: 'My f[22 chars]100% beef, 100% beef, 100% beef.' != 'My f[22 chars]100% vegetable oil. It has a rich, creamy text[19 chars]utty'`
qwen3_5	multi	`test_model_video_generation`	output_mismatch	7/7	`(line 845) AssertionError: Lists differ: [248045, 846, 198, 27, 15, 13, 18, 6283, 29, 248053] != [248045, 846, 198, 248053, 27, 15, 13, 18, 6283, 29]`
qwen3_5	multi	`test_model_video_generation_batch`	output_mismatch	7/7	`(line 897) AssertionError: Lists differ: [248045, 846, 198, 27, 15, 13, 18, 6283, 29, 248053] != [248045, 846, 198, 248053, 27, 15, 13, 18, 6283, 29]`
qwen3_5	single	`test_model_video_generation`	output_mismatch	7/7	`(line 845) AssertionError: Lists differ: [248045, 846, 198, 27, 15, 13, 18, 6283, 29, 248053] != [248045, 846, 198, 248053, 27, 15, 13, 18, 6283, 29]`
qwen3_5	single	`test_model_video_generation_batch`	output_mismatch	7/7	`(line 897) AssertionError: Lists differ: [248045, 846, 198, 27, 15, 13, 18, 6283, 29, 248053] != [248045, 846, 198, 248053, 27, 15, 13, 18, 6283, 29]`
qwen3_omni_moe	single	`test_small_model_integration_test_batch`	output_mismatch	7/7	`(line 823) AssertionError: Lists differ: ["use[99 chars]ation, here is a breakdown of what you're hear[187 chars]n\n"] != ["use[99 chars]ation provided:\n\nThe sound you hear is the d[191 chars]hed"]`
qwen3_omni_moe	single	`test_small_model_integration_test_w_audio`	output_mismatch	7/7	`(line 911) AssertionError: 'syst[223 chars]derstand spoken content, and I can also make inferences about' != 'syst[223 chars]derstand spoken content, and I can also process and respond to'`
qwen3_vl_moe	single	`test_small_model_integration_test_batch`	output_mismatch	7/7	`(line 446) AssertionError: Lists differ: ["use[92 chars]'s a wild cat species native to the grasslands[182 chars]ons"] != ["use[92 chars]'s a small wild cat native to the grasslands a[178 chars]ons"]`
rag	single	`test_rag_sequence_generate_batch`	output_mismatch	7/7	`(line 948) AssertionError: Lists differ: [' michael gross', ' monday 17 , 2018', ' te[96 chars]ndo'] != [' albert einstein', ' june 22 , 2018', ' am[85 chars]' 8']`
rag	single	`test_rag_sequence_generate_batch_from_context_input_ids`	output_mismatch	7/7	`(line 1000) AssertionError: Lists differ: [' michael gross', ' monday 17 , 2018', ' te[96 chars]ndo'] != [' albert einstein', ' june 22 , 2018', ' am[85 chars]' 8']`
rag	single	`test_rag_sequence_generate_beam`	output_mismatch	7/7	`(line 892) AssertionError: '" in the United States. "People Need Love"[155 chars]hit.' != '"She\'s My Kind of Girl" was released thro[257 chars]nts.'`
rag	single	`test_rag_token_generate_beam`	output_mismatch	7/7	`(line 854) AssertionError: '"She[14 chars] Girl' != '"She[14 chars] Girl" was released through Epic Records in Ja[179 chars]ses"'`
recurrent_gemma	multi	`test_2b_generate`	output_mismatch	7/7	`(line 157) AssertionError: Lists differ: ['Hel[325 chars]oday the 19th of June 2019, I was in the offic[256 chars] to'] != ['Hel[325 chars]oday is a new app that allows you to make mone[256 chars]app']`
recurrent_gemma	multi	`test_2b_sample`	output_mismatch	7/7	`(line 195) AssertionError: Lists differ: ['Wha[24 chars]Deep Learning (or deep learning) is one of the[107 chars]ple'] != ['Wha[24 chars]Deep learning is the next frontier in computer[98 chars] is']`
recurrent_gemma	multi	`test_longer_than_window`	output_mismatch	7/7	`(line 243) AssertionError: Lists differ: [' Jean-Philippe Guillet said, "We have no[245 chars]eo.'] != [" Robin's comments follow claims by two m[249 chars]the"]`
recurrent_gemma	multi	`test_model_2b_8bit`	output_mismatch	7/7	`(line 222) AssertionError: Lists differ: ['Hel[26 chars] the effects of the environment on the human b[124 chars]aur"] != ['Hel[26 chars] the topic of "The impact of social media on t[102 chars] 3D"]`
recurrent_gemma	single	`test_2b_generate`	output_mismatch	7/7	`(line 157) AssertionError: Lists differ: ['Hel[325 chars]oday the 19th of June 2019, I was in the offic[256 chars] to'] != ['Hel[325 chars]oday is a new app that allows you to make mone[256 chars]app']`
recurrent_gemma	single	`test_2b_sample`	output_mismatch	7/7	`(line 195) AssertionError: Lists differ: ['Wha[24 chars]Deep Learning (or deep learning) is one of the[107 chars]ple'] != ['Wha[24 chars]Deep learning is the next frontier in computer[98 chars] is']`
recurrent_gemma	single	`test_longer_than_window`	output_mismatch	7/7	`(line 243) AssertionError: Lists differ: [' Jean-Philippe Guillet said, "We have no[245 chars]eo.'] != [" Robin's comments follow claims by two m[249 chars]the"]`
recurrent_gemma	single	`test_model_2b_8bit`	output_mismatch	7/7	`(line 222) AssertionError: Lists differ: ['Hel[26 chars] the effects of the environment on the human b[124 chars]aur"] != ['Hel[26 chars] the topic of "The impact of social media on t[102 chars] 3D"]`
reformer	multi	`test_pretrained_generate_crime_and_punish`	output_mismatch	7/7	`(line 1370) AssertionError: 'A fe[36 chars]is ideas, so attentively two or three thousand roubles, and' != 'A fe[36 chars]is ideas, at the first entrance. He was positively for an inst'`
reformer	single	`test_pretrained_generate_crime_and_punish`	output_mismatch	7/7	`(line 1370) AssertionError: 'A fe[36 chars]is ideas, so attentively two or three thousand roubles, and' != 'A fe[36 chars]is ideas, at the first entrance. He was positively for an inst'`
regnet	multi	`test_inference_image_classification_head`	output_mismatch	7/7	`(line 243) AssertionError: Tensor-likes are not close!`
regnet	single	`test_inference_image_classification_head`	output_mismatch	7/7	`(line 243) AssertionError: Tensor-likes are not close!`
resnet	multi	`test_inference_image_classification_head`	output_mismatch	7/7	`(line 291) AssertionError: Tensor-likes are not close!`
resnet	single	`test_inference_image_classification_head`	output_mismatch	7/7	`(line 291) AssertionError: Tensor-likes are not close!`
seed_oss	multi	`test_model_36b_eager`	output_mismatch	7/7	`(line 95) AssertionError: Lists differ: ['How[132 chars]ing to use the ByteDance-Seed dataset for my research. I have'] != ['How[132 chars]ing to run the code on the <beginning of the code>seed']`
seed_oss	multi	`test_model_36b_sdpa`	output_mismatch	7/7	`(line 114) AssertionError: Lists differ: ['How[132 chars]ing to use the ByteDance-Seed dataset for my research. I have'] != ['How[132 chars]ing to run the code on the <beginning of the code>seed']`
seed_oss	single	`test_model_36b_eager`	output_mismatch	7/7	`(line 95) AssertionError: Lists differ: ['How[132 chars]ing to use the ByteDance-Seed dataset for my research. I have'] != ['How[132 chars]ing to run the code on the <beginning of the code>seed']`
seed_oss	single	`test_model_36b_sdpa`	output_mismatch	7/7	`(line 114) AssertionError: Lists differ: ['How[132 chars]ing to use the ByteDance-Seed dataset for my research. I have'] != ['How[132 chars]ing to run the code on the <beginning of the code>seed']`
smollm3	multi	`test_export_static_cache`	output_mismatch	7/7	`(line 198) AssertionError: 'Gravity is the force that pulls objects [69 chars] and' != ["Gravity is the force that pulls objects[85 chars] of"]`
smollm3	multi	`test_model_3b_logits`	output_mismatch	7/7	`(line 89) AssertionError: Tensor-likes are not close!`
smollm3	single	`test_export_static_cache`	output_mismatch	7/7	`(line 198) AssertionError: 'Gravity is the force that pulls objects [69 chars] and' != ["Gravity is the force that pulls objects[85 chars] of"]`
smollm3	single	`test_model_3b_logits`	output_mismatch	7/7	`(line 89) AssertionError: Tensor-likes are not close!`
stablelm	multi	`test_model_stablelm_3b_4e1t_logits`	output_mismatch	7/7	`(line 65) AssertionError: Tensor-likes are not close!`
stablelm	multi	`test_model_tiny_random_stablelm_2_logits`	output_mismatch	7/7	`(line 98) AssertionError: Tensor-likes are not close!`
stablelm	single	`test_model_stablelm_3b_4e1t_logits`	output_mismatch	7/7	`(line 65) AssertionError: Tensor-likes are not close!`
stablelm	single	`test_model_tiny_random_stablelm_2_logits`	output_mismatch	7/7	`(line 98) AssertionError: Tensor-likes are not close!`
starcoder2	multi	`test_starcoder2_batched_generation_4bit`	output_mismatch	7/7	`(line 152) AssertionError: Lists differ: ['Hel[188 chars]of', 'def hello_world():\n\treturn "Hello Worl[95 chars]ute'] != ['Hel[188 chars]of', "def hello_world(): hello_world():\n r[117 chars]'})"]`
starcoder2	multi	`test_starcoder2_batched_generation_eager`	output_mismatch	7/7	`(line 99) AssertionError: Lists differ: ['Hel[223 chars]ld():\n\treturn 'Hello World!'\n\n@app.route('[72 chars]app"] != ['Hel[223 chars]ld(): hello_world():\n return 'Hello World![87 chars]n\n"]`
starcoder2	multi	`test_starcoder2_batched_generation_sdpa`	output_mismatch	7/7	`(line 79) AssertionError: Lists differ: ['Hel[223 chars]ld():\n\treturn 'Hello World!'\n\n@app.route('[72 chars]app"] != ['Hel[223 chars]ld(): hello_world():\n return 'Hello World![87 chars]n\n"]`
starcoder2	single	`test_starcoder2_batched_generation_4bit`	output_mismatch	7/7	`(line 152) AssertionError: Lists differ: ['Hel[188 chars]of', 'def hello_world():\n\treturn "Hello Worl[95 chars]ute'] != ['Hel[188 chars]of', "def hello_world(): hello_world():\n r[117 chars]'})"]`
starcoder2	single	`test_starcoder2_batched_generation_eager`	output_mismatch	7/7	`(line 99) AssertionError: Lists differ: ['Hel[223 chars]ld():\n\treturn 'Hello World!'\n\n@app.route('[72 chars]app"] != ['Hel[223 chars]ld(): hello_world():\n return 'Hello World![87 chars]n\n"]`
starcoder2	single	`test_starcoder2_batched_generation_sdpa`	output_mismatch	7/7	`(line 79) AssertionError: Lists differ: ['Hel[223 chars]ld():\n\treturn 'Hello World!'\n\n@app.route('[72 chars]app"] != ['Hel[223 chars]ld(): hello_world():\n return 'Hello World![87 chars]n\n"]`
swiftformer	multi	`test_inference_image_classification_head`	output_mismatch	7/7	`(line 263) AssertionError: Tensor-likes are not close!`
swiftformer	single	`test_inference_image_classification_head`	output_mismatch	7/7	`(line 263) AssertionError: Tensor-likes are not close!`
swin2sr	multi	`test_inference_fp16`	output_mismatch	7/7	`(line 332) AssertionError: Tensor-likes are not close!`
swin2sr	single	`test_inference_fp16`	output_mismatch	7/7	`(line 332) AssertionError: Tensor-likes are not close!`
swinv2	multi	`test_inference_fp16`	output_mismatch	7/7	`(line 492) AssertionError: Tensor-likes are not close!`
swinv2	single	`test_inference_fp16`	output_mismatch	7/7	`(line 492) AssertionError: Tensor-likes are not close!`
t5gemma2	multi	`test_model_generation_batch_270m`	output_mismatch	7/7	`(line 1128) AssertionError: Lists differ: [' a [83 chars]e UK.\n\nThe bumblebee is a species of bee tha[15 chars]the'] != [' a [83 chars]e UK.']`
t5gemma2	single	`test_model_generation_batch_270m`	output_mismatch	7/7	`(line 1128) AssertionError: Lists differ: [' a [83 chars]e UK.\n\nThe bumblebee is a species of bee tha[15 chars]the'] != [' a [83 chars]e UK.']`
table_transformer	multi	`test_table_detection`	output_mismatch	7/7	`(line 554) AssertionError: Tensor-likes are not close!`
table_transformer	single	`test_table_detection`	output_mismatch	7/7	`(line 554) AssertionError: Tensor-likes are not close!`
univnet	multi	`test_integration`	output_mismatch	7/7	`(line 330) AssertionError: Scalars are not close!`
univnet	single	`test_integration`	output_mismatch	7/7	`(line 330) AssertionError: Scalars are not close!`
utils	multi	`test_cache_copy`	output_mismatch	7/7	`(line 436) AssertionError: Lists differ: ['You are a helpful assistant. Help me to [390 chars] is'] != ["You are a helpful assistant. Help me to [385 chars] is']`
utils	multi	`test_dynamic_cache_hard`	output_mismatch	7/7	`(line 319) AssertionError: "Here[57 chars]ave fur, they have four legs, they have a tail[1045 chars]have" != "Here[57 chars]ave four legs, they have a tail, they have a f[1078 chars]They"`
utils	single	`test_cache_copy`	output_mismatch	7/7	`(line 436) AssertionError: Lists differ: ['You are a helpful assistant. Help me to [390 chars] is'] != ["You are a helpful assistant. Help me to [385 chars] is']`
utils	single	`test_dynamic_cache_hard`	output_mismatch	7/7	`(line 319) AssertionError: "Here[57 chars]ave fur, they have four legs, they have a tail[1045 chars]have" != "Here[57 chars]ave four legs, they have a tail, they have a f[1078 chars]They"`
video_llava	multi	`test_small_model_integration_test_llama`	output_mismatch	7/7	`(line 491) AssertionError: 'USER: \nDescribe the video in details. A[572 chars]ion.' != "USER: \nDescribe the video in details. A[675 chars]ing."`
video_llava	multi	`test_small_model_integration_test_mixed_inputs`	output_mismatch	7/7	`(line 464) AssertionError: Lists differ: ['USE[183 chars]se it shows a baby sitting on a bed and reading a book. The'] != ['USE[183 chars]se it shows a baby sitting on a bed and reading a book, which']`
video_llava	single	`test_small_model_integration_test_llama`	output_mismatch	7/7	`(line 491) AssertionError: 'USER: \nDescribe the video in details. A[572 chars]ion.' != "USER: \nDescribe the video in details. A[675 chars]ing."`
video_llava	single	`test_small_model_integration_test_mixed_inputs`	output_mismatch	7/7	`(line 464) AssertionError: Lists differ: ['USE[183 chars]se it shows a baby sitting on a bed and reading a book. The'] != ['USE[183 chars]se it shows a baby sitting on a bed and reading a book, which']`
videomae	multi	`test_inference_for_pretraining`	output_mismatch	7/7	`(line 478) AssertionError: Tensor-likes are not close!`
videomae	multi	`test_inference_for_video_classification`	output_mismatch	7/7	`(line 453) AssertionError: Tensor-likes are not close!`
videomae	single	`test_inference_for_pretraining`	output_mismatch	7/7	`(line 478) AssertionError: Tensor-likes are not close!`
videomae	single	`test_inference_for_video_classification`	output_mismatch	7/7	`(line 453) AssertionError: Tensor-likes are not close!`
vilt	multi	`test_inference_masked_lm`	output_mismatch	7/7	`(line 575) AssertionError: Tensor-likes are not close!`
vilt	single	`test_inference_masked_lm`	output_mismatch	7/7	`(line 575) AssertionError: Tensor-likes are not close!`
vision_encoder_decoder	multi	`test_inference_cordv2`	output_mismatch	7/7	`(line 1352) AssertionError: Tensor-likes are not close!`
vision_encoder_decoder	multi	`test_inference_docvqa`	output_mismatch	7/7	`(line 1288) AssertionError: Tensor-likes are not close!`
vision_encoder_decoder	multi	`test_inference_rvlcdip`	output_mismatch	7/7	`(line 1414) AssertionError: Tensor-likes are not close!`
vision_encoder_decoder	single	`test_inference_cordv2`	output_mismatch	7/7	`(line 1352) AssertionError: Tensor-likes are not close!`
vision_encoder_decoder	single	`test_inference_docvqa`	output_mismatch	7/7	`(line 1288) AssertionError: Tensor-likes are not close!`
vision_encoder_decoder	single	`test_inference_rvlcdip`	output_mismatch	7/7	`(line 1414) AssertionError: Tensor-likes are not close!`
vits	multi	`test_forward_fp16`	output_mismatch	7/7	`(line 433) AssertionError: Tensor-likes are not close!`
vits	single	`test_forward_fp16`	output_mismatch	7/7	`(line 433) AssertionError: Tensor-likes are not close!`
vivit	multi	`test_inference_for_video_classification`	output_mismatch	7/7	`(line 361) AssertionError: Tensor-likes are not close!`
vivit	single	`test_inference_for_video_classification`	output_mismatch	7/7	`(line 361) AssertionError: Tensor-likes are not close!`
voxtral	multi	`test_mini_multi_turn_text_and_audio`	output_mismatch	7/7	`(line 381) AssertionError: Lists differ: ['Des[790 chars]as a farewell address by a president, reflecti[151 chars]xt.'] != ['Des[790 chars]as a political speech by a president, reflecti[151 chars]xt.']`
voxtral	multi	`test_mini_single_turn_audio_only`	output_mismatch	7/7	`(line 163) AssertionError: Lists differ: ['The[442 chars]king what A\'s tattoo says, and A always respo[777 chars]nt.'] != ['The[442 chars]king A what his tattoo says, and A always resp[884 chars]on.']`
voxtral	multi	`test_mini_single_turn_text_and_audio`	output_mismatch	7/7	`(line 203) AssertionError: Lists differ: ["Wha[241 chars]. He expresses gratitude for the conversations[429 chars]en."] != ["Wha[241 chars]. He acknowledges the diverse perspectives and[412 chars]es."]`
voxtral	multi	`test_mini_single_turn_text_and_multiple_audios_batched`	output_mismatch	7/7	`(line 327) AssertionError: Lists differ: ["Who[609 chars]m is likely the Seattle Mariners, as the comme[446 chars]me.'] != ["Who[609 chars]m is the Mariners, and the commentator is exci[414 chars]nt.']`
voxtral	single	`test_mini_multi_turn_text_and_audio`	output_mismatch	7/7	`(line 381) AssertionError: Lists differ: ['Des[790 chars]as a farewell address by a president, reflecti[151 chars]xt.'] != ['Des[790 chars]as a political speech by a president, reflecti[151 chars]xt.']`
voxtral	single	`test_mini_single_turn_audio_only`	output_mismatch	7/7	`(line 163) AssertionError: Lists differ: ['The[442 chars]king what A\'s tattoo says, and A always respo[777 chars]nt.'] != ['The[442 chars]king A what his tattoo says, and A always resp[884 chars]on.']`
voxtral	single	`test_mini_single_turn_text_and_audio`	output_mismatch	7/7	`(line 203) AssertionError: Lists differ: ["Wha[241 chars]. He expresses gratitude for the conversations[429 chars]en."] != ["Wha[241 chars]. He acknowledges the diverse perspectives and[412 chars]es."]`
voxtral	single	`test_mini_single_turn_text_and_multiple_audios_batched`	output_mismatch	7/7	`(line 327) AssertionError: Lists differ: ["Who[609 chars]m is likely the Seattle Mariners, as the comme[446 chars]me.'] != ["Who[609 chars]m is the Mariners, and the commentator is exci[414 chars]nt.']`
voxtral_realtime	multi	`test_batched_longform`	output_mismatch	7/7	`(line 349) AssertionError: Lists differ: [' Come on! Dude. You got a tattoo. So did you, dud[1097 chars]the"] != [' Come on. Dude. You got a tattoo. So did you, dud[1097 chars]the"]`
voxtral_realtime	single	`test_batched_longform`	output_mismatch	7/7	`(line 349) AssertionError: Lists differ: [' Come on! Dude. You got a tattoo. So did you, dud[1097 chars]the"] != [' Come on. Dude. You got a tattoo. So did you, dud[1097 chars]the"]`
whisper	multi	`test_small_token_timestamp_generation`	output_mismatch	7/7	`(line 2023) AssertionError: Tensor-likes are not close!`
whisper	multi	`test_speculative_decoding_non_distil`	output_mismatch	7/7	`(line 2390) AssertionError: Lists differ: [' Mr[35 chars]dle classes and we are glad to welcome his gospel. Thank you.'] != [' Mr[35 chars]dle classes and we are glad to welcome his gospel.']`
whisper	multi	`test_tiny_en_batched_generation`	output_mismatch	7/7	`(line 1541) AssertionError: The values for attribute 'shape' do not match: torch.Size([4, 18]) != torch.Size([4, 20]).`
whisper	multi	`test_tiny_en_generation`	output_mismatch	7/7	`(line 1383) AssertionError: ' Mr.[15 chars] apostle of the middle classes, and we are glad to' != ' Mr.[15 chars] apostle of the middle classes, and we are glad to welcome his'`
whisper	multi	`test_tiny_generation`	output_mismatch	7/7	`(line 1399) AssertionError: ' Mr.[21 chars]le of the middle classes and we are glad' != ' Mr.[21 chars]le of the middle classes and we are glad to welcome his gospel'`
whisper	multi	`test_tiny_specaugment_librispeech`	output_mismatch	7/7	`(line 2137) AssertionError: Tensor-likes are not close!`
whisper	multi	`test_whisper_longform_multi_batch_hard`	output_mismatch	7/7	`(line 2787) AssertionError: Lists differ: [" Fo[272 chars]ting of classics, Sicilian, nade door variatio[8147 chars]le!'] != [" Fo[272 chars]ting a classic Sicilian, nade door variation o[8150 chars]le!']`
whisper	multi	`test_whisper_longform_multi_batch_hard_prev_cond`	output_mismatch	7/7	`(line 2841) AssertionError: Lists differ: [" Fo[425 chars]a fischer shows in lip nitskey attack the fisc[5579 chars]y ."] != [" Fo[425 chars]a fisher shows in lip-nitsky attack that culmi[7900 chars]le!"]`
whisper	multi	`test_whisper_longform_no_speech_detection`	output_mismatch	7/7	`(line 2947) AssertionError: Lists differ: [" Fo[435 chars]sting And so so so so so so so so so so so so [7329 chars]our"] != [" Fo[435 chars]sting", ' Ladies and gentlemen, you know, I sp[1433 chars]es."]`
whisper	multi	`test_whisper_shortform_single_batch_prev_cond`	output_mismatch	7/7	`(line 2556) AssertionError: Lists differ: [" Fo[268 chars]ating, so soft, it would make JD power and her[196 chars]ke."] != [" Fo[268 chars]ating so soft, it would make JD power and her [195 chars]ke."]`
whisper	single	`test_small_token_timestamp_generation`	output_mismatch	7/7	`(line 2023) AssertionError: Tensor-likes are not close!`
whisper	single	`test_speculative_decoding_non_distil`	output_mismatch	7/7	`(line 2390) AssertionError: Lists differ: [' Mr[35 chars]dle classes and we are glad to welcome his gospel. Thank you.'] != [' Mr[35 chars]dle classes and we are glad to welcome his gospel.']`
whisper	single	`test_tiny_en_batched_generation`	output_mismatch	7/7	`(line 1541) AssertionError: The values for attribute 'shape' do not match: torch.Size([4, 18]) != torch.Size([4, 20]).`
whisper	single	`test_tiny_en_generation`	output_mismatch	7/7	`(line 1383) AssertionError: ' Mr.[15 chars] apostle of the middle classes, and we are glad to' != ' Mr.[15 chars] apostle of the middle classes, and we are glad to welcome his'`
whisper	single	`test_tiny_generation`	output_mismatch	7/7	`(line 1399) AssertionError: ' Mr.[21 chars]le of the middle classes and we are glad' != ' Mr.[21 chars]le of the middle classes and we are glad to welcome his gospel'`
whisper	single	`test_tiny_specaugment_librispeech`	output_mismatch	7/7	`(line 2137) AssertionError: Tensor-likes are not close!`
whisper	single	`test_whisper_longform_multi_batch_hard`	output_mismatch	7/7	`(line 2787) AssertionError: Lists differ: [" Fo[272 chars]ting of classics, Sicilian, nade door variatio[8147 chars]le!'] != [" Fo[272 chars]ting a classic Sicilian, nade door variation o[8150 chars]le!']`
whisper	single	`test_whisper_longform_multi_batch_hard_prev_cond`	output_mismatch	7/7	`(line 2841) AssertionError: Lists differ: [" Fo[425 chars]a fischer shows in lip nitskey attack the fisc[5579 chars]y ."] != [" Fo[425 chars]a fisher shows in lip-nitsky attack that culmi[7900 chars]le!"]`
whisper	single	`test_whisper_longform_no_speech_detection`	output_mismatch	7/7	`(line 2947) AssertionError: Lists differ: [" Fo[435 chars]sting And so so so so so so so so so so so so [7329 chars]our"] != [" Fo[435 chars]sting", ' Ladies and gentlemen, you know, I sp[1433 chars]es."]`
whisper	single	`test_whisper_shortform_single_batch_prev_cond`	output_mismatch	7/7	`(line 2556) AssertionError: Lists differ: [" Fo[268 chars]ating, so soft, it would make JD power and her[196 chars]ke."] != [" Fo[268 chars]ating so soft, it would make JD power and her [195 chars]ke."]`
zamba	multi	`test_simple_batched_generate_with_padding`	output_mismatch	7/7	`(line 476) AssertionError: '<s> [20 chars]g on this lovely evening? I hope you are having a great day. I' != '<s> [20 chars]g on this lovely evening? I hope you are all doing well. I am'`
zamba	multi	`test_simple_generate`	output_mismatch	7/7	`(line 463) AssertionError: The values for attribute 'dtype' do not match: torch.bfloat16 != torch.float32.`
zamba	single	`test_simple_batched_generate_with_padding`	output_mismatch	7/7	`(line 476) AssertionError: '<s> [20 chars]g on this lovely evening? I hope you are having a great day. I' != '<s> [20 chars]g on this lovely evening? I hope you are all doing well. I am'`
zamba	single	`test_simple_generate`	output_mismatch	7/7	`(line 463) AssertionError: The values for attribute 'dtype' do not match: torch.bfloat16 != torch.float32.`
zamba2	multi	`test_simple_batched_generate_with_padding_0_cuda`	output_mismatch	7/7	`(line 600) AssertionError: Tensor-likes are not close!`
zamba2	single	`test_simple_batched_generate_with_padding_0_cuda`	output_mismatch	7/7	`(line 600) AssertionError: Tensor-likes are not close!`

Unpinned failure modes

mode	count
output_mismatch	487
other	153
OOM	57
import_or_config	22
load_error	10
cuda_runtime	2
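
These counts cover only the unpinned failures, which is why load_error reads 10 here against 12 in the overall failure-mode mix: the two CI-flagged flaky deepseek_v4 tests are both load_error. A minimal Python sketch of that reconciliation, with counts copied from this report; the variable names are illustrative and not part of the triage tooling:

```python
# Counts copied from the "Unpinned failure modes" table in this report.
unpinned_by_mode = {
    "output_mismatch": 487,
    "other": 153,
    "OOM": 57,
    "import_or_config": 22,
    "load_error": 10,
    "cuda_runtime": 2,
}

# The two CI-flagged flaky tests are both load_error and are not counted as unpinned.
flaky_by_mode = {"load_error": 2}

unpinned_total = sum(unpinned_by_mode.values())
# No failures were pinned by CI bisect, so persistent = unpinned + flaky.
persistent_total = unpinned_total + sum(flaky_by_mode.values())

print(unpinned_total)    # 731
print(persistent_total)  # 733
```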

Per-model breakdown (all failures)

model	failures	gpu	mode mix
whisper	42	multi/single	other 22 output_mismatch 20
musicgen_melody	16	multi/single	output_mismatch 16
generation	15	multi/single	output_mismatch 8 other 3 import_or_config 2 cuda_runtime 2
gemma	14	multi/single	output_mismatch 14
dac	12	multi/single	output_mismatch 12
edgetam	12	multi/single	import_or_config 12
glm46v	12	multi/single	other 12
glm_ocr	12	multi/single	output_mismatch 10 other 2
gemma3	10	multi/single	output_mismatch 10
gemma3n	10	multi/single	output_mismatch 10
vision_encoder_decoder	10	multi/single	output_mismatch 6 other 4
cohere2_vision	8	multi/single	other 4 output_mismatch 2 OOM 2
emu3	8	multi/single	OOM 5 import_or_config 2 output_mismatch 1
falcon_mamba	8	multi/single	output_mismatch 8
gemma4	8	multi/single	output_mismatch 6 OOM 2
grounding_dino	8	multi/single	output_mismatch 8
higgs_audio_v2	8	multi/single	output_mismatch 8
llava	8	multi/single	output_mismatch 4 OOM 4
mamba2	8	multi/single	OOM 8
mllama	8	multi/single	output_mismatch 8
moshi	8	multi/single	output_mismatch 7 OOM 1
mpt	8	multi/single	other 8
qwen3_vl_moe	8	multi/single	other 6 output_mismatch 1 OOM 1
recurrent_gemma	8	multi/single	output_mismatch 8
voxtral	8	multi/single	output_mismatch 8
peft_integration	8	multi/single	other 8
olmo	7	multi/single	output_mismatch 4 OOM 2 other 1
bloom	6	multi/single	output_mismatch 6
bridgetower	6	multi/single	other 6
chameleon	6	multi/single	output_mismatch 6
deepseek_v2	6	multi/single	OOM 6
exaone4	6	multi/single	OOM 4 output_mismatch 2
fsmt	6	multi/single	output_mismatch 6
glm	6	multi/single	OOM 4 output_mismatch 2
kosmos2	6	multi/single	output_mismatch 4 import_or_config 2
layoutlmv2	6	multi/single	output_mismatch 6
mistral3	6	multi/single	output_mismatch 6
mluke	6	multi/single	output_mismatch 6
mm_grounding_dino	6	multi/single	output_mismatch 6
owlvit	6	multi/single	output_mismatch 6
phimoe	6	multi/single	OOM 3 other 3
plbart	6	multi/single	output_mismatch 6
qwen3_omni_moe	6	multi/single	other 4 output_mismatch 2
seamless_m4t	6	multi/single	other 6
seamless_m4t_v2	6	multi/single	other 6
starcoder2	6	multi/single	output_mismatch 6
blip_2	5	multi/single	output_mismatch 5
deepseek_vl_hybrid	5	multi/single	other 2 OOM 2 output_mismatch 1
deepseek_v4	4	multi/single	load_error 4
audioflamingo3	4	multi/single	other 4
bamba	4	multi/single	OOM 4
bitnet	4	multi/single	other 4
clvp	4	multi/single	output_mismatch 2 other 2
colqwen2	4	multi/single	other 2 output_mismatch 2
cwm	4	multi/single	import_or_config 2 OOM 1 output_mismatch 1
deepseek_vl	4	multi/single	other 3 output_mismatch 1
flava	4	multi/single	output_mismatch 4
gemma2	4	multi/single	output_mismatch 2 other 2
internvl	4	multi/single	output_mismatch 4
jais2	4	multi/single	load_error 4
janus	4	multi/single	other 4
lfm2_vl	4	multi/single	output_mismatch 4
llava_next_video	4	multi/single	output_mismatch 4
longt5	4	multi/single	output_mismatch 4
luke	4	multi/single	output_mismatch 4
lw_detr	4	multi/single	output_mismatch 4
mimi	4	multi/single	output_mismatch 4
ministral	4	multi/single	output_mismatch 4
ministral3	4	multi/single	output_mismatch 4
mistral	4	multi/single	output_mismatch 4
mistral4	4	multi/single	other 4
mixtral	4	multi/single	output_mismatch 4
moonshine_streaming	4	multi/single	output_mismatch 4
musicgen	4	multi/single	output_mismatch 4
nemotron	4	multi/single	output_mismatch 2 import_or_config 2
oneformer	4	multi/single	output_mismatch 4
persimmon	4	multi/single	output_mismatch 3 other 1
pvt	4	multi/single	output_mismatch 4
pvt_v2	4	multi/single	output_mismatch 2 other 2
qwen2_5_omni	4	multi/single	output_mismatch 4
qwen2_moe	4	multi/single	output_mismatch 2 other 2
qwen3	4	multi/single	output_mismatch 4
qwen3_5	4	multi/single	output_mismatch 4
qwen3_moe	4	multi	load_error 4
rag	4	single	output_mismatch 4
seed_oss	4	multi/single	output_mismatch 4
smollm3	4	multi/single	output_mismatch 4
stablelm	4	multi/single	output_mismatch 4
video_llava	4	multi/single	output_mismatch 4
videomae	4	multi/single	output_mismatch 4
zamba	4	multi/single	output_mismatch 4
utils	4	multi/single	output_mismatch 4
musicflamingo	4	multi/single	other 4
flex_olmo	3	multi/single	output_mismatch 2 other 1
pegasus	3	multi/single	other 3
aya_vision	2	multi/single	output_mismatch 2
big_bird	2	multi/single	output_mismatch 2
convnextv2	2	multi/single	output_mismatch 2
cvt	2	multi/single	output_mismatch 2
dab_detr	2	multi/single	output_mismatch 2
dbrx	2	multi/single	other 2
deepseek_v3	2	multi/single	output_mismatch 1 other 1
depth_anything	2	multi/single	output_mismatch 2
dia	2	multi/single	output_mismatch 2
diffllama	2	multi/single	output_mismatch 2
efficientnet	2	multi/single	output_mismatch 2
eomt_dinov3	2	multi/single	output_mismatch 2
evolla	2	multi/single	output_mismatch 2
exaone4_5	2	multi	OOM 2
exaone_moe	2	multi/single	output_mismatch 2
falcon_h1	2	multi/single	output_mismatch 2
fastspeech2_conformer	2	multi/single	output_mismatch 2
florence2	2	multi/single	output_mismatch 2
fuyu	2	multi/single	output_mismatch 2
git	2	multi/single	other 2
glm4_moe	2	multi/single	OOM 2
glm4_moe_lite	2	multi/single	OOM 2
glm4v_moe	2	multi	other 2
glm_image	2	multi/single	output_mismatch 2
got_ocr2	2	multi/single	output_mismatch 2
granite	2	multi/single	output_mismatch 2
helium	2	multi/single	output_mismatch 2
hiera	2	multi/single	output_mismatch 2
hyperclovax	2	multi/single	other 2
instructblip	2	multi/single	output_mismatch 2
instructblipvideo	2	multi/single	output_mismatch 2
jamba	2	multi/single	output_mismatch 2
kosmos2_5	2	multi/single	output_mismatch 2
lfm2_moe	2	multi/single	output_mismatch 2
llama	2	multi/single	output_mismatch 2
llava_next	2	multi/single	output_mismatch 2
m2m_100	2	multi/single	output_mismatch 2
minimax	2	multi/single	output_mismatch 2
modernvbert	2	multi/single	other 2
nllb_moe	2	multi/single	output_mismatch 2
olmo2	2	multi/single	output_mismatch 2
olmo3	2	multi/single	output_mismatch 2
olmoe	2	multi/single	output_mismatch 2
opt	2	multi/single	output_mismatch 2
ovis2	2	multi/single	output_mismatch 2
phi3	2	multi/single	other 2
pi0	2	multi/single	OOM 2
pixio	2	multi/single	output_mismatch 2
qwen2_5_vl	2	multi/single	output_mismatch 2
reformer	2	multi/single	output_mismatch 2
regnet	2	multi/single	output_mismatch 2
resnet	2	multi/single	output_mismatch 2
superpoint	2	multi/single	other 2
swiftformer	2	multi/single	output_mismatch 2
swin2sr	2	multi/single	output_mismatch 2
swinv2	2	multi/single	output_mismatch 2
t5gemma2	2	multi/single	output_mismatch 2
table_transformer	2	multi/single	output_mismatch 2
univnet	2	multi/single	output_mismatch 2
vilt	2	multi/single	output_mismatch 2
vits	2	multi/single	output_mismatch 2
vivit	2	multi/single	output_mismatch 2
voxtral_realtime	2	multi/single	output_mismatch 2
zamba2	2	multi/single	output_mismatch 2
blt	2	multi	other 2
hunyuan_v1_moe	1	multi	other 1
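
The mode mix column packs alternating mode names and counts into one cell. A rough Python sketch for unpacking it and checking it against the failures column, assuming tab-separated columns as in this export; `parse_mode_mix` is a hypothetical helper, not a function of the triage tool:

```python
def parse_mode_mix(mix: str) -> dict[str, int]:
    """Turn 'output_mismatch 8 other 3' into {'output_mismatch': 8, 'other': 3}."""
    tokens = mix.split()
    # Tokens alternate between a mode name and its count.
    return {mode: int(count) for mode, count in zip(tokens[0::2], tokens[1::2])}


# Row copied from the table above: model, failures, gpu, mode mix.
row = "generation\t15\tmulti/single\toutput_mismatch 8 other 3 import_or_config 2 cuda_runtime 2"
model, failures, gpu, mix = row.split("\t")
modes = parse_mode_mix(mix)

# The per-mode counts should add up to the failures column.
assert sum(modes.values()) == int(failures)
```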

Pinned clusters (CI bisect)

(none)

Flaky (CI flagged)

model	gpu	test	mode	days
deepseek_v4	single	`test_v4_flash_dequantized_chat_seven_prompts`	load_error	6/7
deepseek_v4	single	`test_v4_flash_dequantized_generation`	load_error	6/7

Unpinned — samples per mode

These failures persisted across the window (failing on at least 5 of the 7 daily runs), but CI bisect could not attribute them to a bad commit. They most likely regressed before the 7-day bisect window. The tables below show the most recently seen samples for each failure mode.

output_mismatch 487 unpinned failures — sample of 5

model	gpu	test	days	trace excerpt
zamba2	single	`test_simple_batched_generate_with_padding_0_cuda`	7/7	`(line 600) AssertionError: Tensor-likes are not close!`
zamba2	multi	`test_simple_batched_generate_with_padding_0_cuda`	7/7	`(line 600) AssertionError: Tensor-likes are not close!`
zamba	single	`test_simple_batched_generate_with_padding`	7/7	`(line 476) AssertionError: '<s> [20 chars]g on this lovely evening? I hope you are having a great day. I' != '<s> [20 chars]g on this lovely evening? I hope you are all doing well. I am'`
zamba	single	`test_simple_generate`	7/7	`(line 463) AssertionError: The values for attribute 'dtype' do not match: torch.bfloat16 != torch.float32.`
zamba	multi	`test_simple_batched_generate_with_padding`	7/7	`(line 476) AssertionError: '<s> [20 chars]g on this lovely evening? I hope you are having a great day. I' != '<s> [20 chars]g on this lovely evening? I hope you are all doing well. I am'`

OOM 57 unpinned failures — sample of 5

model	gpu	test	days	trace excerpt
qwen3_vl_moe	multi	`test_small_model_integration_test_expand`	7/7	`(line 991) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 768.00 MiB. GPU 1 has a total capacity of 22.30 GiB of which 240.69 MiB is free. Process 643790 has 22.06 GiB memory in use. Of the allocated mem…`
pi0	single	`test_train_pi0_base_libero`	7/7	`(line 193) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 18.00 MiB. GPU 0 has a total capacity of 22.30 GiB of which 6.69 MiB is free. Process 777692 has 22.29 GiB memory in use. Of the allocated memory…`
pi0	multi	`test_train_pi0_base_libero`	7/7	`(line 785) torch.OutOfMemoryError: Caught OutOfMemoryError in replica 0 on device 0.`
phimoe	single	`test_model_phimoe_instruct_logits`	6/7	`(line 353) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.56 GiB. GPU 0 has a total capacity of 22.30 GiB of which 814.69 MiB is free. Process 329492 has 21.50 GiB memory in use. Of the allocated memor…`
phimoe	single	`test_phimoe_instruct_generation`	5/7	`(line 353) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.56 GiB. GPU 0 has a total capacity of 22.30 GiB of which 812.69 MiB is free. Process 329492 has 21.50 GiB memory in use. Of the allocated memor…`
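
The days column gives failing runs over total runs in the window; the report treats a test as persistent at 5/7 or more. A small Python sketch of that filter, assuming every cell has the form failing/total; the function names and the `min_failing` parameter are illustrative only:

```python
def failing_days(days: str) -> tuple[int, int]:
    """Split a days cell such as '6/7' into (failing_runs, total_runs)."""
    failed, total = days.split("/")
    return int(failed), int(total)


def is_persistent(days: str, min_failing: int = 5) -> bool:
    """True when the test failed on at least min_failing runs in the window."""
    failed, _total = failing_days(days)
    return failed >= min_failing


# The first three cells appear in the tables of this report; 4/7 is a made-up counterexample.
for cell in ("7/7", "6/7", "5/7", "4/7"):
    print(cell, is_persistent(cell))
```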

load_error 10 unpinned failures — sample of 5

model	gpu	test	days	trace excerpt
qwen3_moe	multi	`test_model_15b_a2b_generation`	7/7	`(line 74) ValueError: Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the quantized model. If you want to dispatch the model on the CPU or the disk while keeping these modul…`
qwen3_moe	multi	`test_model_15b_a2b_logits`	7/7	`(line 74) ValueError: Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the quantized model. If you want to dispatch the model on the CPU or the disk while keeping these modul…`
qwen3_moe	multi	`test_model_15b_a2b_long_prompt_sdpa`	7/7	`(line 74) ValueError: Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the quantized model. If you want to dispatch the model on the CPU or the disk while keeping these modul…`
qwen3_moe	multi	`test_speculative_generation`	7/7	`(line 74) ValueError: Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the quantized model. If you want to dispatch the model on the CPU or the disk while keeping these modul…`
jais2	multi	`test_model_generation`	7/7	`(line 503) OSError: You are trying to access a gated repo.`

cuda_runtime 2 unpinned failures — sample of 2

model	gpu	test	days	trace excerpt
generation	single	`test_validate_assistant`	7/7	`(line 1909) torch.AcceleratorError: CUDA error: device-side assert triggered`
generation	multi	`test_validate_assistant`	7/7	`(line 1909) torch.AcceleratorError: CUDA error: device-side assert triggered`

import_or_config 22 unpinned failures — sample of 5

model	gpu	test	days	trace excerpt
nemotron	single	`test_nemotron_8b_generation_fa2`	7/7	`(line 1725) ImportError: FlashAttention2 has been toggled on, but it cannot be used due to the following error: the package for FlashAttention2 doesn't seem to be installed.`
nemotron	multi	`test_nemotron_8b_generation_fa2`	6/7	`(line 1725) ImportError: FlashAttention2 has been toggled on, but it cannot be used due to the following error: the package for FlashAttention2 doesn't seem to be installed.`
kosmos2	multi	`test_inference_interpolate_pos_encoding`	7/7	`(line 777) AttributeError: 'NoneType' object has no attribute 'last_hidden_state'`
kosmos2	single	`test_inference_interpolate_pos_encoding`	7/7	`(line 777) AttributeError: 'NoneType' object has no attribute 'last_hidden_state'`
generation	single	`test_green_red_watermark_generation`	7/7	`(line 665) AttributeError: 'dict' object has no attribute 'validate'`

other 153 unpinned failures — sample of 5

model	gpu	test	days	trace excerpt
whisper	single	`test_distil_token_timestamp_generation`	7/7	`(line 370) RuntimeError: Input type (float) and bias type (c10::Half) should be the same`
whisper	single	`test_large_batched_generation`	7/7	`(line 370) RuntimeError: Input type (float) and bias type (c10::Half) should be the same`
whisper	single	`test_large_generation`	7/7	`(line 370) RuntimeError: Input type (float) and bias type (c10::Half) should be the same`
whisper	single	`test_large_generation_multilingual`	7/7	`(line 370) RuntimeError: Input type (float) and bias type (c10::Half) should be the same`
whisper	single	`test_large_timestamp_generation`	7/7	`(line 370) RuntimeError: Input type (float) and bias type (c10::Half) should be the same`