Vision/AIGC interview notes -> Multimodal
1. How is OCR text detection done? Is Qwen's approach to text detection reasonable?
PaliGemma detection output:
<loc0110><loc0124><loc0224><loc0389> plate ; <loc0244><loc0130><loc0281><loc0430> plate ; <loc0364><loc0820><loc0403><loc0951> plate ; <loc0470><loc0140><loc0521><loc0228> plate ; <loc0558><loc0953><loc0582><loc0988> plate ; <loc0570><loc0149><loc0619><loc0228> plate ; <loc0792><loc0062><loc0827><loc0315> plate ; <loc0829><loc0062><loc0865><loc0343> plate ; <loc0556><loc0906><loc0592><loc0940> plate ; <loc0690><loc0837><loc0715><loc0853> plate ; <loc0770><loc0792><loc0800><loc0808> plate ; <loc0767><loc0833><loc0798><loc0853> plate ; <loc0765><loc0879><loc0796><loc0900>
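To make the output above easier to read, here is a minimal sketch that decodes it into pixel boxes. It assumes the documented PaliGemma detection convention: each box is four `<locNNNN>` tokens in the order y_min, x_min, y_max, x_max, with values binned into 1024 steps relative to the image size. The function name `parse_paligemma_detections` and the `width`/`height` arguments are hypothetical, introduced only for illustration. A trailing box with no label (as in the last entry above) is kept with an empty label rather than guessed.

```python
import re

# Four consecutive <locNNNN> tokens, then an optional label up to the next ';'.
_BOX = re.compile(r"((?:<loc\d{4}>){4})\s*([^;<]*)")


def parse_paligemma_detections(text, width, height):
    """Decode '<loc....>' detection strings into pixel-space boxes.

    Assumption: token order is y_min, x_min, y_max, x_max and each value
    is a bin index in [0, 1023] over a 1024-step normalized axis.
    """
    results = []
    for match in _BOX.finditer(text):
        y0, x0, y1, x1 = (int(v) for v in re.findall(r"<loc(\d{4})>", match.group(1)))
        results.append(
            {
                "label": match.group(2).strip(),  # empty if the output was cut off
                "box": (
                    x0 / 1024 * width,   # x_min in pixels
                    y0 / 1024 * height,  # y_min in pixels
                    x1 / 1024 * width,   # x_max in pixels
                    y1 / 1024 * height,  # y_max in pixels
                ),
            }
        )
    return results


if __name__ == "__main__":
    sample = "<loc0110><loc0124><loc0224><loc0389> plate ; <loc0765><loc0879><loc0796><loc0900>"
    for det in parse_paligemma_detections(sample, width=1024, height=1024):
        print(det)
```

Because the coordinates are emitted as ordinary vocabulary tokens, the box precision is limited to 1/1024 of each image side, which is worth keeping in mind when comparing this style of text detection against a dedicated OCR detector.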