Sum of the scores of all perception subtasks, including existence, count, position, color, poster, celebrity, scene, landmark, artwork, and OCR. The full score of each subtask is 200, and that of all perception is 2000.
Sum of the scores of all cognition subtasks, including commonsense reasoning, numerical calculation, text translation, and code reasoning. The full score of each subtask is 200, and that of all cognition is 800.
# | Model | LLM Params |
Score | Perception | Cognition | Existence | Count | Position | Color | Poster | Celebrity | Scene | Landmark | Artwork | OCR | Commonsense Reasoning |
Numerical Calculation |
Text Translation |
Code Reasoning |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Qwen-VL-Max | - | 2433.61 | 1790.04 | 643.57 | 183.33 | 166.67 | 176.67 | 180.00 | 187.76 | 184.12 | 173.00 | 187.50 | 166.00 | 185.00 | 148.57 | 155.00 | 170.00 | 170.00 | |
TeleMM | - | 2387.64 | 1706.21 | 681.43 | 195.00 | 165.00 | 160.00 | 185.00 | 187.07 | 165.88 | 161.50 | 182.00 | 157.25 | 147.50 | 171.43 | 140.00 | 200.00 | 170.000 | |
MiniCPM-V 2.6 | 8B | 2365.88 | 1668.03 | 697.86 | 195.00 | 160.00 | 136.67 | 175.00 | 179.93 | 151.18 | 153.25 | 177.50 | 147.00 | 192.50 | 157.86 | 185.00 | 185.00 | 170.00 | RBDash v1.2 | 72B | 2333.57 | 1769.05 | 564.52 | 200.00 | 170.00 | 158.33 | 190.00 | 185.71 | 175.00 | 169.50 | 183.75 | 144.25 | 192.50 | 163.57 | 147.50 | 80.00 | 173.45 |
InternLM-XComposer2-VL 2.6 | 7B | 2242.71 | 1712.00 | 530.71 | 195.00 | 160.00 | 163.33 | 195.00 | 171.09 | 153.82 | 164.75 | 176.00 | 185.50 | 147.50 | 145.71 | 137.50 | 147.50 | 100.00 | |
JT-VL-Chat-V2.0 | - | 2204.54 | 1743.11 | 461.43 | 200.00 | 170.00 | 173.33 | 190.00 | 184.35 | 176.18 | 161.50 | 176.75 | 148.50 | 162.50 | 161.43 | 117.50 | 72.50 | 110.00 | |
InternVL-Chat-V1.5 | 20B | 2187.84 | 1637.84 | 550.00 | 190.00 | 175.00 | 166.67 | 178.33 | 173.81 | 138.53 | 154.75 | 177.75 | 143.00 | 140.00 | 135.00 | 125.00 | 185.00 | 105.00 | |
Qwen-VL-Plus | - | 2183.39 | 1681.25 | 502.14 | 175.00 | 153.33 | 161.67 | 180.00 | 181.63 | 184.12 | 151.00 | 191.00 | 156.00 | 147.50 | 142.14 | 85.00 | 185.00 | 90.00 | |
LLaVA-1.6 | 34B | 2028.61 | 1631.47 | 397.14 | 190.00 | 170.00 | 138.33 | 195.00 | 169.39 | 160.00 | 164.50 | 165.25 | 139.00 | 140.00 | 152.14 | 72.50 | 72.50 | 100.00 | |
Bunny-8B | 8B | 2011.64 | 1644.14 | 367.50 | 195.00 | 165.00 | 135.00 | 195.00 | 167.35 | 175.29 | 153.75 | 170.25 | 132.50 | 155.00 | 140.00 | 75.00 | 95.00 | 57.50 | |
JT-VL-Chat | - | 1982.15 | 1642.51 | 339.64 | 185.00 | 173.33 | 145.00 | 175.00 | 168.71 | 161.47 | 157.75 | 173.50 | 132.75 | 170.00 | 122.14 | 67.50 | 105.00 | 45.00 | |
Monkey-Chat | 7B | 1923.82 | 1522.39 | 401.43 | 185.00 | 150.00 | 118.33 | 185.00 | 178.91 | 142.65 | 161.75 | 176.50 | 144.25 | 80.00 | 131.43 | 42.50 | 137.50 | 90.00 | |
ShareGPT4V | 13B | 1921.92 | 1618.70 | 303.21 | 190.00 | 165.00 | 153.33 | 185.00 | 169.05 | 153.82 | 168.00 | 174.00 | 128.00 | 132.50 | 125.71 | 45.00 | 80.00 | 52.50 | |
InternLM-XComposer-VL | 7B | 1919.51 | 1528.44 | 391.07 | 190.00 | 158.33 | 126.67 | 165.00 | 161.90 | 150.29 | 159.75 | 165.25 | 126.25 | 125.00 | 138.57 | 55.00 | 112.50 | 85.00 | |
SPHINX | 13B | 1870.15 | 1560.15 | 310.00 | 195.00 | 160.00 | 153.33 | 160.00 | 164.29 | 177.94 | 160.00 | 168.09 | 134.00 | 87.50 | 130.00 | 55.00 | 75.00 | 50.00 | |
Qwen-VL-Chat | 7B | 1848.28 | 1487.57 | 360.71 | 158.33 | 150.00 | 128.33 | 170.00 | 178.57 | 120.59 | 152.25 | 164.00 | 125.50 | 140.00 | 130.71 | 40.00 | 147.50 | 42.50 | |
LLaVA | 13B | 1826.67 | 1531.31 | 295.36 | 185.00 | 155.00 | 133.33 | 170.00 | 160.54 | 152.94 | 161.25 | 170.50 | 117.75 | 125.00 | 127.86 | 42.50 | 77.50 | 47.50 | |
mPLUG-Owl2 | 7B | 1763.40 | 1450.19 | 313.21 | 185.00 | 155.00 | 88.33 | 150.00 | 160.20 | 164.41 | 153.25 | 157.25 | 134.25 | 102.50 | 115.71 | 35.00 | 102.50 | 60.00 | |
CogVLM | 7B | 1752.28 | 1439.07 | 313.21 | 195.00 | 165.00 | 103.33 | 160.00 | 146.94 | 115.29 | 159.25 | 158.00 | 88.75 | 147.50 | 125.71 | 60.00 | 75.00 | 52.50 |
Green date indicates the newly added/updated models          - indicates the unknown quantity of parameters