ARC-AGI Dataset
2024 ARC-AGI Dataset
Let us begin by understanding the data.
Data format
All the data is JSON-encoded. The JSON structures are as follows (`*` stands for `training`, `evaluation`, or `test`):
- `*_challenges`:
  `{ "task_id": { "test": [ { "input": [[...], ..., [...]] } ], "train": [ { "input": [[...], ..., [...]], "output": [[...], ..., [...]] }, ... ] }, ... }`
- `*_solutions`:
  `{ "task_id": [[...], ..., [...]], ... }`
- submission format:
  `{ "task_id": [ { "attempt_1": [[...], ..., [...]], "attempt_2": [[...], ..., [...]] }, ... ], ... }`
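The layouts above can be read with Python's standard `json` module. The following is a minimal sketch, not an official loader: the task id `toy0001` and its grids are made up for illustration, the helpers `load_task_pairs`, `make_submission`, and `copy_twice` are hypothetical names, and it assumes each `*_solutions` entry lists one output grid per test input, in the same order as the task's `test` list.

```python
import json

# Made-up task following the *_challenges layout (not from the real dataset).
CHALLENGES_JSON = """
{"toy0001": {
  "train": [{"input": [[0, 1], [1, 0]], "output": [[1, 0], [0, 1]]}],
  "test":  [{"input": [[1, 1], [0, 0]]}]
}}
"""
# Assumption: one output grid per test input, in the same order.
SOLUTIONS_JSON = '{"toy0001": [[[0, 0], [1, 1]]]}'


def load_task_pairs(challenges: dict, solutions: dict, task_id: str):
    """Return (train_pairs, test_pairs) as lists of (input, output) grids."""
    task = challenges[task_id]
    train_pairs = [(ex["input"], ex["output"]) for ex in task["train"]]
    test_inputs = [ex["input"] for ex in task["test"]]
    test_pairs = list(zip(test_inputs, solutions[task_id]))
    return train_pairs, test_pairs


def make_submission(challenges: dict, solver) -> dict:
    """Build {task_id: [{"attempt_1": grid, "attempt_2": grid}, ...]}
    with one dict per test input. `solver` returns two candidate grids."""
    submission = {}
    for task_id, task in challenges.items():
        entries = []
        for ex in task["test"]:
            first, second = solver(ex["input"])
            entries.append({"attempt_1": first, "attempt_2": second})
        submission[task_id] = entries
    return submission


def copy_twice(grid):
    """Toy solver (illustration only): returns the input as both attempts."""
    return [row[:] for row in grid], [row[:] for row in grid]


challenges = json.loads(CHALLENGES_JSON)
solutions = json.loads(SOLUTIONS_JSON)
train_pairs, test_pairs = load_task_pairs(challenges, solutions, "toy0001")
submission = make_submission(challenges, copy_twice)
print(json.dumps(submission))
```

Keeping the solver as a plain function that returns two grids makes it easy to swap in a real approach later while the submission structure stays the same.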
Every grid is a 2D array of integers from 0 to 9, and each value is an index into the following color palette:
hex_colors = ['#000000', '#0074D9', '#FF4136', '#2ECC40', '#FFDC00',
'#AAAAAA', '#F012BE', '#FF851B', '#7FDBFF', '#870C25']
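To turn a grid into colored pixels, each cell value is looked up in the palette above. The sketch below uses only the standard library; `hex_to_rgb`, `grid_to_rgb`, and `PALETTE_RGB` are hypothetical helper names, and the range check on cell values is an added safety assumption.

```python
# Same palette as hex_colors in the text: value i maps to HEX_COLORS[i].
HEX_COLORS = ['#000000', '#0074D9', '#FF4136', '#2ECC40', '#FFDC00',
              '#AAAAAA', '#F012BE', '#FF851B', '#7FDBFF', '#870C25']


def hex_to_rgb(hex_color: str) -> tuple:
    """Convert '#RRGGBB' into an (r, g, b) tuple of ints in 0..255."""
    h = hex_color.lstrip('#')
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))


PALETTE_RGB = [hex_to_rgb(c) for c in HEX_COLORS]


def grid_to_rgb(grid):
    """Map every cell value (0-9) to its RGB color, keeping the 2D layout."""
    for row in grid:
        for v in row:
            if not 0 <= v <= 9:
                raise ValueError(f"cell value {v} is outside 0..9")
    return [[PALETTE_RGB[v] for v in row] for row in grid]


image = grid_to_rgb([[0, 1], [2, 9]])
print(image)
```

With matplotlib, an alternative is to wrap the hex strings in `matplotlib.colors.ListedColormap(hex_colors)` and pass it to `imshow` together with `vmin=0, vmax=9`, so that each integer lands on its own palette entry.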
Visualization
Let's visualize the data, using tasks f35d900a and fcb5c309 as examples. The basic view draws each grid as a 2D image of colored cells. Alternatively, we can treat each color value not as a color channel but as a position on the z-axis, as if the grid had been one-hot encoded: every cell becomes a point whose height is its color index. The 3D scatter row shows this view, and the colored 3D scatter row adds the palette colors to highlight each z-level.
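| | f35d900a | fcb5c309 |
|---|---|---|
| Basic | ![f35d900a, basic]() | ![fcb5c309, basic]() |
| 3D scatter | ![f35d900a, 3D scatter]() | ![fcb5c309, 3D scatter]() |
| 3D scatter (colored) | ![f35d900a, colored 3D scatter]() | ![fcb5c309, colored 3D scatter]() |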
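The z-axis view can be sketched as follows, assuming one-hot encoding with one layer per color index: a grid of height H and width W becomes an H × W × 10 array in which layer k is 1 exactly where the cell equals k. Each 1 in that array is then a point (x, y, z) with z equal to the color index. The names `one_hot` and `scatter_points` are hypothetical, and putting the column on x and the row on y is an assumed plotting convention.

```python
NUM_COLORS = 10  # cell values are integers 0..9


def one_hot(grid):
    """Return an H x W x 10 nested list with [r][c][k] == 1 iff grid[r][c] == k."""
    return [[[1 if v == k else 0 for k in range(NUM_COLORS)] for v in row]
            for row in grid]


def scatter_points(grid):
    """List (x, y, z) points: x = column, y = row, z = color index.

    These are exactly the positions of the 1s in one_hot(grid).
    """
    encoded = one_hot(grid)
    return [(c, r, k)
            for r, row in enumerate(encoded)
            for c, cell in enumerate(row)
            for k, bit in enumerate(cell)
            if bit == 1]


grid = [[0, 3],
        [5, 3]]
points = scatter_points(grid)
print(points)
```

To draw these with matplotlib, one could create an axes with `projection='3d'` and call `ax.scatter(xs, ys, zs)`; passing `c=[hex_colors[z] for z in zs]` gives the colored variant.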





