Does anyone know how to use AI to do semantic segmentation of images (specifically textbooks)?
I tried this with ChatGPT (the web interface) and the attached image, but it didn't work at all: it just cut the page at 10% intervals lol. This is the prompt I used:
Analyze the attached image, and segment it so all the content is covered, and each section covers approximately between 10 and 15 lines, and ideally is segmented at points where the text transitions from one topic or section to another.
The segmentation should be outputted in the following format:
BEGIN
[x0,y0,x1,y1]
[x0,y0,x1,y1]
...
[x0,y0,x1,y1]
[x0,y0,x1,y1]
END
with the coordinates starting from the upper left corner and being specified in percentage points.
For example:
BEGIN
[5.00,5.00,95.00,10.00]
[5.00,11.00,95.00,13.00]
[5.00,15.00,95.00,17.00]
END
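For the later cropping step, a reply in this format could be turned into pixel boxes roughly like this. This is only a minimal sketch: the function names `parse_boxes` and `to_pixels` are made up for illustration, and it assumes the model sticks to the BEGIN/END layout above with percentages measured from the upper left corner.

```python
import re

# One [x0,y0,x1,y1] line; each number may have a decimal part.
NUM = r"(\d+(?:\.\d+)?)"
BOX_RE = re.compile(rf"\[\s*{NUM}\s*,\s*{NUM}\s*,\s*{NUM}\s*,\s*{NUM}\s*\]")


def parse_boxes(reply):
    """Return the percentage boxes found between BEGIN and END."""
    match = re.search(r"BEGIN(.*?)END", reply, flags=re.DOTALL)
    if match is None:
        raise ValueError("no BEGIN ... END block found")
    boxes = []
    for groups in BOX_RE.findall(match.group(1)):
        x0, y0, x1, y1 = (float(g) for g in groups)
        # Reject boxes that are empty, inverted, or outside 0..100 percent.
        if not (0 <= x0 < x1 <= 100 and 0 <= y0 < y1 <= 100):
            raise ValueError(f"invalid box: {groups}")
        boxes.append((x0, y0, x1, y1))
    return boxes


def to_pixels(box, width, height):
    """Convert one percentage box to integer pixel coordinates."""
    x0, y0, x1, y1 = box
    return (
        round(x0 * width / 100),
        round(y0 * height / 100),
        round(x1 * width / 100),
        round(y1 * height / 100),
    )
```

The resulting `(left, upper, right, lower)` tuple is in the order that an image library's crop call typically expects (Pillow's `Image.crop` takes a box in that order), so each segment can be cut out and sent to the model separately.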
My end goal is to build a semi-automated PDF-to-LaTeX pipeline, so that I can then use LLMs along with the OCR'd LaTeX of the textbooks to tutor me on these subjects.
To get a decent transcript I have to split each page into several pieces. My guess is that the model loads the image at a fixed resolution, so a full page that is too large presumably ends up unreadable to it. This segmentation is meant as a preprocessing step before the conversion to LaTeX.
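As a fallback that doesn't depend on the model at all, the page could simply be cut into fixed-height strips that overlap a little, so a line split at one boundary still appears whole in the neighbouring strip. This is a different, much dumber technique than the semantic segmentation asked about above, and it is only a sketch: `strip_bounds` is a hypothetical helper, and the strip height and overlap are values you would have to tune by hand.

```python
def strip_bounds(page_height, strip_height, overlap):
    """Return (top, bottom) pixel rows of overlapping horizontal strips.

    Consecutive strips share `overlap` rows, and the last strip is
    clipped to the bottom of the page.
    """
    if page_height <= 0 or strip_height <= 0:
        raise ValueError("heights must be positive")
    if not 0 <= overlap < strip_height:
        raise ValueError("overlap must be in [0, strip_height)")
    bounds = []
    top = 0
    while True:
        bottom = min(top + strip_height, page_height)
        bounds.append((top, bottom))
        if bottom == page_height:
            break
        # Step back by the overlap so boundary lines are not lost.
        top = bottom - overlap
    return bounds


# Example: a 1000 px tall page, 400 px strips, 50 px overlap.
print(strip_bounds(1000, 400, 50))
# prints [(0, 400), (350, 750), (700, 1000)]
```

The overlap means some lines get transcribed twice, so the later LaTeX step would need to deduplicate text at the seams.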