Engineering· AI·April 17, 2026

Data Extraction using Lumina

Lumina extracts clean, structured data from messy documents using strict JSON prompts.

BytArch

Founder

When building the BytArch API, especially the BytArch-Lumina model, the main goal was to solve the “messy document” problem. If you’ve ever asked AI to read invoices or scanned contracts, you’ve probably seen how unreliable it can be. Instead of clean data, you often get unstructured text or even made up values.

Lumina handles this much better, but the real difference comes from how you prompt it. This guide covers the setup and shows how to write prompts that make the model behave like a data extractor instead of a chatbot.

The Setup

Start by sending a POST request to the API endpoint using your API key, which you can get at bytarch.com/profile. Full API documentation is available at bytarch.com/docs, which includes authentication details and request examples. Use https://api.bytarch.com/openai/v1/chat/completions and set the model to BytArch/BytArch-Lumina with a basic message payload to confirm everything is working.

Prompting for Extraction

Once connected, the important part is being specific. For document extraction, you need to clearly define the format of the output. A vague prompt will give you vague results.

For example, when processing receipts, ask for a JSON object with specific fields instead of a summary. Tell the model exactly what to extract, such as invoice_number, date, and total_amount, and make it clear that the response should contain only JSON.

You can include the document as an image_url along with your instructions. This works for scanned files, PDFs converted to images, or photos of receipts.

Why This Works

Clear constraints: Instructions like Do not include conversational filler keep the output clean and usable.
Defined structure: Specifying exact JSON keys helps the model organize messy input into consistent data.
Multimodal input: Lumina can handle both text and images, which makes it flexible for different document types.

Wrapping Up

Good document extraction comes down to clarity. Once your JSON prompt is solid, you can process large batches of files without changing your code. Start simple, test with your documents, and adjust the fields until the results are reliable.

Share this article

Data Extraction using Lumina

The Setup

Prompting for Extraction

Why This Works

Wrapping Up

JSON Prompting for Reliable Document Extraction

Sunsetting Gcoder Free Tier

Giving Your AI Hands: The New BytArch Native Tool Calling