When building the BytArch API, especially the BytArch-Lumina model, the main goal was to solve the “messy document” problem. If you’ve ever asked AI to read invoices or scanned contracts, you’ve probably seen how unreliable it can be. Instead of clean data, you often get unstructured text or even made up values.
Lumina handles this much better, but the real difference comes from how you prompt it. This guide covers the setup and shows how to write prompts that make the model behave like a data extractor instead of a chatbot.
The Setup
Start by sending a POST request to the API endpoint using your API key, which you can get at bytarch.com/profile. Full API documentation is available at bytarch.com/docs, which includes authentication details and request examples.
Use https://api.bytarch.com/openai/v1/chat/completions and set the model to BytArch/BytArch-Lumina with a basic message payload to confirm everything is working.
Prompting for Extraction
Once connected, the important part is being specific. For document extraction, you need to clearly define the format of the output. A vague prompt will give you vague results.
For example, when processing receipts, ask for a JSON object with specific fields instead of a summary. Tell the model exactly what to extract, such as invoice_number, date, and total_amount, and make it clear that the response should contain only JSON.
You can include the document as an image_url along with your instructions. This works for scanned files, PDFs converted to images, or photos of receipts.
Why This Works
-
Clear constraints: Instructions like
Do not include conversational fillerkeep the output clean and usable. - Defined structure: Specifying exact JSON keys helps the model organize messy input into consistent data.
- Multimodal input: Lumina can handle both text and images, which makes it flexible for different document types.
Wrapping Up
Good document extraction comes down to clarity. Once your JSON prompt is solid, you can process large batches of files without changing your code. Start simple, test with your documents, and adjust the fields until the results are reliable.

