Mexican Document OCR API: Extract Data from INE, CFDI, CSF, and CURP

Extract structured data from Mexican government documents automatically: INE voter IDs, CFDI invoices, CSF tax certificates, and CURP cards, all through a single REST API. Includes Python and Node.js examples.

March 17, 2026

Mexico's regulatory landscape requires businesses to process several types of government-issued documents: voter IDs (INE), digital invoices (CFDI), tax certificates (CSF), and population registry documents (CURP). Manually extracting data from these documents is slow and error-prone. This guide covers how to automate extraction for all of them with a single API.

Supported Mexican documents

| Document | Spanish name | Key fields extracted | Common use case |
|---|---|---|---|
| INE / IFE Voter ID | Credencial para votar | Nombre, CURP, fecha nacimiento, domicilio, vigencia | KYC, identity verification |
| CFDI Invoice (XML) | Comprobante Fiscal Digital | UUID, emisor RFC, receptor RFC, total, IVA, conceptos | Accounting, ERP, expense management |
| CSF Certificate | Constancia de Situación Fiscal | RFC, régimen fiscal, domicilio fiscal, actividades | Vendor onboarding, supplier validation |
| CURP Card | Cédula de CURP | CURP, nombre, fecha nacimiento, entidad de nacimiento | HR onboarding, benefits registration |
| Mexican Passport | Pasaporte mexicano | Nombre, número de pasaporte, fecha vencimiento, MRZ | International KYC, travel |

Extract INE voter ID data

Python

from ocilar import OcilarClient

client = OcilarClient(api_key="sk-your_key")

result = client.extract_ine(
    front="ine_front.jpg",
    back="ine_back.jpg"
)

print(result.nombre)           # "JULIO JUAREZ MARTINEZ"
print(result.curp)             # "JUAJ850101HMCRRL09"
print(result.fecha_nacimiento) # "1985-01-01"
print(result.domicilio)        # "CALLE INDEPENDENCIA 123..."
print(result.vigencia)         # "2029"

# Check if expired
from datetime import datetime
is_valid = int(result.vigencia) >= datetime.now().year
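
The year comparison above assumes vigencia always comes back as a clean four-digit year. As a defensive sketch (not part of the Ocilar SDK), the helper below also accepts a range such as "2019 - 2029", which is an assumption about how some card layouts print validity, and raises an error when no year can be read so the case can go to manual review. The names parse_vigencia_year and is_ine_current are hypothetical.

```python
import re
from datetime import date
from typing import Optional

# Matches plausible four-digit years (1900-2099) anywhere in the OCR text.
_YEAR_RE = re.compile(r"(?<!\d)(?:19|20)\d{2}(?!\d)")


def parse_vigencia_year(vigencia: str) -> Optional[int]:
    """Return the last year found in the vigencia text, or None if unreadable."""
    years = _YEAR_RE.findall(vigencia or "")
    return int(years[-1]) if years else None


def is_ine_current(vigencia: str, today: Optional[date] = None) -> bool:
    """True if the expiry year has not passed (same year-level rule as above)."""
    year = parse_vigencia_year(vigencia)
    if year is None:
        raise ValueError(f"Could not read a year from vigencia: {vigencia!r}")
    today = today or date.today()
    return year >= today.year


print(is_ine_current("2029", today=date(2026, 3, 17)))         # True
print(is_ine_current("2019 - 2029", today=date(2026, 3, 17)))  # True
print(is_ine_current("2024", today=date(2026, 3, 17)))         # False
```

Passing today explicitly keeps the check deterministic in tests; in production it can be left out.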

cURL

curl -X POST https://api.ocilar.com/api/v1/extract/ine \
  -H "X-API-Key: sk-your_key" \
  -F "front=@ine_front.jpg" \
  -F "back=@ine_back.jpg"

Extract CFDI invoice data

Python

result = client.extract_cfdi(file_path="factura.xml")

print(result.uuid)          # "6128a3d4-1234-..."
print(result.emisor_rfc)    # "AAA010101AAA"
print(result.total)         # 11600.00
print(result.iva)           # 1600.00

for concepto in result.conceptos:
    print(concepto.descripcion, concepto.importe)

From bytes (e.g., downloaded from the SAT portal)

with open("factura.xml", "rb") as f:
    result = client.extract_cfdi(file_bytes=f.read())

# Supports CFDI 3.3 and 4.0 automatically
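
To make the extracted fields less of a black box, here is a minimal local sketch of where they live in the XML and how the version can be detected. It uses only the Python standard library and is not a description of how the Ocilar API works internally. The function name read_cfdi_header and the sample document are hypothetical; the RFCs and UUID in the sample are placeholders.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# XML namespaces SAT uses for each CFDI version and for the stamp complement.
CFDI_NAMESPACES = {
    "3.3": "http://www.sat.gob.mx/cfd/3",
    "4.0": "http://www.sat.gob.mx/cfd/4",
}
TFD_NS = "http://www.sat.gob.mx/TimbreFiscalDigital"

# Synthetic CFDI 4.0 skeleton for illustration only (placeholder RFCs and UUID).
SAMPLE_CFDI = """<cfdi:Comprobante
    xmlns:cfdi="http://www.sat.gob.mx/cfd/4"
    xmlns:tfd="http://www.sat.gob.mx/TimbreFiscalDigital"
    Version="4.0" SubTotal="10000.00" Total="11600.00">
  <cfdi:Emisor Rfc="AAA010101AAA"/>
  <cfdi:Receptor Rfc="XAXX010101000"/>
  <cfdi:Complemento>
    <tfd:TimbreFiscalDigital UUID="00000000-0000-0000-0000-000000000000"/>
  </cfdi:Complemento>
</cfdi:Comprobante>"""


def read_cfdi_header(xml_text: str) -> dict:
    """Read version, RFCs, total, and UUID from a CFDI 3.3 or 4.0 document."""
    root = ET.fromstring(xml_text)
    version = root.get("Version")
    ns = CFDI_NAMESPACES.get(version)
    if ns is None or root.tag != f"{{{ns}}}Comprobante":
        raise ValueError(f"Unsupported or malformed CFDI (Version={version!r})")

    emisor = root.find(f"{{{ns}}}Emisor")
    receptor = root.find(f"{{{ns}}}Receptor")
    # The UUID (folio fiscal) lives in the stamp added when the CFDI is certified.
    timbre = root.find(f"{{{ns}}}Complemento/{{{TFD_NS}}}TimbreFiscalDigital")
    return {
        "version": version,
        "emisor_rfc": emisor.get("Rfc") if emisor is not None else None,
        "receptor_rfc": receptor.get("Rfc") if receptor is not None else None,
        "total": Decimal(root.get("Total")),
        "uuid": timbre.get("UUID") if timbre is not None else None,
    }


header = read_cfdi_header(SAMPLE_CFDI)
print(header["version"], header["emisor_rfc"], header["total"])
```

Amounts are parsed as Decimal rather than float so that totals compare exactly during reconciliation.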

Extract CSF (Constancia de Situación Fiscal)

Python

result = client.extract_csf(file_path="csf.pdf")

print(result.rfc)                  # "JUAJ850101ABC"
print(result.nombre)               # "JULIO JUAREZ MARTINEZ"
print(result.regimen_fiscal)       # "612 - Personas Físicas con Actividades Empresariales y Profesionales"
print(result.domicilio_fiscal)     # "CALLE INDEPENDENCIA 123..."
print(result.codigo_postal)        # "44100"
print(result.actividades)          # ["Desarrollo de software", ...]
print(result.fecha_inicio_ops)     # "2018-03-01"

Node.js

import { OcilarClient } from '@ocilar/sdk'
import { readFileSync } from 'fs'

const client = new OcilarClient({ apiKey: 'sk-your_key' })

const result = await client.extractCsf({
  file: readFileSync('csf.pdf')
})

console.log(result.rfc)
console.log(result.regimenFiscal)
console.log(result.domicilioFiscal)
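
Before storing an extracted RFC, a quick format check can catch OCR misreads. The sketch below is a hypothetical helper (classify_rfc is not an Ocilar function) that relies on the general RFC layout: 3 letters for a persona moral or 4 for a persona física, a six-digit date, and a three-character homoclave. It checks shape only; it does not validate the date or the check digit, and it does not confirm that the RFC is registered with SAT.

```python
import re

# 3 letters (persona moral) or 4 letters (persona física), a YYMMDD date,
# then a 3-character homoclave. Ñ and & are allowed in the letter block.
_RFC_RE = re.compile(r"^([A-ZÑ&]{3,4})(\d{6})([A-Z0-9]{3})$")


def classify_rfc(rfc: str) -> str:
    """Return 'moral' or 'fisica' for a well-formed RFC, else raise ValueError."""
    match = _RFC_RE.match(rfc.strip().upper())
    if match is None:
        raise ValueError(f"Malformed RFC: {rfc!r}")
    return "moral" if len(match.group(1)) == 3 else "fisica"


print(classify_rfc("JUAJ850101ABC"))  # fisica
print(classify_rfc("AAA010101AAA"))   # moral
```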

Extract CURP card

from ocilar import OcilarClient

client = OcilarClient(api_key="sk-your_key")

result = client.extract_curp(file_path="curp.pdf")

print(result.curp)             # "JUAJ850101HMCRRL09"
print(result.nombre_completo)  # "JULIO JUAREZ MARTINEZ"
print(result.fecha_nacimiento) # "1985-01-01"
print(result.entidad)          # "MEXICO" (state code "MC" in the CURP)
print(result.sexo)             # "H"
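
Because a CURP encodes the birth date and sex in fixed positions, the extracted fields can be sanity-checked against the CURP string itself. The sketch below is a hypothetical helper (parse_curp is not part of the SDK). It assumes the usual 18-character layout, treats a digit in position 17 as a birth before 2000 and a letter as 2000 or later, and accepts X as a sex marker as an assumption about newer records. It does not verify the final check digit.

```python
import re
from datetime import date

_CURP_RE = re.compile(
    r"^[A-Z]{4}"              # initials derived from surnames and given name
    r"(\d{2})(\d{2})(\d{2})"  # birth date as YYMMDD
    r"([HMX])"                # sex marker
    r"([A-Z]{2})"             # state-of-birth code
    r"[A-Z]{3}"               # internal consonants
    r"([A-Z0-9])"             # century differentiator
    r"\d$"                    # check digit (not verified here)
)


def parse_curp(curp: str) -> dict:
    """Split a CURP into birth date, sex, and state code; raise if malformed."""
    match = _CURP_RE.match(curp.strip().upper())
    if match is None:
        raise ValueError(f"Malformed CURP: {curp!r}")
    yy, mm, dd, sexo, entidad_code, differentiator = match.groups()
    century = 1900 if differentiator.isdigit() else 2000
    # date() raises ValueError for impossible dates such as month 13.
    birth = date(century + int(yy), int(mm), int(dd))
    return {
        "fecha_nacimiento": birth.isoformat(),
        "sexo": sexo,
        "entidad_code": entidad_code,
    }


parsed = parse_curp("JUAJ850101HMCRRL09")
print(parsed["fecha_nacimiento"])  # 1985-01-01
print(parsed["sexo"])              # H
```

Comparing parsed["fecha_nacimiento"] with the extracted fecha_nacimiento is a cheap way to flag OCR errors in either field.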

Multi-document onboarding flow

A KYC onboarding flow for a Mexican fintech typically requires INE + CSF for business customers or INE + CURP for individuals. The example below covers both cases:

from ocilar import OcilarClient
from datetime import datetime

client = OcilarClient(api_key="sk-your_key")

def kyc_onboard_individual(ine_front: str, ine_back: str, curp_pdf: str) -> dict:
    """Full KYC extraction for individual customers."""

    # Extract INE
    ine = client.extract_ine(front=ine_front, back=ine_back)

    # Validate INE not expired
    if int(ine.vigencia) < datetime.now().year:
        raise ValueError(f"INE expired in {ine.vigencia}")

    # Extract CURP to cross-validate
    curp = client.extract_curp(file_path=curp_pdf)

    # Cross-check: CURP on INE should match CURP document
    if ine.curp != curp.curp:
        raise ValueError("CURP mismatch between INE and CURP document")

    return {
        "nombre": ine.nombre,
        "curp": ine.curp,
        "fecha_nacimiento": ine.fecha_nacimiento,
        "domicilio": ine.domicilio,
        "estado": ine.estado,
        "ine_vigencia": ine.vigencia,
        "verified": True
    }

def kyc_onboard_business(ine_front: str, ine_back: str, csf_pdf: str) -> dict:
    """Full KYC extraction for business customers (persona moral or persona física con actividad empresarial)."""

    # Extract the legal representative's INE
    rep_legal = client.extract_ine(front=ine_front, back=ine_back)

    # Validate INE not expired, as in the individual flow
    if int(rep_legal.vigencia) < datetime.now().year:
        raise ValueError(f"INE expired in {rep_legal.vigencia}")

    # Extract the company's CSF
    empresa = client.extract_csf(file_path=csf_pdf)

    return {
        "representante_legal": rep_legal.nombre,
        "rfc_empresa": empresa.rfc,
        "razon_social": empresa.nombre,
        "regimen_fiscal": empresa.regimen_fiscal,
        "domicilio_fiscal": empresa.domicilio_fiscal,
        "actividades": empresa.actividades,
        "verified": True
    }
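
When the customer is a persona física who supplies both an INE and a CSF, one more cross-check is possible: the RFC and the CURP are normally built from the same name initials and birth date, so their first ten characters usually coincide. The helper below is a hypothetical sketch of that comparison (rfc_matches_curp is not an SDK function). A mismatch is a signal for manual review, not proof of a problem, since exceptions exist.

```python
def rfc_matches_curp(rfc: str, curp: str) -> bool:
    """Compare the first 10 characters (name initials + YYMMDD) of RFC and CURP."""
    rfc, curp = rfc.strip().upper(), curp.strip().upper()
    if len(rfc) != 13 or len(curp) != 18:
        raise ValueError("Expected a 13-character RFC and an 18-character CURP")
    return rfc[:10] == curp[:10]


print(rfc_matches_curp("JUAJ850101ABC", "JUAJ850101HMCRRL09"))  # True
print(rfc_matches_curp("JUAJ850102ABC", "JUAJ850101HMCRRL09"))  # False
```

The check only applies to 13-character RFCs; a persona moral has a 12-character RFC and no CURP to compare against.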

Industry use cases

Fintech & lending (KYC)

Mexican regulation requires identity verification for any credit product. Extract INE data automatically during the loan application flow — no manual form filling for the borrower, no manual review for your team.

B2B vendor onboarding

Extract and validate a supplier's RFC and fiscal regime from their CSF before adding them to your accounts payable system, so you pay the correct RFC and catch mismatches early.

Accounting software & ERPs

Parse CFDIs received from vendors automatically. Extract UUID, amounts, line items, and fiscal data directly into your accounting module without copy-pasting from PDFs.

HR & payroll platforms

Onboard new employees by scanning their INE and CURP. Feed data directly into IMSS and INFONAVIT systems without manual data entry.

Sharing economy & gig platforms

Verify driver, delivery agent, and contractor identities with INE extraction as part of onboarding. Pair with IMSS verification to confirm employment history.

Pricing

| Document type | Price/doc | Notes |
|---|---|---|
| INE extraction | $0.05–$0.10 | Front + back count as 1 document |
| CFDI extraction | $0.05 | XML preferred; PDF also supported |
| CSF extraction | $0.08 | PDF only |
| CURP extraction | $0.04 | PDF or image |
| All document types | From $0.020/doc | Volume plans available |
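
As a rough budgeting aid, the list prices above can be turned into a monthly estimate. The sketch below is illustrative arithmetic only: estimate_cost is a hypothetical helper, it ignores volume plans (so the result is an upper-end estimate), and it reports a low and high figure because INE pricing is given as a range.

```python
from decimal import Decimal

# List prices per document, copied from the pricing table above.
# INE is a range, so each entry is stored as (low, high).
PRICES = {
    "ine": (Decimal("0.05"), Decimal("0.10")),
    "cfdi": (Decimal("0.05"), Decimal("0.05")),
    "csf": (Decimal("0.08"), Decimal("0.08")),
    "curp": (Decimal("0.04"), Decimal("0.04")),
}


def estimate_cost(volumes: dict) -> tuple:
    """Return (low, high) list-price cost for a dict of document counts."""
    low = sum((PRICES[doc][0] * n for doc, n in volumes.items()), Decimal("0"))
    high = sum((PRICES[doc][1] * n for doc, n in volumes.items()), Decimal("0"))
    return low, high


# Example: 2,000 individual onboardings per month, each with one INE and one CURP.
low, high = estimate_cost({"ine": 2000, "curp": 2000})
print(f"${low} to ${high}")  # $180.00 to $280.00
```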

FAQ

What image quality is required for INE?

Aim for the equivalent of at least 300 DPI. The card must be fully visible, well lit, and free of blur and glare. Mobile photos work well when the card fills most of the frame.
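
For a photo, "300 DPI equivalent" can be read as pixels across the card divided by the card's physical width. The sketch below assumes the INE follows the standard ID-1 card width of 85.60 mm, which works out to roughly 1,011 pixels across the card at 300 DPI. The function names are hypothetical, and measuring the card's width in pixels (for example after cropping) is left to the caller.

```python
# ISO/IEC 7810 ID-1 card width in millimetres (assumed to apply to the INE).
ID1_WIDTH_MM = 85.60
MM_PER_INCH = 25.4


def effective_dpi(card_width_px: int) -> float:
    """Pixels per inch across the card, given its width in pixels in the photo."""
    return card_width_px / (ID1_WIDTH_MM / MM_PER_INCH)


def meets_minimum(card_width_px: int, min_dpi: float = 300.0) -> bool:
    """True if the card is captured at or above the minimum effective DPI."""
    return effective_dpi(card_width_px) >= min_dpi


print(round(effective_dpi(1200)))  # 356
print(meets_minimum(800))          # False
```

Running a check like this on the client before upload avoids paying for extractions that are likely to fail on resolution alone.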

Does CFDI extraction support CFDI 3.3 and 4.0?

Yes. Both versions are handled automatically — you don't need to specify the version.

Can I extract data from scanned CSF PDFs?

Yes. The CSF extractor handles both digital PDFs (downloaded from the SAT portal) and scanned copies, though digital PDFs produce higher accuracy.

Is there an SDK for Go or PHP?

No. SDKs are available for Python and Node.js; for Go, PHP, or any other language, call the REST API directly. See the full documentation.

Try Ocilar free

1,000 free solves. No credit card required.
