6 Active Experiments

AI Research.

> Building the intelligence layer for creative storytelling — from fine-tuned language models to multimodal generation pipelines.

6 Experiments · 3 Models Trained · 15K+ Training Pairs · 3 Cloud Providers

Research Tracks

Active Areas.

Model Fine-Tuning

[ MDL_FT ]

Training foundation models for domain-specific creative direction — mythology, cinematography, and visual storytelling.

Qwen3-32B · LoRA · Azure AI Foundry · Vertex AI
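LoRA keeps the base weights frozen and trains only low-rank factors, which is why a 32B model is practical to fine-tune on managed cloud GPUs. A back-of-envelope sketch of the trainable-parameter budget; the layer count, hidden size, and rank below are illustrative assumptions, not published Qwen3-32B internals:

```python
# Rough LoRA sizing for a 32B-class decoder model.
# Shapes are assumptions for illustration, not real Qwen3-32B config.

def lora_trainable_params(hidden: int, num_layers: int, rank: int,
                          targets_per_layer: int = 4) -> int:
    """Trainable params when LoRA adapts square d x d projections.

    Each adapted weight W gains factors A (r x d) and B (d x r),
    i.e. rank * 2 * hidden trainable parameters per matrix.
    """
    per_matrix = rank * 2 * hidden
    return per_matrix * targets_per_layer * num_layers

# Assumed shape: 64 layers, hidden size 5120, rank 16 on the four
# attention projections (q, k, v, o), treated as square for simplicity.
trainable = lora_trainable_params(hidden=5120, num_layers=64, rank=16)
print(f"{trainable:,} trainable params ({trainable / 32e9:.4%} of 32B)")
```

Under these assumptions the adapter is roughly 42M parameters, about 0.13% of the full model, which is what makes multi-run sweeps across providers affordable.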

Video Generation

[ VID_GEN ]

Text-to-video and image-to-video pipelines with precise camera control, temporal coherence, and cinematic quality.

MiniMax · Hailuo · Camera Control · Temporal Consistency

Audio Intelligence

[ AUD_INT ]

Scene-aware audio generation — dialogue, ambience, sound effects, and music that matches visual narrative.

TTS · Sound Design · Scene Matching · Spatial Audio

Multi-Agent Orchestration

[ AGT_ORC ]

Coordinating specialized AI agents to automate full production workflows from script to final render.

Agent Pipelines · Tool Use · Production DAGs · Auto-QA
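The production-DAG idea can be sketched with the standard-library topological sorter: each stage runs only after its dependencies finish. Stage names here are illustrative, not the actual pipeline:

```python
# Minimal production-DAG runner: stages execute in dependency order.
# Stage names are illustrative placeholders for the real workflow.
from typing import Callable
from graphlib import TopologicalSorter

def run_pipeline(dag: dict[str, set[str]],
                 handlers: dict[str, Callable[[], None]]) -> list[str]:
    """Execute every stage after its dependencies; return the order used."""
    order = list(TopologicalSorter(dag).static_order())
    for stage in order:
        handlers[stage]()
    return order

# Each key depends on the set of stages it maps to.
dag = {
    "shot_planning": {"script_analysis"},
    "visual_gen": {"shot_planning"},
    "audio_layering": {"shot_planning"},
    "final_render": {"visual_gen", "audio_layering"},
}
handlers = {s: (lambda s=s: None)
            for s in ["script_analysis", *dag]}
order = run_pipeline(dag, handlers)
print(order)
```

A real orchestrator would add retries, artifact passing, and the Auto-QA gate between stages, but the scheduling core is just this dependency sort.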

Visual Understanding

[ VIS_UND ]

Multimodal analysis of images and video — composition, style transfer, reference matching, and aesthetic scoring.

Multimodal · Style Analysis · Composition · Aesthetics

Infrastructure & Serving

[ INF_SRV ]

Low-latency model serving, adaptive batching, and cost-efficient GPU orchestration across cloud providers.

GCP · Azure · Model Serving · Cost Optimization
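One way to frame cross-cloud cost efficiency is "cheapest provider that still meets the latency budget". A minimal routing sketch; the prices and latencies are placeholders, not real GCP or Azure quotes:

```python
# Cost-aware provider routing: prefer the cheapest endpoint whose
# observed p95 latency fits the request's budget; fall back to the
# fastest endpoint when nothing meets the SLO. Numbers are made up.
from dataclasses import dataclass

@dataclass
class Provider:
    name: str
    usd_per_1k_tokens: float
    p95_latency_ms: float

def route(providers: list[Provider], latency_budget_ms: float) -> Provider:
    eligible = [p for p in providers if p.p95_latency_ms <= latency_budget_ms]
    if not eligible:  # no endpoint meets the SLO: degrade to fastest
        return min(providers, key=lambda p: p.p95_latency_ms)
    return min(eligible, key=lambda p: p.usd_per_1k_tokens)

fleet = [Provider("gcp-a100", 0.9, 450.0),
         Provider("azure-a10", 0.4, 900.0)]
print(route(fleet, latency_budget_ms=500.0).name)
```

In practice the latency and price figures would come from live telemetry rather than constants, but the selection rule stays this simple.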

Experiment Log

Active Runs.

Qwen3-32B Creative Director

COMPLETE
Model Fine-Tuning | Feb 2026

Three fine-tuning runs across Vertex AI and Azure AI Foundry. Best eval loss of 1.017, a 14.8% improvement reached at epoch 2.

1.017 eval loss · 15,434 training pairs · 1.98 PF compute

MiniMax Video Pipeline

IN PROGRESS
Video Generation | Feb 2026

Integrating MiniMax Hailuo API for text-to-video and image-to-video with 50+ camera control presets.

50+ camera moves · 720p output · 6s generation
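A preset layer can be as simple as a lookup that prefixes a camera directive onto the text prompt before it reaches the video API. A hypothetical sketch: the preset names and the bracketed-command convention are assumptions for illustration, not the documented MiniMax/Hailuo interface, and the real pipeline holds 50+ entries rather than three:

```python
# Illustrative camera-preset layer for a text-to-video pipeline.
# Preset names and bracketed directives are hypothetical examples.
CAMERA_PRESETS = {
    "slow_push": "[Push in]",
    "reveal_left": "[Pan left]",
    "crane_up": "[Pedestal up]",
}

def apply_preset(prompt: str, preset: str) -> str:
    """Prefix the prompt with the chosen preset's camera directive."""
    return f"{CAMERA_PRESETS[preset]} {prompt}"

print(apply_preset("A monk walks through morning mist", "slow_push"))
```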

Scene-Aware Audio Matching

IN PROGRESS
Audio Intelligence | Jan 2026

Prototype system that analyzes visual scenes and generates matching ambient audio, dialogue, and sound effects.

Multi-Agent Production Pipeline

IN PROGRESS
Multi-Agent Orchestration | Jan 2026

Designing agent orchestration for end-to-end production: script analysis, shot planning, visual generation, audio layering.

Multimodal Training Dataset v2

PLANNED
Visual Understanding | Q1 2026

Building image+analysis training pairs from 817 GCS assets for multimodal fine-tuning. Quality-filtered with human ratings.

817 images · 5,200 filtered pairs

Inference Cost Optimization

PLANNED
Infrastructure & Serving | Q1 2026

Benchmarking serving strategies for 32B models: quantization, speculative decoding, and multi-provider routing.
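The quantization headroom is easy to bound from first principles, since weight memory scales linearly with bit width. A rough sketch for a 32B-parameter model, counting weights only (KV cache and activations excluded):

```python
# Weight-memory arithmetic for a 32B-parameter model at common
# quantization widths. Weights only; ignores KV cache and activations.
def weight_gib(params: float, bits: int) -> float:
    """Memory in GiB to hold `params` weights at `bits` bits each."""
    return params * bits / 8 / 2**30

for bits in (16, 8, 4):
    print(f"{bits}-bit: {weight_gib(32e9, bits):.1f} GiB")
```

At 16-bit the weights alone are close to 60 GiB, which is what pushes 32B serving toward 8-bit or 4-bit quantization, or multi-GPU sharding.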

research-timeline.log
# ADIYOGI ARTS — RESEARCH TIMELINE
FEB 2026
Qwen3-32B Run 2 complete — eval ↓14.8% — best loss 1.017
MiniMax video pipeline — 50+ camera presets integrated
Audio scene matching — prototype in testing
JAN 2026
Qwen3-32B Run 0–1 — baseline established on Vertex AI + Azure
15,434 training pairs curated — mythology + cinematography domain
Multi-agent orchestration — architecture design phase
DEC 2025
Midjourney prompt dataset — 18K raw pairs collected
817 reference images transferred to GCS
Research infrastructure — GCP + Azure provisioned
---
NEXT: Run 3 — cosine LR decay, quality-filtered data, multimodal pairs