Multimodal jobs
...(such as DARPA in the US or CERN in Europe), and private corporations like Google, Meta, Huawei, and Samsung. Funding often comes from a mix of public grants, venture capital, and corporate budgets. Key Areas of Breakthrough Technology Research 1. Artificial Intelligence and Machine Learning AI research has exploded since the deep learning revolution of the 2010s. Current frontiers include: Multimodal models that understand text, images, video, and audio simultaneously. AI agents capable of autonomous planning and tool use. Explainable AI (XAI) to make black-box systems more transparent and trustworthy. Energy-efficient AI to address the massive computational demands of training large models. 2. Quantum Computing and Information Science Quantum research aims to harness quantum ...
Full-Stack Engineer Tech stack (must-have) Frontend: React 18, TypeScript (strict), Vite, Mantine 8, CSS Modules, React Router 7 Backend (Node): Node.js, TypeScript, Medplum bots (FHIR vmcontext runtime) Backend (Python): FastAPI, httpx, async Python Data: FHIR R4 (Composition, Encounter, Invoice, Observation, Condition, etc.), PostgreSQL, Redis AI: Google Gemini 2.5 API (Pro multimodal + Flash structured outputs), Anthropic Claude API Infra: Docker, nginx, GitHub Actions Integrations: Stripe, or Jitsi (telemedicine), Stedi,
PROJECT: Refurbish my BTech/MTech research paper to match the exact structure, tone, and detail level of the reference PDF I provide. Topic: Causal Multimodal Diagnostic Agent Combining Chest X-ray Images + Clinical Reports for Thoracic Diseases WHAT I PROVIDE: 1. Reference Paper PDF: You MUST replicate its section headings, table formats, equation style, and writing tone 100%. 2. My Code: , , - ResNet-50 + ClinicalBERT + cross-modal attention + label GNN 3. My Results: with AUROC/F1 per class, , , 4. Dataset Info: MIMIC-CXR subset, 1,495 PA-view image-report pairs, 14 CheXpert labels, patient-level 70/10/20 split. NO full 500GB dataset needed. 5. Target: BTech/MTech final submission + college
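For readers unfamiliar with the "patient-level 70/10/20 split" mentioned in the listing above, the idea can be sketched as follows. This is a minimal illustration on synthetic IDs, not the poster's code: the function name `patient_level_split`, the `(patient_id, sample_id)` pair layout, and the seed are all assumptions made for the example. The point is that patients, not individual image-report pairs, are shuffled and partitioned, so no patient appears in more than one subset.

```python
import random
from collections import defaultdict


def patient_level_split(pairs, fractions=(0.7, 0.1, 0.2), seed=0):
    """Split (patient_id, sample_id) pairs so each patient lands in one subset.

    Hypothetical helper for illustration only.
    """
    # Group samples by patient so a patient is never divided across subsets.
    by_patient = defaultdict(list)
    for patient_id, sample_id in pairs:
        by_patient[patient_id].append(sample_id)

    # Shuffle patients deterministically, then cut the list by the fractions.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)
    n_train = round(fractions[0] * len(patients))
    n_val = round(fractions[1] * len(patients))
    groups = {
        "train": patients[:n_train],
        "val": patients[n_train:n_train + n_val],
        "test": patients[n_train + n_val:],
    }
    return {
        name: [(p, s) for p in ids for s in by_patient[p]]
        for name, ids in groups.items()
    }


# Synthetic example: 100 made-up patients with 3 samples each.
pairs = [(f"p{i % 100}", f"img{i}") for i in range(300)]
splits = patient_level_split(pairs)
```

Because the cut is made over patients, the share of samples in each subset only approximates 70/10/20 when patients contribute different numbers of samples.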
Temporal Lesion-Aware Dynamic Gated Multimodal Fusion Framework for DR and DME Analysis Using OLIVES and MMRDR Datasets Framework Overview The proposed framework introduces a Temporal Lesion-Aware Dynamic Gated Multimodal Fusion System for automated analysis of Diabetic Retinopathy (DR) and Diabetic Macular Edema (DME) using multimodal retinal imaging data. The framework combines fundus images, OCT scans, longitudinal retinal information, and optional clinical metadata to improve retinal disease classification, biomarker understanding, and temporal disease progression analysis. Unlike conventional multimodal retinal systems that use static feature fusion, the proposed framework employs a: Dynamic Gated Cross-Modal Fusion Mechanism that adaptively learns the impo...
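The listing above is cut off before it describes its gating mechanism, so the following is only a generic illustration of gated fusion of two modality feature vectors, not the framework's actual design. In this common form, a sigmoid gate computed from both inputs decides, per feature, how much to take from each modality. All names (`gated_fusion`, `fundus_feat`, `oct_feat`, `weights`, `bias`) and the random values are assumptions for the sketch; a real model would learn the weights during training.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(fundus_feat, oct_feat, weights, bias):
    """Fuse two equal-length feature vectors with a per-feature sigmoid gate.

    gate_i  = sigmoid(weights[i] . [fundus_feat; oct_feat] + bias[i])
    fused_i = gate_i * fundus_feat[i] + (1 - gate_i) * oct_feat[i]
    """
    concat = fundus_feat + oct_feat  # list concatenation: [fundus; oct]
    fused = []
    for i, (f, o) in enumerate(zip(fundus_feat, oct_feat)):
        z = sum(w * x for w, x in zip(weights[i], concat)) + bias[i]
        gate = sigmoid(z)
        fused.append(gate * f + (1.0 - gate) * o)
    return fused


# Made-up inputs; in practice these would come from image encoders.
rng = random.Random(0)
dim = 4
fundus_feat = [rng.uniform(-1, 1) for _ in range(dim)]
oct_feat = [rng.uniform(-1, 1) for _ in range(dim)]
weights = [[rng.uniform(-0.5, 0.5) for _ in range(2 * dim)] for _ in range(dim)]
bias = [0.0] * dim

fused = gated_fusion(fundus_feat, oct_feat, weights, bias)
```

With all weights and biases at zero the gate is 0.5 everywhere and the result reduces to a plain average of the two modalities, which is the static fusion that a gated variant is meant to improve on.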
...photos/content and the AI: Understands the content Selects template Generates: headlines captions hooks layouts Pushes into Canva drafts ready for approval Canva Design System Needed Please also help create: 5 reel cover styles 5 before/after systems 2 educational carousel systems Fixed typography system Fixed colour palette Reusable templates Preferred Skills Canva automation AI workflows Gemini / multimodal AI Prompt engineering No-code automation Luxury brand/social media design. Instagram Reference:
I am putting together a small educational AI/ML project that detects diabetic retinopathy on the publicly-available OLIVES dataset. The goal is not only to build a working multimodal model (fundus images plus any supporting clinical metadata you find useful) but also to showcase clear explainability and rigorous evaluation so the project can be presented in an academic setting. Scope of work • Prepare the OLIVES dataset, handle any class imbalance, and document the preprocessing pipeline. • Design and train a multimodal architecture of your choice in Python—PyTorch, TensorFlow or another modern framework is fine—as long as the code is clean and reproducible. • Produce the quantitative metrics I need: Accuracy, Precision, F1-score, AUC and Coh...
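Three of the metrics named in the listing above (Accuracy, Precision, F1-score) can be computed directly from confusion-matrix counts. The sketch below assumes a binary labelling (1 = disease present, 0 = absent) and uses made-up labels. The helper name `binary_metrics` is hypothetical, and AUC is left out because it needs predicted scores rather than hard labels.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels: 3 true positives, 3 true negatives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

On imbalanced data, accuracy alone can look good while precision or recall is poor, which is why the posting asks for several metrics side by side.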
We are building a high-performance team of elite AI annotation and evaluation professionals for advanced AI training and multimodal evaluation projects. This is NOT basic click-task annotation work. We are specifically looking for highly analytical, detail-oriented professionals capable of evaluating complex AI-generated outputs across domains such as: * software engineering * UX/UI and visual design * computer vision * multimodal AI * spreadsheets and documents * presentations and structured data * reasoning and ranking workflows Compensation: • Competitive hourly compensation (high-performing contributors may earn substantial weekly income) • Flexible remote work • Ongoing project opportunities What You Will Do: * Evaluate AI-generated outputs using str...
...quality, usability, visual polish, structure, and editability of AI-generated outputs across multiple domains including: * software engineering artifacts * UI/UX and design systems * presentation materials * spreadsheets and documents * multimodal and computer vision outputs Compensation: • Approx. $75/hour • Flexible task-based work • High performers may scale to substantial weekly earnings depending on task availability and quality Open Specializations: 1. Software Engineering Evaluators 2. UX/UI & Visual Design Evaluators 3. Computer Vision / Multimodal Evaluators What You Will Do: * Review AI-generated artifacts and compare multiple responses side-by-side * Rank outputs from best to worst using structured evaluation rubrics * Evaluate usability...
...build a lightweight internal web tool (POC) that helps validate migrated CMS pages (AEM-based) against predefined design and content standards. This tool will be used by product and content teams to ensure quality, consistency, and completeness of page migrations. ________________________________________ Project Overview We are migrating insurance product pages from an old template to a new “Multimodal Template.” We already have: • Figma designs (as source of truth) • Sample migrated pages (live URLs) • Initial component mapping (Excel) We want to build a tool that: 1. Takes mapping inputs (components, templates, product) 2. Allows users to input one or multiple page URLs 3. Automatically audits these pages 4. Outputs a structured validation report + sc...
I’m building an NLP-driven, multimodal assistant that accepts text, image, and audio inputs, but its replies still drift into hallucination. The goal is straightforward: sharpen response accuracy so the system stays firmly grounded in fact. Right now the core pipeline is a Hugging Face Transformer model wrapped in a Retrieval-Augmented Generation (RAG) layer. I need you to audit the entire flow, diagnose where and why hallucinations appear, and then apply proven mitigation techniques. That could involve prompt engineering, better retrieval logic, truth-focused data augmentation, fine-tuning, or introducing guard-rail frameworks—whatever combination delivers measurably higher factual precision. Deliverables • A revised model or inference pipeline that demonstrabl...
I want to build a custom large-language model that goes beyond text-only chat. The goal is a predictive engine that can read free-form text, combine it with numerical features, and return forward-looking insights. In practice that means designing an architecture able to embed and fuse both modalities, ...training pipeline with documented source code • Trained model weights and reproducible environment files • Evaluation report demonstrating predictive performance on unseen mixed data • Simple inference script or REST endpoint instructions If you have prior experience blending tabular and textual inputs or have leveraged architectures such as TabTransformer, RETAIN-style attention, or multimodal adapters on top of LLM backbones, mention it—those skills w...