VLLM Fine-Tuned Model (ADVANCED)
Upwork

Remote
•1 week ago
•No application
About
I am seeking an expert machine learning engineer to develop a fine-tuned vision language model. You will need to be an expert in Python, Google Colab, Hugging Face, model architecture and dataset structure. The model will be fed architectural drawings and needs to output PRECISE (85-90% accuracy) JSON coordinates. It needs to count objects and locate lengths and areas (it only has to output the coordinates, not calculate anything; the calculations will be done in post-processing). It needs to be a vision language model, not a pure computer vision model such as SAM2 or Grounding DINO (models like PaliGemma 2 are OK). IT NEEDS TO REASON/THINK OVER THE IMAGE INPUTS.
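To make the split between model output and post-processing concrete, here is a minimal sketch of how counts, lengths and areas could be derived from the model's JSON coordinates. The JSON layout (an "objects" list with "label", "type" and "points" fields), the function names and the units_per_pixel scale factor are all assumptions made for illustration; the posting does not define a schema.

```python
import json
import math


def polyline_length(points):
    """Sum of straight-line distances between consecutive points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    n = len(points)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def summarize(model_output, units_per_pixel=1.0):
    """Aggregate counts, lengths and areas per label.

    `model_output` is a JSON string in the hypothetical schema described
    above; `units_per_pixel` is an assumed drawing scale.
    """
    data = json.loads(model_output)
    counts, lengths, areas = {}, {}, {}
    for obj in data["objects"]:
        label, kind, pts = obj["label"], obj["type"], obj["points"]
        counts[label] = counts.get(label, 0) + 1
        if kind == "polyline":
            length = polyline_length(pts) * units_per_pixel
            lengths[label] = lengths.get(label, 0.0) + length
        elif kind == "polygon":
            area = polygon_area(pts) * units_per_pixel ** 2
            areas[label] = areas.get(label, 0.0) + area
    return {"counts": counts, "lengths": lengths, "areas": areas}


# Hypothetical model output for one drawing (pixel coordinates).
example_output = json.dumps({
    "objects": [
        {"label": "door", "type": "point", "points": [[10, 20]]},
        {"label": "door", "type": "point", "points": [[40, 20]]},
        {"label": "wall", "type": "polyline", "points": [[0, 0], [3, 0], [3, 4]]},
        {"label": "room", "type": "polygon", "points": [[0, 0], [4, 0], [4, 3], [0, 3]]},
    ]
})

summary = summarize(example_output)
print(summary)
```

With this layout the model is only responsible for emitting coordinates; counting is a tally of labels, lengths come from summed segment distances, and areas come from the shoelace formula, all outside the model.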



