Welcome to the Amazon ML Challenge 2024 repository! This project contains our solutions and implementations for the machine learning challenge organized by Amazon, which involves predicting specific attributes from images using advanced machine learning models and techniques.
To set up the project locally, follow these steps:
1. Clone the repository:

   ```bash
   git clone https://github.com/gjyotin305/AmazonMLChallenge24.git
   ```

2. Navigate to the project directory:

   ```bash
   cd AmazonMLChallenge24
   ```

3. Create a virtual environment and activate it:

   ```bash
   python -m venv env
   source env/bin/activate  # On Windows use: env\Scripts\activate
   ```

4. Install the required dependencies:

   ```bash
   pip install -r requirements.txt
   ```
To download the images, run:

```bash
python student_resource_3/src/download_data.py
```

To add the image paths to the CSV, run:

```bash
python preprocess.py
```
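For reference, here is a minimal sketch of what this preprocessing step might look like; the CSV location, the download directory, and the `image_link`/`image_path` column names are assumptions, not the script's actual interface:

```python
import pandas as pd
from pathlib import Path

# Hypothetical sketch: derive a local file path for each downloaded image and
# store it in a new column so the pipeline can load images directly.
df = pd.read_csv("../dataset/test.csv")  # assumed CSV location
image_dir = Path("images_train")         # assumed download directory

# Assumes each image was saved under the basename of its source URL.
df["image_path"] = df["image_link"].map(lambda url: str(image_dir / Path(url).name))

df.to_csv("../dataset/test.csv", index=False)
```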
To run the image prediction and processing pipeline:
1. Ensure you have the dataset available at the specified paths, or adjust the paths in the script accordingly.

2. Run the main script:

   ```bash
   python eval.py
   ```

   This script will process images and output predictions as defined in the code.

3. For specific tasks or stages, you can modify and execute the scripts located in the `scripts` folder.
The project uses images and related data for training and testing. Ensure that you have the dataset in the following directory structure:

- `../dataset/`: Contains CSV files and other metadata.
- `/data/.jyotin/AmazonMLChallenge24/student_resource 3/images_train/`: Contains training images (this machine-specific path should be adjusted to your environment).
The project utilizes the `LlavaNextForConditionalGeneration` model from Hugging Face's Transformers library. The model is fine-tuned for the specific task of extracting and analyzing numerical information from images.

- Model Name: `llava-hf/llava-v1.6-mistral-7b-hf`
- Processor: `LlavaNextProcessor`
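As a rough sketch of how this model is typically loaded and queried with the Transformers library (the actual prompt and generation settings live in `eval.py` and may differ):

```python
import torch
from PIL import Image
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration

MODEL_ID = "llava-hf/llava-v1.6-mistral-7b-hf"

processor = LlavaNextProcessor.from_pretrained(MODEL_ID)
model = LlavaNextForConditionalGeneration.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,  # float16 for faster inference (see below)
    device_map="auto",
)

image = Image.open("sample.jpg")  # hypothetical input image
# Mistral-style instruction prompt; the exact wording used in eval.py may differ.
prompt = "[INST] <image>\nDescribe any numerical values and their units in this image. [/INST]"

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```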
After we get the image description from LlavaNext, we pass the text to a second model:

- Model Name: `Phi3-Mini-4K-Instruct`

This model converts the relevant text into a structured JSON object, which is then validated so that only proper units and values are allowed.
We load the models in float16 for better execution speed. With this setup, our approach takes only about 10 seconds per image.
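A minimal sketch of this post-processing stage, assuming the Hugging Face checkpoint `microsoft/Phi-3-mini-4k-instruct` and a hypothetical set of allowed units (the real prompt, unit list, and parsing logic live in this repository's scripts):

```python
import json
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hugging Face checkpoint for "Phi3-Mini-4K-Instruct".
PHI3_ID = "microsoft/Phi-3-mini-4k-instruct"

tokenizer = AutoTokenizer.from_pretrained(PHI3_ID)
model = AutoModelForCausalLM.from_pretrained(
    PHI3_ID, torch_dtype=torch.float16, device_map="auto"
)

# Hypothetical subset of the units the challenge allows.
ALLOWED_UNITS = {"gram", "kilogram", "centimetre", "watt"}

def extract_json(description: str) -> dict:
    """Ask Phi-3 to turn a free-form description into {"value": ..., "unit": ...}."""
    messages = [{
        "role": "user",
        "content": 'Extract the value and unit from this text as JSON with '
                   f'keys "value" and "unit":\n{description}',
    }]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=64)
    reply = tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
    return json.loads(reply)  # assumes the model returned valid JSON

def is_valid(prediction: dict) -> bool:
    """Keep only predictions with a numeric value and an allowed unit."""
    return (isinstance(prediction.get("value"), (int, float))
            and prediction.get("unit") in ALLOWED_UNITS)
```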
Results of the predictions and analyses are printed to the console. Modify the script to save results to files or visualize them as needed.
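For example, a hypothetical snippet for dumping predictions to a CSV instead of the console (the column names are illustrative, not the challenge's required output format):

```python
import csv

# Hypothetical: `predictions` holds (index, prediction_string) pairs
# collected while the pipeline processes each image.
predictions = [(0, "34 gram"), (1, "10.5 centimetre")]

with open("test_out.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["index", "prediction"])
    writer.writerows(predictions)
```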
Meet our team members:

- Rhythm Baghel
- Jyotin Goel
- Harshiv Shah
- Mehta Jay Kamalkumar
This project is licensed under the MIT License - see the LICENSE file for details.