Added pseudolabel_frames.py #19
base: main
Conversation
Moreover, crucially, you save the result as
Hey Yahya, yes thanks for making the PR and including the pictures! :) Before merging in, +1 to Markus' comments and suggestions. To consolidate our suggestions together, can you make the following changes?
Thanks! And let me know if you're unsure about how to tackle any of these :)
fourm/pseudolabel_frames.py (Outdated)
import cv2
from ultralytics import YOLO

SHARDS = "/cluster/work/cotterell/mm_swissai/datasets/hdvila/1000_hd_vila_shuffled/0000000000.tar"
This should be a configurable input
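A minimal sketch of what making the shard path configurable could look like; the flag names and defaults here are illustrative, not taken from the actual PR:

```python
import argparse

def parse_args(argv=None):
    # Hypothetical CLI: replace the hard-coded SHARDS constant with an argument.
    parser = argparse.ArgumentParser(description="Pseudolabel frames from video shards")
    parser.add_argument("--shards", required=True,
                        help="Path to the input .tar shard (or a directory of shards)")
    parser.add_argument("--output_dir", default="root/data/video_det",
                        help="Where to write pseudolabel outputs")
    return parser.parse_args(argv)
```

The script body would then read `args.shards` instead of the module-level constant.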
fourm/pseudolabel_frames.py (Outdated)
from ultralytics import YOLO

SHARDS = "/cluster/work/cotterell/mm_swissai/datasets/hdvila/1000_hd_vila_shuffled/0000000000.tar"
OUTPUT_DIR = "bbox-yolo/extracted_frames"
This should be root/data/video_det
fourm/pseudolabel_frames.py (Outdated)
video.release()

# Apply pseudolabeling to the extracted frames
results = model(frame_paths, project=LABELED_OUTPUT_DIR, name=file[:-4])
Can you extract the bounding box representations as JSONs?
Made the changes, so now the shard path is an argument (the save paths will be changed on Todi; I kept them as they are on Euler for now, because the YOLO model is a bit weird about where it puts the save directory). Also did the nth-frame selection, and it now saves both the bounding-box image and the JSON output like this:
Great! Thanks! It is also tested on Euler, right? I realize you extract the tarfile to a directory. I am not sure if we want this. Maybe it is better not to keep them and just use a temp dir/temp file instead. I am also doing so in tokenization #14. Another thing we need to keep in mind is if we have different fps for different videos. In this case, should we normalize the
fourm/pseudolabel_frames.py (Outdated)

# Load the YOLO model
- model = YOLO('bbox-yolo/yolov8n.pt')  # pretrained YOLOv8n model
+ model = YOLO('/cluster/work/cotterell/yemara/ml-4m/bbox-yolo/yolov8n.pt')  # pretrained YOLOv8n model
super-nit: for more configurability, maybe even make this path an arg in the argparse (and set this as the default)?
fourm/pseudolabel_frames.py (Outdated)

# Extract the tar file
with tarfile.open(SHARDS, "r") as tar:
-     tar.extractall(path="bbox-yolo/extracted_files")
+     tar.extractall(path="extracted_files")
+1 to Markus's comment -- better perhaps to do everything within a tempdir like this https://stackoverflow.com/questions/3223604/how-do-i-create-a-temporary-directory-in-python
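A sketch of the tempdir approach being suggested: extract the shard into a `tempfile.TemporaryDirectory` so nothing persists after processing. The function and variable names are illustrative, and the YOLO call is elided:

```python
import tarfile
import tempfile
from pathlib import Path

def process_shard(shard_path):
    # Extract into a temp dir that is deleted automatically on exit,
    # instead of a persistent extracted_files/ directory.
    with tempfile.TemporaryDirectory() as tmpdir:
        with tarfile.open(shard_path, "r") as tar:
            tar.extractall(path=tmpdir)
        mp4s = sorted(Path(tmpdir).glob("*.mp4"))
        # ... extract frames and run the YOLO model on each video here ...
        return [p.name for p in mp4s]  # tmpdir and its contents are gone after this
```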
fourm/pseudolabel_frames.py (Outdated)
conf = box.conf.item()  # get confidence score
cls = int(box.cls.item())  # get class id
json_data.append({
    "bbox": xyxy,
Can we use key names consistent with the og 4M's bounding-box JSON key names? (I'm 90% sure it's the ones in the example here: #11 (comment))
So for one frame that'd be like:
{
"num_instances": 5,
"image_height": 512,
"image_width": 906,
"instances": [
{
"boxes": [
0.4229210317134857,
0.00020096010121051222,
0.5715101361274719,
0.13699540495872498
],
"score": 0.9029952883720398,
"class_id": 74,
"class_name": "clock",
"segmentation": [
[
0.5055187637969095,
0.1337890625,
...
]
]
},
{
"boxes": [
...
],
...
},
...
]
},
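One way to assemble that per-frame schema is a small pure helper that takes the detections as plain lists, keeping it decoupled from the ultralytics `Results` object. The helper name and its exact inputs are assumptions for illustration (segmentation is omitted here):

```python
def to_4m_frame_json(boxes, scores, class_ids, class_names, height, width):
    """Build a 4M-style per-frame dict. `boxes` are [x1, y1, x2, y2] normalized to [0, 1]."""
    instances = [
        {
            "boxes": list(box),
            "score": score,
            "class_id": cls_id,
            "class_name": class_names[cls_id],
        }
        for box, score, cls_id in zip(boxes, scores, class_ids)
    ]
    return {
        "num_instances": len(instances),
        "image_height": height,
        "image_width": width,
        "instances": instances,
    }
```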
fourm/pseudolabel_frames.py (Outdated)
video.release()

# Apply pseudolabeling to the extracted frames
results = model(frame_paths, project=LABELED_OUTPUT_DIR, name=file[:-4])

for i, result in enumerate(results):
    # Save labeled image
    result.save(filename=f'{file[:-4]}_labeled_frame_{i}.jpg')
Can you actually make this optional in an arg (default false)? While this is useful for debugging, when we run at scale I think we won't want to save every image.
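A sketch of gating the image dump behind a flag that defaults to false; the flag name is illustrative:

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser()
    # Off by default: saving every labeled frame is only useful for debugging,
    # not for at-scale runs.
    parser.add_argument("--save_labeled_images", action="store_true",
                        help="Also save each pseudolabeled frame as a .jpg (debug only)")
    return parser
```

The labeling loop would then call `result.save(...)` only when `args.save_labeled_images` is set.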
fourm/pseudolabel_frames.py (Outdated)
})

# Save JSON file
json_filename = os.path.join(JSON_OUTPUT_DIR, f"{file[:-4]}_frame_{i}_boxes.json")
Can we actually do 2 more things for saving the results?
- Aggregate the results into a list of JSONs and save them as jsonl (https://jsonlines.readthedocs.io/en/latest/). Each video should be saved with the same name as its mp4 (so if a file is named 00004.mp4, it should be saved as 00004.jsonl).
- Repackage the jsonls back into tar files whose names correspond to the tarfiles containing those mp4s. For example, all videos extracted from video_rgb/00043.tar should have their corresponding jsonls in video_det/00043.tar. See "Transform from video_rgb format into video_det format and save in video_det/ directory" (#11 (comment)) for more deets!
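Both steps can be done with the standard library alone (no jsonlines dependency needed); the function names here are illustrative:

```python
import json
import tarfile
from pathlib import Path

def save_video_jsonl(frame_dicts, jsonl_path):
    # jsonlines format: one JSON object per line, one line per frame.
    with open(jsonl_path, "w") as f:
        for frame in frame_dicts:
            f.write(json.dumps(frame) + "\n")

def repackage_as_tar(jsonl_dir, tar_path):
    # Bundle the per-video .jsonl files into a video_det shard whose name
    # mirrors the source video_rgb shard (e.g. 00043.tar -> 00043.tar).
    with tarfile.open(tar_path, "w") as tar:
        for jsonl_file in sorted(Path(jsonl_dir).glob("*.jsonl")):
            tar.add(jsonl_file, arcname=jsonl_file.name)
```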
Hey Yahya, looks much better! Just have a few more requests regarding how the outputs of yolo should be saved that I left in-line, can you take a look please?
This is a good point. I think we have a couple of options here: (1) we keep the same FPS for all videos for a given modality, or (2) we record the FPS of the video for each modality in metadata. I'd be a proponent of (2), but would also like to get a take from Ali/other 4M experts.
Thx for coming up with the options. (1) may be enough for a start but could restrict us later, so it may indeed be better to implement this more flexibly already. So I agree that (2) is better.
Another TODO: instead of doing every-nth-frame, keep a fixed FPS for each modality. While less flexible, it's easier to implement, and it doesn't seem necessary now to engineer for differing FPS for different videos. The cost of changing this later is just needing to re-pseudolabel everything, but while we work with small amounts of data to start, this shouldn't be a concern. Also do the same for #14
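Fixed-FPS sampling reduces to picking frame indices from the video's native frame rate; a small sketch (the function name is illustrative, and the native fps/frame count would come from the video reader, e.g. cv2's `CAP_PROP_FPS`/`CAP_PROP_FRAME_COUNT`):

```python
def frame_indices(video_fps, num_frames, target_fps):
    # Indices of the frames to keep so the sampled stream runs at ~target_fps.
    if target_fps >= video_fps:
        return list(range(num_frames))  # can't upsample; keep every frame
    step = video_fps / target_fps  # may be fractional (e.g. 29.97 fps video)
    idx, out = 0.0, []
    while round(idx) < num_frames:
        out.append(round(idx))
        idx += step
    return out
```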
@yahya010 thanks, please also move to fps instead of every_nth_frame (see comment above) |
Force-pushed (004794a to dab7df1): …it take in a dir of tars, move things into tempdirs
Pseudolabeling code:
Goes through each tar file, checks the mp4 files, and pseudolabels each frame in the video.
Here is an example of a frame before and after pseudolabeling:
Before:
After pseudolabeling the frame: