Hello, I see you have open-sourced two implementations of toklip: transfommer_toklip.py and timm_model_toklip.py. While debugging, I noticed that you use TimmModel. I tried to adapt it to the VisionTransformer implementation, but found significant discrepancies between the two codebases. For instance, TimmModel has no class token, whereas VisionTransformer defines one explicitly. There are also issues with the attention pool. Could you update the code or provide the latest version of transfommer_toklip.py?
Thanks!