| title | Email Classifier |
|---|---|
| emoji | 📚 |
| colorFrom | gray |
| colorTo | purple |
| sdk | docker |
| pinned | false |
| short_description | Classifier for email using BERT |
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
This project is an email classification application built using a BERT-based model. It classifies emails into predefined categories, enabling efficient email management and automation.
The email classifier uses a pre-trained BERT (Bidirectional Encoder Representations from Transformers) model fine-tuned for text classification. BERT is a Transformer-based language model that represents each word using the context on both sides of it in the sentence, which makes it well suited to tasks like email classification.
- Pre-trained BERT Model: Utilizes a pre-trained BERT model from Hugging Face for transfer learning.
- Fine-tuning: The model is fine-tuned on a labeled dataset of emails to adapt it to the specific classification task.
- Tokenization: Emails are tokenized using the BERT tokenizer to convert text into input IDs and attention masks.
- Multi-class Classification: For each email, the model outputs a probability for every predefined category.
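
As a rough illustration of the multi-class step, the sketch below turns raw model scores (logits) into per-category probabilities with a softmax and picks the most likely category. The category names and the example logits are hypothetical; the real label set depends on the dataset the model was fine-tuned on.

```python
import math

# Hypothetical category names; the real label set depends on the training data.
CATEGORIES = ["work", "personal", "spam"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_category(logits):
    """Return the most likely category and the full probability map."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CATEGORIES[best], dict(zip(CATEGORIES, probs))


# Example logits, made up for illustration.
label, probabilities = predict_category([2.0, 0.5, -1.0])
```

Here `label` is the category with the highest probability, and `probabilities` maps every category to its share of the total.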
Data Preprocessing:
- Emails are cleaned to remove unnecessary characters, HTML tags, and stopwords.
- Tokenization is performed using the BERT tokenizer to prepare the input for the model.
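
The cleaning step might look roughly like the sketch below, which uses only the Python standard library. The function name `clean_email` and the tiny stopword list are assumptions made for illustration, not the project's actual preprocessing code.

```python
import html
import re

# Tiny illustrative stopword list (an assumption); a real project would likely use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "for"}


def clean_email(text):
    """Strip HTML tags, drop non-alphanumeric characters, and remove stopwords."""
    text = re.sub(r"<[^>]+>", " ", text)         # remove HTML tags
    text = html.unescape(text)                   # decode entities such as &amp;
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)  # replace punctuation and symbols with spaces
    words = [w for w in text.lower().split() if w not in STOPWORDS]
    return " ".join(words)


cleaned = clean_email("<p>The invoice is attached &amp; due for payment!</p>")
# cleaned == "invoice attached due payment"
```

The cleaned string would then be passed to the BERT tokenizer to produce input IDs and attention masks.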
Model Training:
- A pre-trained BERT model is fine-tuned on the labeled dataset.
- The training process optimizes the model's weights using a cross-entropy loss function.
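
To make the loss concrete, here is a minimal pure-Python sketch of cross-entropy for a single example: the negative log of the softmax probability assigned to the correct category. In practice a deep learning framework computes this over batches; the function name and example logits here are illustrative only.

```python
import math


def cross_entropy(logits, target_index):
    """Cross-entropy for one example: -log(softmax(logits)[target_index])."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[target_index]


# The same scores give a small loss when the top-scoring class is correct
# and a large loss when it is not.
confident_right = cross_entropy([4.0, 0.0, 0.0], target_index=0)
confident_wrong = cross_entropy([4.0, 0.0, 0.0], target_index=1)
```

Fine-tuning adjusts the model's weights to reduce this value averaged over the labeled emails.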
Evaluation:
- The model is evaluated on a validation dataset using metrics like accuracy, precision, recall, and F1-score.
- Hyperparameters such as learning rate and batch size are tuned for optimal performance.
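
The metrics named above can be computed from the true and predicted labels as in the sketch below. It assumes macro averaging (each category weighted equally), which the project does not specify; the function name and the sample labels are made up for illustration.

```python
def classification_report(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)  # true positives
        fp = sum(t != label and p == label for t, p in pairs)  # false positives
        fn = sum(t == label and p != label for t, p in pairs)  # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    count = len(labels)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(precisions) / count,
        "recall": sum(recalls) / count,
        "f1": sum(f1s) / count,
    }


# Made-up validation labels for illustration.
report = classification_report(
    ["spam", "spam", "work", "work"],
    ["spam", "work", "work", "work"],
)
```

With these sample labels, three of four predictions are correct, so `report["accuracy"]` is 0.75.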
Deployment:
- The trained model is integrated into a FastAPI application for serving predictions.
- Docker is used to containerize the application for easy deployment.
- app.py: The main FastAPI application that serves the email classification API.
- bert-email-classifier/: Contains the BERT model configuration, tokenizer, and vocabulary files.
- Dockerfile: Configuration for building the Docker image.
- requirements.txt: List of Python dependencies required for the project.
- data/: Directory for storing training and validation datasets (not included in the repository).
- Docker installed on your system
- Python 3.9 or later (if running locally without Docker)
- pip for managing Python dependencies
- Build the Docker image:
docker build -t email-classifier .