This project contains a Flask-based web application that integrates various Natural Language Processing (NLP) models to generate text summaries. The app allows users to enter text and receive summaries in different formats such as paragraph or bullet points, utilizing state-of-the-art models for summarization tasks.
Table of Contents
- Project Overview
- Installation
- Usage
- Model Training
- API Endpoints
- Customization
- File Structure
- Contributing
- License
Project Overview
This repository offers a web-based interface for summarizing text using different models including BART, Llama 2, Ollama Llama 3, and PEFT-based fine-tuning techniques like LoRA (Low-Rank Adaptation). The app is designed for flexibility, supporting multiple models and summarization formats.
Key Features:
- Supports BART, Llama 2, and Ollama Llama 3 models.
- Users can choose between paragraph or bullet points format for the summaries.
- Adjustable summary lengths (short or long).
- Utilizes LoRA for efficient fine-tuning with PEFT.
Installation
Prerequisites
- Python 3.8 or higher
- Pip package manager
- CUDA-enabled GPU (for training models)
Steps
- Clone the repository:
git clone https://github.com/shahinur-alam/AI-Powered-Content-Summarizer.git
cd AI-Powered-Content-Summarizer
- Create a virtual environment (optional but recommended):
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
- Run the application:
python app.py
Usage
Running the App
- Start the application by running `python app.py`.
- Open your browser and go to http://127.0.0.1:5000/.
- Input the text you want to summarize, select the summary type (short or long), and the format (paragraph or bullet points).
- Submit the form to get the summary generated by the model.
Example Input
Original Text:
The BART model is widely known for its text generation capabilities, and is particularly effective at summarization tasks.
Example Output (Short Summary in Bullet Points Format)
* BART is known for text generation.
* It is effective at summarization tasks.
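The bullet-point rendering above can be reproduced with a small sentence-splitting helper. This is an illustrative sketch only (the function name and regex are assumptions, not the app's actual code):

```python
import re

def to_bullet_points(summary: str) -> str:
    """Split a paragraph summary into one bullet per sentence (hypothetical helper)."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", summary.strip()) if s.strip()]
    return "\n".join(f"* {s}" for s in sentences)

text = "BART is known for text generation. It is effective at summarization tasks."
print(to_bullet_points(text))
# * BART is known for text generation.
# * It is effective at summarization tasks.
```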
Model Training
This application includes both pre-trained models and fine-tuning capabilities.
BART Summarization
BART is used to generate summaries directly without the need for additional training:
from transformers import BartForConditionalGeneration, BartTokenizer
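The import above can be fleshed out into a full generation call. The helper below is an illustrative sketch rather than the app's actual code: it takes an already-loaded model and tokenizer as arguments so the generation logic stands on its own:

```python
def summarize_with_bart(text, model, tokenizer, max_length=150, min_length=50):
    """Sketch: summarize `text` with a loaded BART model and matching tokenizer.

    `model` is expected to be a BartForConditionalGeneration and `tokenizer`
    its BartTokenizer, e.g. both loaded from "facebook/bart-large-cnn".
    """
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
    summary_ids = model.generate(
        inputs["input_ids"],
        max_length=max_length,
        min_length=min_length,
        num_beams=4,
        early_stopping=True,
    )
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)
```

Loading would use `BartTokenizer.from_pretrained("facebook/bart-large-cnn")` and `BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")`, matching the import shown above.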
Llama 2 with PEFT and LoRA
The Llama 2 model is fine-tuned using LoRA to enable efficient fine-tuning:
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM"
)
This allows the app to fine-tune the model efficiently on smaller datasets.
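To see why LoRA is parameter-efficient: a rank-r adapter on a d_out × d_in weight matrix trains only r × (d_in + d_out) parameters while the original matrix stays frozen. A quick back-of-the-envelope check (the 4096 dimension is an assumption matching Llama 2 7B's hidden size, not a value from this repository):

```python
def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters in one rank-r LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)

# One 4096x4096 attention projection (hidden size assumed from Llama 2 7B):
full = 4096 * 4096                               # 16,777,216 frozen weights
added = lora_trainable_params(4096, 4096, r=16)  # 131,072 trainable weights
print(f"trainable fraction: {added / full:.4%}")  # trainable fraction: 0.7813%
```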
Ollama Llama 3
For Ollama Llama 3, a LangChain implementation is used to summarize text:
from langchain_community.llms import Ollama

llm = Ollama(model="llama3")
Fine-Tuning and Training
- The Llama 2 model can be fine-tuned using Hugging Face's `Trainer` API, with training arguments and LoRA applied for parameter-efficient tuning.
- The fine-tuned model can then be saved for future inference:
trainer.save_model("./llama2-finetuned-final")
API Endpoints
GET /
- Description: Displays the main page where users can input text for summarization.
- Response: Renders the `index.html` page containing a form.
POST /
- Description: Handles form submissions and returns the generated summary.
- Parameters:
  - `text`: The text to summarize.
  - `summary_type`: The desired summary type (`short` or `long`).
  - `format_type`: The desired format (`paragraph` or `bullet_points`).
- Response: Displays the summarized text.
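The two endpoints can be sketched as a single Flask view. This is a minimal standalone sketch, not the repository's actual `app.py`: the inline form and placeholder summary stand in for `index.html` and the real model call:

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)

# Minimal stand-in for templates/index.html
FORM = """
<form method="post">
  <textarea name="text"></textarea>
  <select name="summary_type"><option>short</option><option>long</option></select>
  <select name="format_type"><option>paragraph</option><option>bullet_points</option></select>
  <button type="submit">Summarize</button>
</form>
"""

@app.route("/", methods=["GET", "POST"])
def index():
    if request.method == "POST":
        text = request.form["text"]
        summary_type = request.form.get("summary_type", "short")
        format_type = request.form.get("format_type", "paragraph")
        # Placeholder: the real app would call its summarization model here.
        summary = f"({summary_type}, {format_type}) {text}"
        return summary
    return render_template_string(FORM)

if __name__ == "__main__":
    app.run(debug=True)
```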
Customization
Summary Length
You can adjust the maximum and minimum summary length in the code:
max_length = 150 if summary_type == 'short' else 300
min_length = 50 if summary_type == 'short' else 100
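One way to keep the two thresholds in sync across model calls is a small helper (hypothetical, not in the repository):

```python
def length_params(summary_type: str) -> tuple[int, int]:
    """Return (max_length, min_length) for the chosen summary type."""
    if summary_type == "short":
        return 150, 50
    return 300, 100

print(length_params("short"))  # (150, 50)
print(length_params("long"))   # (300, 100)
```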
Models
- To change the model, adjust the initialization in `app.py`. For example, to switch between BART and Llama 2, update the respective model import and initialization.
Output Format
Summaries can be returned in either paragraph or bullet points format. This is controlled by user input through the form and processed within the `summarize_text` function.
File Structure
AI-Powered-Content-Summarizer/
│
├── templates/
│ └── index.html # HTML template for the web interface
├── app.py # Main Flask application logic
├── requirements.txt # Python dependencies
└── README.md # Project documentation
Contributing
We welcome contributions to improve the project! Here’s how you can get started:
- Fork the repository.
- Create a new feature branch (`git checkout -b feature-branch`).
- Commit your changes (`git commit -m 'Add new feature'`).
- Push to the branch (`git push origin feature-branch`).
- Open a pull request.
License
This project is licensed under the MIT License. You are free to use, modify, and distribute it.