This project contains a Flask-based web application that integrates various Natural Language Processing (NLP) models to generate text summaries. The app allows users to enter text and receive summaries in different formats such as paragraph or bullet points, utilizing state-of-the-art models for summarization tasks.

Table of Contents

  1. Project Overview
  2. Installation
  3. Usage
  4. Model Training
  5. API Endpoints
  6. Customization
  7. File Structure
  8. Contributing
  9. License

Project Overview

This repository offers a web-based interface for summarizing text with several models, including BART, Llama 2, and Ollama Llama 3, and supports PEFT-based fine-tuning techniques such as LoRA (Low-Rank Adaptation). The app is designed for flexibility across models and summarization formats.

Key Features:

  • Supports BART, Llama 2, and Ollama Llama 3 models.
  • Users can choose between paragraph and bullet-point formats for the summaries.
  • Adjustable summary lengths (short or long).
  • Utilizes LoRA for efficient fine-tuning with PEFT.

Installation

Prerequisites

  • Python 3.8 or higher
  • Pip package manager
  • CUDA-enabled GPU (for training models)

Steps

  1. Clone the repository:
   git clone https://github.com/shahinur-alam/AI-Powered-Content-Summarizer.git
   cd AI-Powered-Content-Summarizer
  2. Create a virtual environment (optional but recommended):
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:
   pip install -r requirements.txt
  4. Run the application:
   python app.py

Usage

Running the App

  1. Start the application by running python app.py.
  2. Open your browser and go to http://127.0.0.1:5000/.
  3. Input the text you want to summarize, select the summary type (short or long), and the format (paragraph or bullet points).
  4. Submit the form to get the summary generated by the model.

Example Input

Original Text: 
The BART model is widely known for its text generation capabilities, and is particularly effective at summarization tasks.

Example Output (Short Summary in Bullet Points Format)

* BART is known for text generation.
* It is effective at summarization tasks.

Model Training

This application includes both pre-trained models and fine-tuning capabilities.

BART Summarization

BART is used to generate summaries directly without the need for additional training:

from transformers import BartForConditionalGeneration, BartTokenizer
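For context, a minimal end-to-end BART call might look like this (a sketch: the facebook/bart-large-cnn checkpoint and the generation settings are assumptions, not necessarily what app.py uses):

```python
from transformers import BartForConditionalGeneration, BartTokenizer

# Checkpoint name is an assumption; app.py may load a different one.
model_name = "facebook/bart-large-cnn"
tokenizer = BartTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name)

def summarize(text, max_length=150, min_length=50):
    # Tokenize, generate with beam search, and decode the summary.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
    ids = model.generate(
        inputs["input_ids"],
        max_length=max_length,
        min_length=min_length,
        num_beams=4,
        early_stopping=True,
    )
    return tokenizer.decode(ids[0], skip_special_tokens=True)
```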

Llama 2 with PEFT and LoRA

The Llama 2 model is fine-tuned using LoRA to enable efficient fine-tuning:

from peft import LoraConfig

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM"
)

This allows the app to fine-tune the model efficiently on smaller datasets.
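Attaching the adapter takes one call to get_peft_model (a sketch; the checkpoint name is illustrative, and Llama 2 weights are gated on Hugging Face):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Checkpoint name is an assumption; access to Llama 2 weights requires approval.
base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

# Only the low-rank adapter matrices become trainable; base weights stay frozen.
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()
```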

Ollama Llama 3

For Ollama Llama 3, a LangChain implementation is used to summarize text:

from langchain_community.llms import Ollama

llm = Ollama(model="llama3")
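A minimal summarization call built on that object might look like this (a sketch assuming the langchain_community package and a local Ollama server with the llama3 model pulled; the exact prompt in app.py may differ):

```python
from langchain_community.llms import Ollama

# Requires a running local Ollama server with `ollama pull llama3` done.
llm = Ollama(model="llama3")

def summarize_with_ollama(text):
    # Plain-prompt summarization; prompt wording is illustrative.
    prompt = f"Summarize the following text in a short paragraph:\n\n{text}"
    return llm.invoke(prompt)
```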

Fine-Tuning and Training

  • The Llama 2 model can be fine-tuned using Hugging Face’s Trainer API, with training arguments and LoRA applied for parameter-efficient tuning.
  • The fine-tuned model can then be saved for future inference:
  trainer.save_model("./llama2-finetuned-final")

API Endpoints

GET /

  • Description: Displays the main page where users can input text for summarization.
  • Response: Renders the index.html page containing a form.

POST /

  • Description: Handles form submissions and returns the generated summary.
  • Parameters:
      • text: The text to summarize.
      • summary_type: The desired summary type (short or long).
      • format_type: The desired format (paragraph or bullet_points).
  • Response: Displays the summarized text.
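The two routes can be sketched as a single Flask view. This is a simplified stand-in: summarize_text here is a placeholder for the model call, and the real app renders templates/index.html rather than returning plain text.

```python
from flask import Flask, request

app = Flask(__name__)

def summarize_text(text, summary_type, format_type):
    # Placeholder for the model call in app.py: echoes the first sentence.
    first = text.split(". ")[0].rstrip(".") + "."
    return first if format_type == "paragraph" else "* " + first

@app.route("/", methods=["GET", "POST"])
def index():
    if request.method == "POST":
        return summarize_text(
            request.form["text"],
            request.form.get("summary_type", "short"),
            request.form.get("format_type", "paragraph"),
        )
    return "Form page"  # the real app renders templates/index.html here
```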

Customization

Summary Length

You can adjust the maximum and minimum summary length in the code:

max_length = 150 if summary_type == 'short' else 300
min_length = 50 if summary_type == 'short' else 100
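Those thresholds can be factored into a small helper (the function name is illustrative, not from app.py):

```python
def summary_lengths(summary_type):
    """Return (max_length, min_length) to pass to the model's generate call."""
    if summary_type == "short":
        return 150, 50
    return 300, 100
```

For example, summary_lengths("long") returns (300, 100).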

Models

  • To change the model, adjust the initialization in app.py. For example, to switch between BART and Llama 2, update the respective model import and initialization.

Output Format

Summaries can be returned in either paragraph or bullet points format. This is controlled by user input through the form and processed within the summarize_text function.
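One way the format switch inside summarize_text could look (a sketch; the real implementation may split sentences differently):

```python
import re

def format_summary(summary, format_type):
    """Render a model summary as a paragraph or as bullet points."""
    if format_type == "paragraph":
        return summary.strip()
    # bullet_points: one bullet per sentence, splitting after ., !, or ?
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", summary.strip()) if s]
    return "\n".join("* " + s for s in sentences)
```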


File Structure

AI-Powered-Content-Summarizer/
│
├── templates/
│   └── index.html            # HTML template for the web interface
├── app.py                    # Main Flask application logic
├── requirements.txt          # Python dependencies
└── README.md                 # Project documentation

Contributing

We welcome contributions to improve the project! Here’s how you can get started:

  1. Fork the repository.
  2. Create a new feature branch (git checkout -b feature-branch).
  3. Commit your changes (git commit -m 'Add new feature').
  4. Push to the branch (git push origin feature-branch).
  5. Open a pull request.

License

This project is licensed under the MIT License. You are free to use, modify, and distribute it.
