In this article, I’ll discuss some aspects of text summarisation, the process of analysing a text document, or a set of documents, in order to produce a summary of its content. The overall purpose is to reduce the amount of information that a user has to digest in order to understand whether reading the whole document is relevant for its information need.
This article is a bird’s-eye view on the topic, to understand the different implications of the problem, rather than a detailed discussion on a specific implementation. The latter will be the subject of future articles.
Summarisation is one of the important tasks in text analytics and it’s an active area of (academic) research which involves mainly the Natural Language Processing and Information Retrieval communities.
Information Overload and the Need for a Good Summary
The core of the matter is the information overload we are experiencing on a daily basis. To put it simply, there is just too much information to digest, and not enough time to do it. The purpose of summarisation is to minimise the amount of information you have to go through, before you can grasp the overall concepts described in the document.
Summarisation can happen in different forms, but the key idea is to present the user with something short, yet informative.
To name just one example, let’s say you want to buy some product, and you’d like to get some opinions about such product. None of your friends owns the product, so you have a look at the on-line reviews: thousands and thousands of sentences to read. Are you going to read all of them? Do you read just some of them and hope to get the best insights?
This is how Google Shopping provides the user with a possible solution:
In this image, the reviews about a popular gaming console are condensed, providing a distribution of ratings and a breakdown of different aspects about the product (e.g. picture/video or battery). The user can then decide to read further, by clicking on a specific aspect, or on a specific rating. Other popular on-line services offer similar
Maybe this is not a big issue when the value of the product is just a few pounds/dollas/euros, but the same problem will arise any time there is just too much to read, and not enough time.
As mentioned in the previous paragraph, every scenario where there is a lot of text to read can be a good application scenario for text summarisation. The following list is paraphrased from a tutorial given at ACL 2001 by Maybury and Many:
- News summaries: what happened while I was away?
- Physicians’ aids: summarise and compare the recommended treatments for this patient
- Meeting summarisation: what happended at the meeting I missed?
- Search engine result pages: snippets of the retrieved documents compared to the query
- Intelligence: create 500-word file of a suspect
- Small screens: create a screen-sized summary of a book/email
- Aids for visually impaired: compact the text and read it out for a blind person
- Sentiment Analysis: give me pros and cons of a product
- Social media: what are the trending topics today?
Text summarisation is not the only way to tackle the information overload in some of these scenarios, but it can play an important role and it can be used as a component of a more complex system that involves e.g. recommendations and search.
Properties of a Summary
Before we can build a summarisation system, we need to understand how the summary is going to be consumed.
There are many different ways to characterise a summary, here we summarise some of them.
Abstract vs. extract: do we rephrase the content or do we extract some of it? The first involves natural language generation, the latter involves e.g. phrase/sentence ranking.
Single vs Multi source: multiple sources can introduce discrepancies, confirming or contradicting some information. Reviews stating opposite opinions and experiences can be legitimate. News releases that contradict each others are problematic. How do we deal with duplicate content? How do we promote novelty?
Generic vs User-oriented: a generic summary is static, created once for all the users. A user-oriented summary is dynamic, tailored to a particular user profile or user session.
Summary Function: do we want to cover all the key points of the source, or just act as a preview? Do we provide an additional critical view on the source? Think about a movie, and compare its plot, its trailer and a review about it: they are all summaries of the movie, but they have different functions.
Summary Length: a summary should be… short. How short? Do we have a target length (number of words/sentences) or a compression rate (e.g. 5% of the source)?
Linguistic qualities: is the summary coherent? Is it fluent? Is it self-contained?
These are just some of the aspects to consider when building a summarisation application. The key question probably is: how is it going to help the user?
Evaluation: How Good is a Summary?
Evaluating a summary is a challenge in itself. The previous paragraph has opened the discussion for a variety of summarisation approaches, so in order to decide how good a summary is, we really need to put some more context.
In principle, there two orthogonal ways to evaluate summarisation: user-vs-system-based and intrinsic-vs-extrinsic. Let’s briefly discuss them.
User-based intrinsic: users are asked to judge the quality of the summary per se. A typical question could be as simple as “how did you like the summary?“, or something more complex regarding the coherence of the summary, or whether it was helpful to understand the full text.
User-based extrinsic: users are asked to complete a particular task. The quality of the summary is measured against how well the user performs on the task. Here, “how well” can involve, for example, accuracy or speed: does the summary improve the user’s performance?
System-based intrinsic: gold standard summaries are produced by human judges, and the system-generated summaries are compared against them. Some evaluation metrics are involved for the automatic generation of a score that allows summarisation systems to be compared. A common example is the ROUGE framework, based on n-gram overlaps.
System-based extrinsic: the system performs some other tasks (e.g. text classification), using the system-generated summary. The system performances are evaluated for these other tasks, with and without the use of the summarisation component.
In general, involving users is a longer and more expensive process, but can provide interesting insights in terms of summary quality. System-based summarisation with e.g. ROUGE can be useful for some initial comparison, when many potential system candidates are available and employing users to judge all of them could be simply not feasible.
Evaluating a summariser against a particular task (extrinsic evaluation) often helps to answer the initial question, how is the summary going to help the user?
To summarise :) there are a few aspects to consider before building a summarisation system.
This article has provided an overall introduction to the field, to highlight some of the key issues to think about.
Some follow-up article will provide more concrete examples with existing tools and actual implementation, to showcase the real use of text summarisation.