Software Patent Abstract
A software tool for creating, training and testing a knowledge base
of a computerized customer relationship management system is disclosed.
The software tool includes corpus editing processes for displaying
and editing text-based corpus items, and assigning selected categories
to individual corpus items. Knowledge base construction processes
construct a knowledge base by analyzing a first subset of the corpus
items, and testing processes test the knowledge base on a second
subset of the corpus items. Reporting processes generate reports
containing indicia representative of the testing results, which
may be utilized to edit the corpus items and retrain the knowledge
base so as to improve performance.
Software Patent Claims
16. A computer-implemented method for training and testing a knowledge
base of a computerized customer relationship management system,
comprising: collecting corpus items into a corpus; assigning a category
from a set of predefined categories to individual corpus items;
building a knowledge base of a computerized customer relationship
management system by performing natural language and semantic analysis
of a first subset of corpus items; testing the knowledge base of
the computerized customer relationship management system on a second
subset of corpus items by classifying each corpus item of the second
subset into at least one of the predefined categories using information
contained in the knowledge base of the computerized customer relationship
management system; and generating and displaying a report based
on results produced by the testing step to a user of the computerized
customer relationship management system to gauge performance of
the knowledge base, so that appropriate adjustments are made to
improve the performance of the knowledge base.
17. The method of claim 16, wherein the step of testing the knowledge
base includes calculating a set of scores for each corpus item in
the second subset, each score from the calculated set of scores
being associated with a corresponding category and being representative
of a confidence that the corpus item belongs to the corresponding
category.
18. The method of claim 16, wherein the step of generating and
displaying a report includes generating a report relating to a single
selected category.
19. The method of claim 16, wherein the step of generating and
displaying a report includes generating a cumulative report relating
to a plurality of categories.
20. The method of claim 16, wherein the step of generating and
displaying a report includes: receiving user input specifying one
of a precision value, a recall value, false positive rate, false
negative rate, automation ratio or a cost ratio; and calculating
and displaying, for a selected category, a match score based on
the user input.
21. The method of claim 16, wherein the step of generating and
displaying a report includes: receiving user input specifying a
match score; and calculating and displaying, for a selected category,
a precision value and a recall value based on the user input.
22. The method of claim 16, wherein the step of generating and
displaying a report includes calculating precision as a function
of recall and causing a graph to be displayed depicting the relationship
between precision and recall.
23. The method of claim 16, wherein the step of generating and
displaying a report includes generating and displaying a graph depicting
cumulative success over time, the graph showing, for a plurality
of groups of corpus items each having a common time parameter, the
fraction of corpus items in the group that were appropriately classified.
24. The method of claim 16, wherein the step of generating and
displaying a report includes generating and displaying a report
showing, for each of a plurality of pairs of categories, a percentage
of corpus items initially assigned to a first category of the pair
of categories that were erroneously classified into a second category
of the pair of categories.
25. The method of claim 16, wherein the step of generating and
displaying a report includes generating and displaying a scoring
report showing, for a selected category, match scores for each corpus
item in the second subset, the match scores being representative
of the relevance of the selected category to the corpus item.
26. The method of claim 16, wherein the first and second subsets
of corpus items are selected in accordance with user input.
27. The method of claim 16, wherein the steps of building and
testing the knowledge base include using a modeling engine to analyze
and classify corpus items.
28. The method of claim 16, wherein the step of generating and
displaying a report includes selecting a report from a plurality
of available reports in response to user input.
29. The method of claim 16, wherein the corpus items comprise customer
communications.
30. The method of claim 16, wherein the corpus items include structured
and unstructured information.
31. (canceled)
32. A computer-readable medium embodying instructions executable
by a computer for performing the steps of: collecting corpus items
into a corpus; assigning a category from a set of predefined categories
to individual corpus items; building a knowledge base of a computerized
customer relationship management system by performing natural language
and semantic analysis of a first subset of corpus items; testing
the knowledge base of a computerized customer relationship management
system on a second subset of corpus items by classifying each corpus
item of the second subset into at least one of the predefined categories
using information contained in the knowledge base of a computerized
customer relationship management system; and generating and displaying,
on a computer, a report based on results produced by the testing
step to a user of the computerized customer relationship management
system to gauge performance of the knowledge base, so that appropriate
adjustments are made to improve the performance of the knowledge
base.
Software Patent Description
CROSS-REFERENCE TO RELATED APPLICATION
[0001] This application claims the benefit of U.S. Provisional
Application No. 60/468,493, filed May 6, 2003. The disclosure of
the foregoing application is incorporated herein by reference.
BACKGROUND OF THE INVENTION
[0002] 1. Field of the Invention
[0003] The present invention relates generally to computer software,
and more particularly to relationship management software for classifying
and responding to customer communications.
[0004] 2. Description of the Prior Art
[0005] Most commercial enterprises devote significant time and
resources to the tasks of reviewing and appropriately responding
to inquiries, requests and other text-based electronic communications
received from current or prospective customers. In order to enable
more efficient administration of these tasks, certain software vendors,
such as iPhrase Technologies, Inc. of Cambridge, Mass., have developed
computerized customer relationship management (CRM) systems which
perform analysis of incoming electronic communications and classify
the communications into predetermined categories based on the determined
intent. This categorization process may be utilized to automate
generation of responses, or to guide human agents in the selection
of a suitable response.
[0006] Such CRM systems typically require construction of a knowledge
base (KB) before the analysis and classification functions may be
performed reliably, i.e., before the CRM system may be put on-line.
The KB contains relevant statistical and semantic information derived
from a body of sample texts (known collectively as a corpus) by
using a process known as training. KB performance may be improved
by periodically retraining the KB with additional texts, or by providing
the KB with online feedback (a process referred to as online learning,
an example of which is described in U.S. patent application Ser.
No. 09/754,179, filed Jan. 3, 2001). Generally, the accuracy and
reliability of a CRM system depend on optimizing and maintaining
KB performance. Poor KB performance may result in unacceptably high
rates of false positives (i.e., frequently assigning non-relevant
categories to communications) and/or false negatives (i.e., frequently
failing to assign a relevant category to communications).
[0007] To construct and train a KB that provides satisfactory performance,
the CRM user must carefully perform a number of preparatory tasks,
including collecting appropriate sample texts, identifying a set
of categories that classify the texts according to intent, and assigning
the proper category to each sample text. If this process is conducted
improperly or if erroneous information is used, then the performance
of the resultant KB will be compromised, and the associated CRM
system will behave in an unreliable fashion. Unfortunately, the
prior art lacks tools for testing the performance of a KB and for
reporting the test results in a manner which would allow the user
to identify and remedy errors and problematic conditions in order
to improve KB performance.
SUMMARY
[0008] Roughly described, an embodiment of the present invention
provides a software tool for training and testing a knowledge base
of a computerized customer relationship management system. The software
tool may be conceptually divided into four component processes:
corpus editing processes, knowledge base (KB) building processes,
KB testing processes, and reporting processes. The corpus editing
processes import selected sample texts, allow assignment of relevant
categories from a predefined category list to individual corpus
items, display corpus items and associated field and category information
for user inspection, and modify the corpus items and associated
information in accordance with user input. KB building processes
select a subset of the corpus items to be used for training in response
to user input, and cause a KB to be constructed based on analysis
of the texts in the training subset. KB building processes may use
the services of a modeling engine to perform the requisite text
processing and semantic and statistical analysis operations. Once
the KB has been built, KB testing processes test the performance
of the KB by using it to classify each corpus item in a second
subset. Reporting processes then generate selected reports representative
of the performance of the KB, and cause the reports to be displayed
to the user. The reports may identify errors or problematic conditions
to the user, which may be remedied by making appropriate changes
to corpus items and/or organization of the KB.
[0009] Reports which may be generated by the reporting processes
and viewed by the user include reports representative of overall
KB performance across all categories, and reports representative
of KB performance for a selected category. Illustrative examples
of reports which may be selected include scoring graph reports,
showing match scores in a selected category for each corpus item
in the testing subset; reports showing the relationship between
precision and recall, either for all categories or for a selected
category; cumulative success over time reports, showing how the
KB performance changes over time; threshold calculator reports,
depicting the relationship between values of threshold, cost ratio,
precision and recall and allowing the user to rationally set threshold
values to be used by an application; and, stealing/stolen reports,
showing the percentage and number of corpus items "stolen"
by or from one category of a pair of categories, which may be used
to identify categories having overlapping intents.
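By way of a non-limiting illustration, the figures shown in a
stealing/stolen report might be computed from stored test results
as sketched below. The function name stolen_report, the data layout
(a list of pairs, each holding the assigned category and the match
score set), and the rule that the predicted category is the one
with the highest match score are all assumptions made for this
sketch; they are not taken from the disclosure.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical layout: (category assigned by the user, match score set).
TestResult = Tuple[str, Dict[str, float]]


def stolen_report(results: List[TestResult]) -> Dict[Tuple[str, str], Tuple[int, float]]:
    """For each (assigned, predicted) pair of distinct categories, return
    the number of items "stolen" and that number as a percentage of all
    items assigned to the first category of the pair."""
    totals: Counter = Counter()
    stolen: Counter = Counter()
    for assigned, scores in results:
        totals[assigned] += 1
        # Assumption: the predicted category is the highest-scoring one.
        predicted = max(scores, key=scores.get)
        if predicted != assigned:
            stolen[(assigned, predicted)] += 1
    return {
        pair: (count, 100.0 * count / totals[pair[0]])
        for pair, count in stolen.items()
    }
```

A pair with a high percentage suggests that the two categories may
have overlapping intents and may merit review by the user.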
BRIEF DESCRIPTION OF THE FIGURES
[0010] In the attached drawings:
[0011] FIG. 1 is a block diagram depicting the knowledge base (KB)
tool of the invention in relation to an exemplary computerized customer
relationship management (CRM) system;
[0012] FIG. 2 is a block diagram depicting components of the KB
tool;
[0013] FIG. 3 is a workflow diagram depicting the steps of a process
for training and testing the KB;
[0014] FIG. 4 is an exemplary user interface (UI) screen of the
KB tool used for displaying and editing corpus items;
[0015] FIG. 5 is a block diagram depicting the division of the
corpus items into training and testing subsets;
[0016] FIG. 6 is an exemplary UI screen of the KB tool presenting
a set of user-selectable options for dividing the corpus into training
and testing subsets;
[0017] FIG. 7 is an exemplary scoring graph report;
[0018] FIG. 8 is an exemplary report of total precision versus
recall;
[0019] FIG. 9 is an exemplary cumulative success over time report;
[0020] FIG. 10 is an exemplary threshold calculator report; and
[0021] FIG. 11 is an exemplary stealing/stolen report.
DETAILED DESCRIPTION
[0022] The invention may be more easily understood with reference
to the attached figures, which depict various aspects of an embodiment
of a software tool for training and testing a knowledge base of
a computerized customer relationship management system. Referring
initially to FIG. 1, there is shown a software tool (hereinafter
referred to as the "KB tool") 100, which provides a user
with the ability to train and test a knowledge base (hereinafter
referred to as "KB") of a computerized customer relationship
management ("CRM") system 102. CRM system 102 may be logically
and conceptually divided into three components: an application 104,
a modeling engine 106, and a KB 108. Application 104, which may
be configured to perform any variety of functions, receives text-based
electronic communications from an external source. The communications
will typically take the form of electronic mail messages (e-mails),
or text supplied through a web interface (e.g., in a query box of
an HTML form). Application 104 calls upon the services of modeling
engine 106 to analyze the communication and to determine an associated
intent. As will be discussed in further detail below, modeling engine
106 may determine intent by calculating a set of match scores for
each communication, wherein individual match scores of the match
score set correspond to one of a plurality of pre-established categories.
The match score is representative of a confidence that the communication
"belongs to" the associated category; a high match score
for a category is indicative of a high probability that the communication
is relevant to that category, whereas a low match score indicates
a low probability of relevance. Modeling engine 106 uses KB 108
to perform the analysis and scoring functions, as will be described
below.
[0023] Match scores calculated by modeling engine 106 are returned
to application 104, which may select and take an appropriate action
based on the match scores. In one example, application 104 takes
the form of an automated e-mail response application, which receives
inquiries and requests from current or prospective customers. Depending
on match score values determined by the modeling engine, application
104 may select and send an appropriate response to the inquiry or
route the inquiry to an appropriate agent 110 for further action.
As an illustrative example, modeling engine 106 may analyze an e-mail
received from a prospective customer and calculate a high match
score for a category associated with a specific product or service
offered by a company. The e-mail response application could then
automatically send the prospective customer a response with information
about the specific product/service, or route the customer e-mail
to a human agent having the relevant expertise.
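The selection of an action from a match score set may be sketched
as follows. This is a minimal illustration only: the function name
choose_action, the action labels, and the two threshold values are
hypothetical and are not specified by the disclosure.

```python
from typing import Dict, Optional, Tuple


def choose_action(
    scores: Dict[str, float],
    auto_threshold: float = 80.0,   # assumed value, not from the disclosure
    route_threshold: float = 40.0,  # assumed value, not from the disclosure
) -> Tuple[str, Optional[str]]:
    """Pick an action for one communication from its match score set."""
    category = max(scores, key=scores.get)
    top_score = scores[category]
    if top_score >= auto_threshold:
        # High confidence: send the response associated with the category.
        return ("auto_respond", category)
    if top_score >= route_threshold:
        # Moderate confidence: route to an agent with relevant expertise.
        return ("route_to_agent", category)
    # Low confidence: route to an agent without a suggested category.
    return ("route_to_agent", None)
```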
[0024] Those skilled in the art will recognize that application
104, modeling engine 106 and KB 108, as well as KB tool 100, may
reside and be executed on a single computer, or on two or more computers
connected over a network. The computer or computers on which the
components reside will typically be equipped with a monitor and/or
other display device, as well as a mouse, keyboard and/or other
input device such that the user may view UI screens and reports
and enter user input. Those skilled in the art will also recognize
that the foregoing software components will typically be implemented
as sets of instructions executable by a general-purpose microprocessor.
In a specific implementation of CRM system 102, modeling engine
106 uses a two-phase process to analyze and classify received communications.
In the first phase, a natural-language processing (NLP) engine extracts
concepts from the communication and generates a structured document
containing these concepts. As used herein, the term "concept"
denotes any feature which may be used to characterize a specific
category and distinguish it from other categories, including words
or phrases as well as information representative of the source or
context of the communication (e.g., an e-mail address). The NLP
engine extracts the concepts by performing a prescribed sequence
of operations, which may include language identification and encoding
conversions, tokenization, text cleanup, spelling and grammatical
error correction, and morphological and linguistic analysis.
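A greatly simplified stand-in for the first phase is sketched below
to show what a list of extracted concepts might look like. It covers
only text cleanup, tokenization, and one context feature (the
sender address); language identification, spelling correction and
morphological analysis are omitted. The function name
extract_concepts and the "from:" prefix are assumptions made for
illustration.

```python
import re
from typing import Dict, List


def extract_concepts(fields: Dict[str, str]) -> List[str]:
    """Return a flat list of concepts for one communication."""
    concepts: List[str] = []
    text = fields.get("Message", "")
    # Text cleanup and tokenization: lowercase, keep runs of
    # letters, digits and apostrophes.
    concepts.extend(re.findall(r"[a-z0-9']+", text.lower()))
    # Context feature: the source of the communication, if known.
    sender = fields.get("From")
    if sender:
        concepts.append("from:" + sender.strip().lower())
    return concepts
```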
[0025] According to the two-phase implementation of modeling engine
106, the structured document generated by the NLP engine and containing
the extracted concepts is passed to a semantic modeling engine,
which performs statistical pattern matching on the document by comparing
it with the content of categories residing in KB 108 to produce
the match score set. As noted above, each score in the match score
set represents a confidence level that the communication falls within
the associated category. KB 108 may also include one or more user-supplied
rules specifying how to route communications to specific categories
based on the content of the communication or related metadata (indicating,
for example, the identity of the person sending the communication,
or properties of the channel over which the communication was received,
e.g., secured or unsecured).
[0026] Software utilizing a two-phase modeling engine of the foregoing
general description is commercially available from iPhrase Technologies,
Inc. It is noted, however, that the description of a specific implementation
of modeling engine 106 is provided by way of an example, and the
invention should not be construed as being limited thereto.
[0027] KB 108 may be regarded as an object containing the learned
information required by modeling engine 106 to perform the match
score generation function, and may take any suitable form, including
a database or file (or collection of files). KB 108 contains relevant
statistical and semantic information derived from a collection of
sample texts known as a corpus. The process of deriving the relevant
statistical and semantic information from the corpus is known as
"training." The performance of KB 108 may be maintained
and improved over time by providing it (either in real-time or at
specified intervals) with feedback and adjusting information contained
within KB 108 accordingly, a process known as "learning."
In one example of feedback, application 104 may execute an "auto-suggest"
function, wherein it identifies to a human agent two or more categories
(or a set of candidate responses each of which is associated with
one of the categories) most likely to be relevant to the received
communication. When the agent selects one (or none) of the identified
categories or associated responses, feedback is provided to KB 108,
and statistics contained within KB 108 are appropriately modified
to reflect the selection. The process of adapting a knowledge base
using feedback is described in greater detail in co-pending U.S.
patent application Ser. No. 09/754,179, filed Jan. 3, 2001, which
is incorporated by reference.
[0028] In an exemplary implementation, KB 108 may be organized
into an array of nodes, wherein each node contains semantic statistical
information and/or rules for use by modeling engine 106 in classifying
communications. Some or all of the nodes will represent individual
categories. The simplest way to organize nodes in KB 108 is to place
them in a single-level flat knowledge base structure. If, for example,
CRM system 102 is designed to analyze customer e-mails and determine
to which product each e-mail pertains, KB 108 may take the form
of a flat knowledge base of several nodes, each node representing
a product and containing the relevant semantic and statistical information.
Alternatively, the nodes may be organized into a multi-level hierarchical
structure, wherein certain of the nodes have child nodes, or into
other structures known in the art.
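The flat and hierarchical arrangements of nodes may be pictured
with the following sketch. The class name KBNode, its attributes,
and the helper category_names are hypothetical; the disclosure does
not prescribe a particular data structure.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class KBNode:
    name: str
    is_category: bool = True  # some nodes need not represent a category
    stats: Dict[str, float] = field(default_factory=dict)
    children: List["KBNode"] = field(default_factory=list)


def category_names(node: KBNode) -> List[str]:
    """Collect the names of category nodes, parent before children."""
    names = [node.name] if node.is_category else []
    for child in node.children:
        names.extend(category_names(child))
    return names


# Flat structure: one node per product, no children.
flat_kb = [KBNode("Product A"), KBNode("Product B"), KBNode("Product C")]

# Hierarchical structure: certain nodes have child nodes.
hier_kb = KBNode(
    "root",
    is_category=False,
    children=[
        KBNode("Products", children=[KBNode("Product A"), KBNode("Product B")]),
        KBNode("Complaints"),
    ],
)
```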
[0029] KB tool 100 advantageously provides means for constructing
and training KB 108, for assessing its performance, and for identifying
various errors and problematic conditions. Referring now to FIG.
2, it is seen that KB tool 100 may be conceptually divided into
four composite sets of processes: corpus editing processes 202,
KB building processes 204, KB testing processes 206, and reporting
processes 208. Generally described, corpus editing processes 202
import selected sample texts into a corpus, display corpus items
and associated field and category information for user inspection,
and modify the corpus items and associated information in accordance
with user input; KB building processes 204 select a subset of the
corpus items to be used for training in response to user input,
and cause a KB to be constructed based on analysis and classification
of text and metadata contained in the selected corpus items; KB
testing processes 206 test the KB using a second subset of the corpus
items; and, reporting processes 208 generate reports on the testing
and cause the reports to be displayed to the user. It should be
recognized that the partition of KB tool 100 into separate processes
is conceptual in nature and should not be construed as specifying
the actual program architecture of KB tool 100, i.e., as requiring
that each set of processes reside in an independent module.
[0030] The functions performed by each of the processes, and by
KB tool 100 as a whole, may be more clearly explained with reference
to FIG. 3, which depicts the workflow associated with training and
testing KB 108, and to FIGS. 4-11, which depict exemplary UI screens
and reports that are displayed to the user and employed to implement
the various functions of KB tool 100. Referring initially to FIG.
3 and proceeding from left to right, the operations of training
and testing KB 108 begin with the creation and editing of the corpus
file, which is managed by corpus editing processes 202. To create
the corpus file, the user identifies (typically through a dialog
box or other UI element) a source or sources of the sample texts
that will be used for training and testing. The sample texts should
be of the same type as and representative of the communications
that will be analyzed and classified by CRM system 102. For example,
if CRM system 102 is configured to act as an automated e-mail response
application that automatically provides or suggests appropriate
pre-prepared text responses to incoming e-mails, then the sample
texts should be typical e-mail messages containing questions that
are similar to those which will be received by CRM system 102. Performance
of KB 108 will be improved by creating a corpus file containing
a relatively large number of sample texts. Furthermore, it is beneficial
to create a corpus file that contains a significant number of sample
texts pertinent to each of the categories into which the communications
will be classified. Files of various formats and types may serve
as the source of the sample texts, including without limitation,
comma separated value (CSV) files, Microsoft Excel (worksheet) files,
and PST (Microsoft Outlook e-mail) files. In addition, the corpus
file may be manually constructed (or modified) by entering or copying
individual corpus items via a user interface.
[0031] Creation and editing of the corpus also involves defining
corpus fields (also referred to as name-value pairs, or NVPs) and
assigning a category to each corpus item. Corpus fields are data
sets containing information associated with each corpus item. Definition
of corpus fields allows the user to specify which elements of the
corpus items (and of communications to be acted upon by CRM system
102) will be analyzed by modeling engine 106. For example, if the
corpus items are e-mail messages, appropriate corpus fields may
include a "From" field identifying the source of the corpus
item, a "Message" field containing the message body, a
"Subject" field containing the message subject, and a
"Category" field identifying the category to which the
corpus item belongs. Each corpus field may be assigned properties
specifying the data type contained in the field (e.g., text or number)
as well as options for how the field is processed (or not processed)
by the NLP engine of modeling engine 106. These properties will
typically be assigned via a dialog box or similar UI element. Each
corpus item may include unstructured information, structured
information, or both. Structured information consists of information having
certain predetermined constraints on its values and/or format, such
as a corpus field which can only take a value of TRUE or FALSE.
Unstructured information, such as a free language field (for example,
the "Message" field described above) does not need to
conform to prescribed constraints.
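A corpus item and its field definitions may be represented as in
the sketch below, which also checks structured fields against
their declared data types. The names FieldSpec, analyze and
validate_item, and the "Urgent" field, are hypothetical and are
used for illustration only.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class FieldSpec:
    name: str
    data_type: type        # e.g. str for free text, bool for TRUE/FALSE
    analyze: bool = True   # whether the NLP engine should process the field


def validate_item(item: Dict[str, Any], specs: List[FieldSpec]) -> List[str]:
    """Return a list of problems found in one corpus item."""
    problems: List[str] = []
    for spec in specs:
        if spec.name not in item:
            problems.append(f"missing field: {spec.name}")
        elif not isinstance(item[spec.name], spec.data_type):
            problems.append(f"wrong type for field: {spec.name}")
    return problems


specs = [
    FieldSpec("From", str),
    FieldSpec("Message", str),                 # unstructured free text
    FieldSpec("Urgent", bool, analyze=False),  # structured: TRUE or FALSE
]
```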
[0032] Corpus field names and properties may be specified by the
user through a dialog box or other UI element. Alternatively, the
corpus field names and properties may be specified in the sample
text files themselves. In another alternative, corpus editing processes
202 may automatically define corpus fields and properties if the
sample text file is in a certain prescribed format, such as a PST
file containing e-mail messages.
[0033] Corpus editing processes 202 also manage the assignment
of categories to each corpus item. The categories are representative
of distinct groupings into which the communications may be classified
according to the communications' intents. Typically, identification
of categories is performed by manually reviewing a set of sample
texts to determine what common intents are expressed in the texts.
In one example, CRM system 102 is an automated e-mail response application
for a product retailer. The user, upon review of a sample of recently
received e-mails, finds that the e-mails may be classified into one
of three areas: requests for product specifications and pricing
information, complaints about purchased products, and inquiries
regarding store locations and hours of operation. The user may then
specify, using a dialog box or other UI element presented by the
corpus editing processes 202 to the user, that three categories
are to be used by KB 108 for classification, consisting of a product
information request category, a complaint category, and a store
location category. Next, the user assigns a relevant category to
each item (e-mail) in the corpus. Assignment of the categories may
be performed via a UI presented by corpus editing processes 202,
or alternatively the categories may be added to the file containing
the sample texts prior to importing them into the corpus file. Other
methods and techniques, both manual and semi-automated, may be utilized
to define a set of categories and assign a relevant category to
individual corpus items. These methods and techniques include locating
specified text strings, classifying by response (e.g., for sample
texts consisting of standard ("canned") answers appended
to customer email inquiries), and clustering (identifying semantic
similarities in unclassified corpus items to group textually similar
items together).
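Of the semi-automated techniques mentioned above, locating
specified text strings is the simplest to illustrate. The sketch
below assumes a user-supplied mapping from category names to
text strings; the function name assign_by_text_strings and the
sample strings are hypothetical.

```python
from typing import Dict, List, Optional


def assign_by_text_strings(
    text: str, patterns: Dict[str, List[str]]
) -> Optional[str]:
    """Return the first category having one of its strings in the text,
    or None if no string is found (the item is left for manual review)."""
    lowered = text.lower()
    for category, strings in patterns.items():
        if any(s.lower() in lowered for s in strings):
            return category
    return None


patterns = {
    "complaint": ["refund", "broken"],
    "store_location": ["opening hours", "nearest store"],
}
```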
[0034] FIG. 4 is an example of a UI 400 presented by corpus editing
processes 202, allowing a user to view and edit individual corpus
items. Each row 402 in the UI represents an individual corpus item,
and each column 404 represents a corpus field, or name-value pair.
In the example depicted in FIG. 4, the corpus items are articles
posted to Usenet groups, and the corpus fields include a "From"
field identifying the source email address, a "Message"
field containing the text of the article, and a "Subject"
field. The corpus fields further include a "Categories"
field identifying the category which has been assigned by the user
to each corpus item (in the example depicted, the Usenet group to
which the article has been posted), using a manual or semi-automated
technique. The user may select one or more corpus items from the
list displayed in the UI to view details of the items or to edit
the values of the corresponding corpus fields.
[0035] Referring again to the workflow diagram of FIG. 3, after
the corpus file has been created and edited, KB 108 is built and
tested from analysis of the corpus items. Building of KB 108 is
managed by KB building processes 204. KB building processes 204 initially
split the corpus into a first subset to be used for training KB
108, and a second subset to be used for testing KB 108. The process
of splitting the corpus into training and testing subsets is symbolically
depicted in FIG. 5. Of course, many schemes may be utilized for
dividing the corpus into subsets. Preferably, the manner in which
the corpus is split is selectable by the user. FIG. 6 is an exemplary
UI screen 600 listing various user-selectable options 602 for splitting
the corpus into subsets for training and testing (e.g., using random
cuts, create (train) using even-numbered items/analyze (test) using
odd-numbered items (a method known in the art as "jack-knife")
and so on). It should be recognized that the training and testing
subsets may be overlapping (i.e., include common corpus items),
and that one or both of the subsets may include the entire corpus
(e.g., as used for the "Create using all selected, analyze
using all selected" option).
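Three of the splitting options described above (random cuts,
even/odd items, and use of the entire corpus for both subsets)
may be sketched as follows. The function name split_corpus, the
method labels, and the train_fraction and seed parameters are
assumptions made for this illustration.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_corpus(
    items: Sequence[T],
    method: str = "random",
    train_fraction: float = 0.5,
    seed: int = 0,
) -> Tuple[List[T], List[T]]:
    """Return (training subset, testing subset)."""
    if method == "random":
        shuffled = list(items)
        random.Random(seed).shuffle(shuffled)
        cut = int(len(shuffled) * train_fraction)
        return shuffled[:cut], shuffled[cut:]
    if method == "even_odd":
        # Items are numbered from 1: even-numbered items are used for
        # training and odd-numbered items for testing.
        train = [it for i, it in enumerate(items, start=1) if i % 2 == 0]
        test = [it for i, it in enumerate(items, start=1) if i % 2 == 1]
        return train, test
    if method == "all":
        # Overlapping subsets: both include the entire corpus.
        return list(items), list(items)
    raise ValueError(f"unknown split method: {method}")
```

Testing on items that were also used for training, as in the "all"
option, generally gives an optimistic picture of performance, which
is one reason a user might prefer disjoint subsets.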
[0036] After the corpus has been split into training and testing
subsets, KB building processes 204 initiate the creation of KB 108.
Generally described, the process of building KB 108 involves deriving
relevant semantic and statistical information from the corpus items
in the training subset and associating this information with corresponding
nodes of the KB 108. As noted above, some or all of the nodes represent
categories of the predefined set of categories; for the automated
e-mail response application example described above, KB 108 may
consist of three nodes arranged in a flat structure: a first node
corresponding to the product information request category, a second
node corresponding to the complaint category, and a third node corresponding
to the store location category. According to the implementation
depicted in FIG. 1, KB building processes 204 may invoke the services
of modeling engine 106 to perform natural language and semantic
analysis of the corpus texts and thereby derive the semantic and
statistical information to be associated with the nodes of KB 108.
Those skilled in the art will recognize that various well-known
techniques and algorithms may be employed for processing of the
corpus texts and extraction of the relevant semantic and statistical
information, and so such techniques and algorithms need not be discussed
herein. It should also be recognized that KB 108 will not necessarily
be empty (i.e., lacking structure and relevant semantic/statistical
information) prior to initiation of the KB building process; in
some cases and implementations, KB building processes 204 will operate
on an existing KB which has previously been provided with a structure
and relevant information. In such cases and implementations, KB
building processes 204 will cause the structure and information
to be modified in accordance with the results of analysis of the
texts in the training subset.
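Because the disclosure leaves the choice of analysis technique
open, the sketch below substitutes a deliberately simple one:
per-category word counts. It is intended only to show how
information derived from the training subset may be associated
with category nodes, and how an existing KB may be modified rather
than rebuilt. The function name build_kb and the dictionary layout
are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional


def build_kb(
    training_items: List[Dict[str, str]],
    kb: Optional[Dict[str, Counter]] = None,
) -> Dict[str, Counter]:
    """Accumulate word counts per category from the training subset.

    If an existing kb is passed in, its counts are updated in place.
    """
    kb = {} if kb is None else kb
    for item in training_items:
        counts = kb.setdefault(item["Category"], Counter())
        # Stand-in for natural language and semantic analysis.
        counts.update(item["Message"].lower().split())
    return kb
```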
[0037] After KB 108 has been built, its performance is tested by
classifying the corpus items in the testing subset of the corpus
using the information contained in KB 108 to determine if the corpus
items have been classified into the most relevant category(ies).
Testing of KB 108 is managed by KB testing processes 206. In the
FIG. 1 embodiment, KB testing processes 206 may call upon the services
of modeling engine 106 to extract concepts from the corpus items
(using, for example, an NLP engine) and perform statistical pattern
matching using the relevant semantic and statistical information
for each category contained within KB 108. This process will return
a set of match scores for each corpus item in the testing subset.
Each match score in the match score set represents a confidence
level that the corpus item belongs to the associated category. In
a typical implementation, match scores determined by modeling engine
106 fall within a pre-established range (e.g., 0-100), with higher
scores denoting a high level of confidence that the corpus item
belongs to the associated category, and lower scores denoting a
low level of confidence that the corpus item belongs to the associated
category. For example, using the three-category KB example discussed
above (consisting of a product information category, a complaint
category, and a store location category), a corpus item in the testing
subset could have a match score of 95 for the product information
category, a match score of 30 for the complaint category, and a
match score of 5 for the store location category. If the corpus
item in question is properly classified in the product information
category, then KB 108 would be regarded as performing well; if,
in fact, the corpus item is properly classified in one of the other
two categories, then KB 108 would be regarded as performing poorly.
Test results, comprising match score sets obtained for each corpus
item in the testing subset, are stored by KB testing processes 206
and used for generating reports assessing various aspects of KB
performance, as described below.
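The testing step can be sketched as follows, using the three-category
example and the 0-100 score range given above. The data layout (a list
of pairs of assigned category and match score set) and the function
names are assumptions for illustration; the match scores themselves
would come from the modeling engine.

```python
from typing import Dict, List, Tuple

ScoreSet = Dict[str, float]  # category name -> match score (0-100)

def top_category(scores: ScoreSet) -> str:
    """Return the category with the highest match score."""
    return max(scores, key=scores.get)

def evaluate(results: List[Tuple[str, ScoreSet]]) -> float:
    """Fraction of testing items whose top-scoring category is the assigned one."""
    if not results:
        return 0.0
    hits = sum(1 for assigned, scores in results
               if top_category(scores) == assigned)
    return hits / len(results)

# Hypothetical test results for two corpus items in the testing subset.
results = [
    ("product_info", {"product_info": 95, "complaint": 30, "store_location": 5}),
    ("complaint", {"product_info": 60, "complaint": 40, "store_location": 10}),
]
accuracy = evaluate(results)
```

Here the first item is classified into its assigned category and the
second is not, so half of the testing items are classified correctly.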
[0038] Referring again to the workflow diagram shown in FIG. 3,
the user may select and view reports generated by KB tool 100 to
gauge the performance of KB 108 and make appropriate adjustments
to improve performance. Report generation is managed by reporting
processes 208. As used herein, the term "report" denotes
any collection of graphical and/or textual information that visually
represents the performance of KB 108. Reports generated by reporting
processes 208 include both summary reports, which depict the performance
of KB 108 across all categories, and category reports, which depict
the performance of KB 108 for a specified category. In a typical
implementation, the reporting processes 208 will cause a UI or series
of UI screens to be displayed in which the user can select the type
and content of report he wishes to view. Examples of reports generated
by KB tool 100 are described below. It is noted, however, that the
reports described and depicted herein are intended as illustrative
examples, and that the scope of the present invention should not
be construed as being limited to these examples. It is further noted
that the reports may be presented in a window of a graphical display
and/or in a printed document.
[0039] FIG. 7 is an exemplary category report in the form of a
scoring graph report 700. Scoring graph report 700 depicts match
scores for each corpus item in a selected category. Each point 702
on the graph represents an individual corpus item. Light points
704 represent corpus items that belong to the selected category,
and dark points 706 represent corpus items that do not belong to
the selected category. If KB 108 is performing well in the selected
category, most of the light points 704 will appear in the upper
portion of the graph (at or above a match score of 0.80), and most
of the dark points 706 will appear in the lower portion of the graph.
In a preferred implementation of the scoring graph report, a user
can select an individual point 702 on the graph (e.g., by clicking
on the point) to view details of the corresponding corpus item.
This feature allows the user to quickly and easily inspect "stray
points" which are displaced from their expected, optimal area
of the graph, i.e., light points 704 appearing in the lower portion
of the graph and dark points 706 appearing in the upper portion
of the graph, and determine if any discernible error or condition
exists which caused the misclassification or failure to classify
into the expected category. For example, the user may click on one
of the stray dark points and discern that the associated corpus
item was assigned the wrong category during the corpus creation
process. The user may then edit the corpus item to assign the correct
category and re-train KB 108 using the corrected information.
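A minimal sketch of how stray points might be selected for inspection
is given below. The Point record, its field names, and the cutoff
parameter are hypothetical; the cutoff of 0.80 in the example is taken
from the upper-portion boundary mentioned above.

```python
from typing import List, NamedTuple

class Point(NamedTuple):
    item_id: str
    score: float   # match score for the selected category
    belongs: bool  # True for a light point, False for a dark point

def stray_points(points: List[Point], cutoff: float) -> List[Point]:
    """Light points below the cutoff and dark points at or above it."""
    return [p for p in points if p.belongs != (p.score >= cutoff)]

points = [
    Point("a", 0.95, True),   # light point, upper portion: expected
    Point("b", 0.40, True),   # light point, lower portion: stray
    Point("c", 0.90, False),  # dark point, upper portion: stray
    Point("d", 0.10, False),  # dark point, lower portion: expected
]
strays = stray_points(points, cutoff=0.80)
```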
[0040] FIG. 8 is a summary report 800 consisting of a graph of
total precision versus recall for all categories in KB 108. As used
herein, the term "precision" denotes the fraction of corpus
items identified as relevant to a category that are actually relevant
to the category, and the term "recall" denotes the fraction
of corpus items actually relevant to a category that are identified
as being relevant. The graph of total precision versus recall represents
a weighted average of the precision for each recall value, wherein
categories having a relatively greater number of texts are accorded
greater weight than categories having a relatively smaller number
of texts. The total precision versus recall graph provides a visual
indication of the overall performance of KB 108. Generally, a curve
located primarily in the upper-right portion of the graph indicates
that KB 108 is performing well, whereas a curve located primarily
in the lower-left portion of the graph indicates a poorly performing
KB 108. If the results indicate that the performance of KB 108 is
poor, then the user may select and view category reports depicting
precision versus recall results for each category in order to identify
whether any specific category is particularly problematic.
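The definitions of precision and recall, and the weighting of
categories by their number of texts, can be sketched as follows. The
sketch assumes that an item is "identified as relevant" when its match
score meets or exceeds a threshold, and it averages a single precision
value per category rather than a full curve; both are simplifying
assumptions, as are the function names.

```python
from typing import List, Tuple

def precision_recall(scored: List[Tuple[float, bool]],
                     threshold: float) -> Tuple[float, float]:
    """scored holds (match score, belongs-to-category) pairs for one category."""
    tp = sum(1 for s, rel in scored if s >= threshold and rel)
    fp = sum(1 for s, rel in scored if s >= threshold and not rel)
    fn = sum(1 for s, rel in scored if s < threshold and rel)
    # Assumed convention: a ratio with an empty denominator is reported as 1.0.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall

def weighted_precision(per_category: List[Tuple[float, int]]) -> float:
    """Average per-category precision, weighted by each category's text count."""
    total = sum(n for _, n in per_category)
    return sum(p * n for p, n in per_category) / total if total else 0.0

scored = [(95, True), (85, False), (70, True), (20, False)]
p, r = precision_recall(scored, threshold=80)
# Two hypothetical categories: precision 0.5 over 30 texts, 1.0 over 10 texts.
overall = weighted_precision([(0.5, 30), (1.0, 10)])
```

In the example, the larger category pulls the weighted figure toward
its own precision, which is the effect described for the summary graph.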
[0041] FIG. 9 shows an exemplary cumulative success over time report
900. This report consists of a graph depicting the cumulative success
of KB 108 during the lifetime of a chronological testing corpus
(i.e., a corpus whose items are in the order they were received
by the system). Each line 902 on the graph shows how often the correct
category was among the top N category choices (those categories having
the highest match scores), for N ranging from one to five. More specifically,
the bottommost line represents, for each point in time, how often
the correct category was the highest scoring category, the next
(vertically adjacent) line shows how often the correct category
was one of the two highest scoring categories, and so on. Cumulative
success over time report 900 is useful to assess trends in KB 108
performance, and identify problems occurring during particular time
frames (as evidenced by dips in the lines indicative of decreased
KB 108 performance). Generation of the cumulative success over time
report requires inserting a corpus field for each corpus item that
contains the date and time the corpus item was received.
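The lines of the cumulative success report can be computed along the
following lines. The sketch assumes the testing items have already been
sorted by their received date and time, and that each item carries its
correct category and match score set; the function name and data layout
are hypothetical.

```python
from typing import Dict, List, Tuple

def cumulative_top_k(items: List[Tuple[str, Dict[str, float]]],
                     max_k: int = 5) -> List[List[float]]:
    """Return one series per k (1..max_k) for chronologically ordered items.

    Entry t of series k-1 is the fraction of the first t+1 items whose
    correct category was among the k highest-scoring categories.
    """
    hits = [0] * max_k
    series: List[List[float]] = [[] for _ in range(max_k)]
    for n, (correct, scores) in enumerate(items, start=1):
        ranked = sorted(scores, key=scores.get, reverse=True)
        for k in range(1, max_k + 1):
            if correct in ranked[:k]:
                hits[k - 1] += 1
            series[k - 1].append(hits[k - 1] / n)
    return series

items = [
    ("a", {"a": 90, "b": 10}),
    ("b", {"a": 80, "b": 60}),
    ("a", {"a": 70, "b": 20}),
]
series = cumulative_top_k(items, max_k=2)
```

The first series corresponds to the bottommost line of the graph (the
correct category was the highest scoring one); each later series can
only lie at or above the one before it.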
[0042] FIG. 10 shows an exemplary threshold calculator report 1000.
Thresholds are values used by application 104 to determine whether
to take a specified action with respect to a communication. For
example, where application 104 is in the form of an automated e-mail
response application, a threshold setting may be used by application
104 to determine whether to auto-respond to an incoming e-mail,
i.e., application 104 will auto-respond to a customer email only
if the match score for a category exceeds a value (e.g., 90) indicative
of a high confidence that the email should be classified in the
category. Prior art CRM systems have generally lacked tools enabling
the user to intelligently set thresholds in order to achieve a desired
performance objective. Threshold calculator report 1000 provides
a means for depicting the relationship between the threshold value
and various performance parameters, including cost ratio (defined
below), precision, and recall.
[0043] Threshold calculator report 1000 includes a graph 1002 showing
match values for each corpus item for a specified category. Again,
light points 1004 represent corpus items which belong to the specified
category, and dark points 1006 represent corpus items which do not
belong to the specified category. The current value of the threshold
is represented as line 1008. Threshold calculator report 1000 also
lists values of cost ratio, precision, recall, false positives,
and false negatives corresponding to the current threshold value.
The user may set values for any one of the following parameters:
threshold, cost ratio, precision, or recall. In alternative implementations,
user-settable values may include other suitable parameters which
would be apparent to those skilled in the art. One such user-settable
value is an automation ratio, which denotes the percentage of corpus
items which meet or exceed the threshold. Responsive to entry of
any of these values, reporting processes 208 calculate and display
corresponding values of the other parameters. For example, if the
user enters a threshold value, reporting processes 208 calculate
and display the resultant values of precision and recall. In another
example, the user enters a desired value of precision, and reporting
processes 208 calculate and display the corresponding threshold
value. The user may also specify a cost ratio, which is the amount
saved by automatically responding to a communication correctly divided
by the amount lost by automatically responding to a communication
incorrectly (for example, a saving of $10 for each correct automated
response and a loss of $100 for each incorrect automated response
will yield a cost ratio of 0.1), and reporting processes 208 will
responsively calculate and display the corresponding threshold value.
The methods of calculating the values of the foregoing parameters
based on other specified parameters should be easily discernible
to one of ordinary skill in the art and need not be described herein.
The threshold calculator report 1000 may also include a button 1010
allowing the user to write the current (most recently specified
or calculated) threshold value to the corresponding node of KB 108.
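One possible way to relate the threshold to the other parameters is
sketched below. The description does not specify the calculation
methods, so everything here is an assumption: items scoring at or above
the threshold are treated as accepted, candidate thresholds are limited
to observed match scores, and the cost-ratio search maximizes
cost ratio times true positives minus false positives (net benefit
expressed in units of one incorrect response).

```python
from typing import Dict, List, Optional, Tuple

Scored = List[Tuple[float, bool]]  # (match score, item belongs to the category)

def stats_at(scored: Scored, threshold: float) -> Dict[str, float]:
    """Performance figures when items scoring at or above threshold are accepted."""
    tp = sum(1 for s, rel in scored if s >= threshold and rel)
    fp = sum(1 for s, rel in scored if s >= threshold and not rel)
    fn = sum(1 for s, rel in scored if s < threshold and rel)
    return {
        "true_positives": tp,
        "false_positives": fp,
        "false_negatives": fn,
        # Assumed convention: undefined ratios are reported as 1.0.
        "precision": tp / (tp + fp) if tp + fp else 1.0,
        "recall": tp / (tp + fn) if tp + fn else 1.0,
        "automation_ratio": (tp + fp) / len(scored) if scored else 0.0,
    }

def threshold_for_precision(scored: Scored, target: float) -> Optional[float]:
    """Lowest observed score that meets the target precision when used as threshold."""
    for t in sorted({s for s, _ in scored}):
        if stats_at(scored, t)["precision"] >= target:
            return t
    return None

def threshold_for_cost_ratio(scored: Scored, cost_ratio: float) -> Optional[float]:
    """Observed score maximizing cost_ratio * TP - FP (an assumed objective)."""
    best, best_value = None, float("-inf")
    for t in sorted({s for s, _ in scored}):
        st = stats_at(scored, t)
        value = cost_ratio * st["true_positives"] - st["false_positives"]
        if value > best_value:
            best, best_value = t, value
    return best

scored = [(95, True), (90, True), (85, False),
          (70, True), (40, False), (20, False)]
at_80 = stats_at(scored, 80)
t_precision = threshold_for_precision(scored, 0.9)
t_cost = threshold_for_cost_ratio(scored, 0.1)  # $10 saved / $100 lost
```

With a cost ratio of 0.1, a single incorrect automated response
outweighs ten correct ones, so the search settles on a threshold that
admits no false positives in this small example.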
[0044] Finally, FIG. 11 shows a "stealing/stolen" report
1100 generated for a specified category. In some cases, poor KB
performance occurs when categories "steal" corpus items
from each other (i.e., when a corpus item receives a higher match
score for an inappropriate category, relative to the match score
calculated for the category to which the item belongs). For a selected
category, stealing/stolen report 1100 shows the percentage and number
of corpus items initially assigned to the selected category which
yielded higher match scores in other categories (the "stolen
from" column). In addition, stealing/stolen report 1100 displays,
for each of the other categories, the percentage of corpus items
initially assigned to the category which yielded a higher match
score in the selected category (the "stolen by" column).
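The two columns of the stealing/stolen report can be computed roughly
as follows. The item format (assigned category paired with a match
score set covering every category) and the function names are
assumptions for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple

Item = Tuple[str, Dict[str, float]]  # (assigned category, match scores)

def stolen_from(items: List[Item], selected: str) -> Tuple[int, float]:
    """Number and percentage of items assigned to `selected` that scored higher elsewhere."""
    own = [scores for cat, scores in items if cat == selected]
    stolen = sum(
        1 for scores in own
        if any(v > scores[selected] for c, v in scores.items() if c != selected)
    )
    return stolen, (100.0 * stolen / len(own) if own else 0.0)

def stolen_by(items: List[Item], selected: str) -> Dict[str, float]:
    """For each other category, percentage of its items scoring higher in `selected`."""
    totals: Counter = Counter()
    taken: Counter = Counter()
    for cat, scores in items:
        if cat == selected:
            continue
        totals[cat] += 1
        if scores[selected] > scores[cat]:
            taken[cat] += 1
    return {cat: 100.0 * taken[cat] / totals[cat] for cat in totals}

items = [
    ("complaint", {"product_info": 80, "complaint": 60, "store_location": 5}),
    ("complaint", {"product_info": 20, "complaint": 90, "store_location": 5}),
    ("product_info", {"product_info": 40, "complaint": 70, "store_location": 10}),
    ("store_location", {"product_info": 10, "complaint": 15, "store_location": 85}),
]
from_complaint = stolen_from(items, "complaint")
by_complaint = stolen_by(items, "complaint")
```

In this example one of the two complaint items is stolen by another
category, and the complaint category in turn steals the only product
information item.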
[0045] The occurrence of a relatively high number of incidents
of stealing between pairs of categories may indicate that modeling
engine 106 does not perceive a clear difference between the intents
of the two categories, i.e., that the two nodes of KB 108 representing
the categories contain overlapping content. In such situations,
KB 108 performance may be improved by carefully redefining the categories
to more clearly distinguish intents (or, if appropriate, joining
them into a single category), reassigning categories to the corpus
items to reflect the redefined categories, and retraining KB 108
using KB building processes 204.
[0046] Referring again to the FIG. 3 workflow diagram, the user
may utilize information contained in one or more of the reports
generated by reporting processes 208 to improve KB performance.
Actions which may be taken by the user to remedy problems identified
in the reports include redefining, deleting or adding categories;
correcting or otherwise modifying individual corpus items; and
modifying KB 108 structure (e.g., by changing the organization of
nodes, or by adding or changing rule-based nodes). Once these actions
have been taken, KB 108 may be retrained by invoking KB building
processes 204, and the retrained KB 108 may be tested against the
testing subset of corpus items using KB testing processes 206. The
user may then evaluate the performance of the retrained KB 108 by
generating the appropriate reports using reporting processes 208.
[0047] It will be recognized by those skilled in the art that,
while the invention has been described above in terms of preferred
embodiments, it is not limited thereto. Various features and aspects
of the above invention may be used individually or jointly. Further,
although the invention has been described in the context of its
implementation in a particular environment and for particular applications,
those skilled in the art will recognize that its usefulness is not
limited thereto and that the present invention can be beneficially
utilized in any number of environments and implementations.
|