A Cipher technology cluster is a group of patent families relating to the same technical area.

A. How does Cipher group patent families into technology clusters?
Cipher groups patent families with similar characteristics into technology clusters. It does this by building similarity matrices from patent metadata. The clustering involves no human intervention and no hard-coded categories.

Cipher uses the available metadata to create technology clusters that are as accurate as possible. Machine learning plays a role here: the algorithms identify the technology domain of the patents and weight each factor accordingly (e.g. CPC codes tend to be poor at clustering software patents). The metadata used to create clusters includes:

  • CPC codes
  • Citations (forward and backward)
  • Title
  • Abstract
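
The idea of a weighted similarity matrix can be illustrated with a minimal sketch. It is not Cipher's actual algorithm, which is not described here: the Jaccard measure, the fixed weights, the field names and the sample families are all assumptions made for illustration. In Cipher the weights are chosen by the algorithms according to the technology domain.

```python
from itertools import combinations

# Hypothetical weights for illustration only; per the text above, the real
# weights are set by the algorithms and differ by technology domain.
WEIGHTS = {"cpc": 0.2, "citations": 0.4, "text": 0.4}


def jaccard(a, b):
    """Share of items two sets have in common; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def family_similarity(f1, f2, weights=WEIGHTS):
    """Weighted sum of the per-field Jaccard similarities of two families."""
    return sum(w * jaccard(f1[field], f2[field]) for field, w in weights.items())


def similarity_matrix(families, weights=WEIGHTS):
    """Symmetric matrix of pairwise similarities between patent families."""
    n = len(families)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        matrix[i][i] = 1.0  # by convention, a family is identical to itself
    for i, j in combinations(range(n), 2):
        score = family_similarity(families[i], families[j], weights)
        matrix[i][j] = matrix[j][i] = score
    return matrix


# Made-up sample data: "text" stands for words taken from title and abstract.
families = [
    {"cpc": {"H04L9/08"}, "citations": {"US1", "US2"},
     "text": {"key", "exchange", "encryption"}},
    {"cpc": {"H04L9/08", "H04L9/32"}, "citations": {"US2", "US3"},
     "text": {"encryption", "key", "management"}},
    {"cpc": {"A61K31/00"}, "citations": {"US9"},
     "text": {"pharmaceutical", "compound"}},
]

matrix = similarity_matrix(families)
```

In this sample the two encryption families score well above zero against each other and exactly zero against the pharmaceutical family, so a clustering step run on the matrix would tend to keep the first two together.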

B. How are cluster names determined?
Cipher cluster names are machine-generated from the titles and abstracts. Each cluster is given the name that most closely describes all patent families in the cluster, using text summarisation and natural language processing (NLP) techniques.

The clustering and naming algorithms are separate, so the names cannot become a self-fulfilling prophecy in the clustering results. If clustering were based on the occurrence of a certain phrase, each cluster would be biased towards patents that use that phrase and would miss closely related technologies described in different words, skewing the results.

Miscellaneous cluster
Cipher presents a maximum of ten technology clusters. If a group of patent families produces more than ten clusters, Cipher shows the top nine and groups the remaining clusters into ‘miscellaneous’.
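
The rule above can be sketched in a few lines. This is an illustration only: the function name `present_clusters` is hypothetical, and ranking the "top" clusters by number of patent families is an assumption, since the ranking criterion is not stated.

```python
def present_clusters(cluster_sizes, max_shown=10):
    """Return at most `max_shown` (name, size) rows for display.

    Assumption: clusters are ranked by number of patent families.
    """
    ranked = sorted(cluster_sizes.items(), key=lambda item: item[1], reverse=True)
    if len(ranked) <= max_shown:
        return ranked
    # More clusters than can be shown: keep the top nine and merge the rest.
    top = ranked[:max_shown - 1]
    misc_total = sum(size for _, size in ranked[max_shown - 1:])
    return top + [("miscellaneous", misc_total)]


# Twelve made-up clusters with 12, 11, ..., 1 patent families.
sizes = {f"cluster {i + 1}": 12 - i for i in range(12)}
rows = present_clusters(sizes)
```

With twelve clusters, `rows` holds the nine largest clusters plus one ‘miscellaneous’ row that combines the three smallest.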

Unrelated cluster
An ‘unrelated’ cluster appears when not all of the portfolios in the report were clustered.

The ‘unrelated’ cluster shows the number of patent families that do not fit into any of the technology clusters in the report.

For example, if Cipher clustered company X's portfolio and you are benchmarking companies Y and Z, ‘unrelated’ shows the number of patent families in Y's and Z's portfolios that do not cross over with, or fit into, any of X's clusters.
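
The counting in this example can be sketched as follows. It assumes each benchmarked patent family has already been matched to one of X's clusters, or to none; how Cipher decides whether a family fits a cluster is not described here. The function name, the data layout and the per-portfolio breakdown are hypothetical.

```python
from collections import Counter


def count_unrelated(assignments):
    """Count, per portfolio, the families that matched no cluster.

    `assignments` is a list of (portfolio, cluster) pairs, where cluster is
    None when the family fits none of the existing clusters (assumed layout).
    """
    return Counter(portfolio for portfolio, cluster in assignments if cluster is None)


# Made-up matches of Y's and Z's patent families against X's clusters.
assignments = [
    ("Y", "encryption"),
    ("Y", None),
    ("Y", None),
    ("Z", "key management"),
    ("Z", None),
]

unrelated = count_unrelated(assignments)
total_unrelated = sum(unrelated.values())
```

Here two of Y's families and one of Z's fall outside X's clusters, so the ‘unrelated’ cluster would hold three patent families in total.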

