Commit 3ea0944

Merge branch 'dev'
2 parents 25f5e95 + 1528557 commit 3ea0944

8 files changed

+525
-94
lines changed

.gitignore

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -1,5 +1,6 @@
11
## SQLite3 DB ignore
22
/**/*.db
3+
/**/*.db-journal
34

45
## .xlsx ignore
56
/**/*.xlsx
@@ -98,7 +99,6 @@ ENV/
9899
### VirtualEnv template
99100
# Virtualenv
100101
# http://iamzed.com/2009/05/07/a-primer-on-virtualenv/
101-
.Python
102102
[Bb]in
103103
[Ii]nclude
104104
[Ll]ib

README.md

Lines changed: 116 additions & 3 deletions
Original file line number | Diff line number | Diff line change
@@ -1,5 +1,118 @@
1-
# PatentsView API mining
1+
# PatentsView API wrapper
22

3-
This project calls the PatentsView API.
3+
This project is a wrapper for the PatentsView API.
44

5-
* [PatentsView Glossary](http://www.patentsview.org/api/glossary.html) provides a description of the variables.
5+
* [PatentsView Glossary](http://www.patentsview.org/api/glossary.html) provides a description of the variables.
6+
7+
## Important Notes:
8+
9+
* Use <https://dev.patentsview.org> and not <https://www.patentsview.org/>; the former is more lenient about the input it accepts.
10+
* Naming is finicky (more so on <https://www.patentsview.org/>): even spacing and other characters affect the search results (see below).
11+
12+
## Remarks about the data
13+
14+
* Patent Numbers are alphanumeric (they can include letters)
15+
* PatentsView only includes information about a patent as of its issue date. It does not include changes made to patent information after the patent has been issued.
16+
* This means that if the company changes its name, the change won't be reflected in the patent. Example: if "International Business Machines" renames itself to "IBM", patents issued to "International Business Machines" will still be listed under "International Business Machines" (and not "IBM").
17+
* As an example: `NETFLIX, INC.` has an `assignee_key_id` of `17594` and an `assignee_id` of `org_2lAuxOpAtNMvtTxhuLmX`; `NETFLIX.COM, INC.`, on the other hand, has an `assignee_key_id` of `363028` and an `assignee_id` of `org_UNHkzir8tY7NlQrOJKT4`. (This of course assumes `NETFLIX, INC.` and `NETFLIX.COM, INC.` are the same company, which is highly probable.)
18+
* The same applies for acquisitions. Example: Company A has patent *X*; once company B acquires company A, patent *X* would still show that it is assigned to company *A*.
19+
* Probably the same thing holds if a company acquires certain patents of another company.
20+
* The patents can be assigned to organizations (as opposed to individuals). This is indicated by the 'assignee organization' field returned by the API.
21+
* The assignee organizations (i.e. companies) are distinguished by name. Each organization name is a 'separate' company.
22+
* This means that a patent can be assigned to "IBM", "IBM Inc.", or "International Business Machines", and each of these counts as a different organization.
23+
* Different organization names have different `assignee_id`s and `assignee_key_id`s (see `NETFLIX` example above).
24+
* **Different endpoints behave differently**: particularly <https://www.patentsview.org/> and <https://dev.patentsview.org>
25+
* **Naming is finicky on the first one** (<https://www.patentsview.org/>): If you search for `Abbott Laboratories` or for `ABBOTT LABORATORIES`,
26+
you will get the same results. If you search for `ABBOTT Laboratories`, `Abbott LABORATORIES`,
27+
or `abbott laboratories`, you will get nothing.
28+
* The second one (<https://dev.patentsview.org>) seems to work better, but you still have to replace the carriage return and line break characters in the names.
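
A minimal Python sketch of the clean-up mentioned in the last bullet, assuming company names may carry stray carriage returns or line breaks (for example when copied from a spreadsheet). The helper name `clean_company_name` is hypothetical and not part of this project, and collapsing repeated whitespace is an extra assumption. Case is left untouched on purpose, since the notes above say differently-cased names can return different results.

```python
import re


def clean_company_name(raw: str) -> str:
    """Replace CR/LF characters with spaces and tidy the whitespace.

    Assumption: collapsing repeated spaces is safe for the names being searched.
    The case of the name is not changed.
    """
    # Replace carriage returns and line breaks with a space.
    no_breaks = raw.replace("\r", " ").replace("\n", " ")
    # Collapse runs of whitespace into one space and trim both ends.
    return re.sub(r"\s+", " ", no_breaks).strip()
```

For example, `clean_company_name("Abbott\r\nLaboratories ")` yields `Abbott Laboratories`.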
29+
30+
## Adding companies
31+
32+
Create a Microsoft Excel spreadsheet (`.xlsx` file) with the following structure:
33+
34+
<table>
35+
<thead>
36+
<tr>
37+
<th>Firm ID</th>
38+
<th>Firm Name</th>
39+
<th>Alternative names </th>
40+
<th></th>
41+
<th></th>
42+
<th></th>
43+
<th></th>
44+
</tr>
45+
<tr>
46+
<th>ID</th>
47+
<th>Name 1</th>
48+
<th>Name 2</th>
49+
<th>Name 3</th>
50+
<th>Name 4</th>
51+
<th>...</th>
52+
<th>Name X</th>
53+
</tr>
54+
</thead>
55+
<tbody>
56+
<tr>
57+
<td>ID2</td>
58+
<td>Company 2 Primary Name / Name 1</td>
59+
<td>Name 2</td>
60+
<td>Name 3</td>
61+
<td>Name 4</td>
62+
<td>...</td>
63+
<td>Name X</td>
64+
</tr>
65+
<tr>
66+
<td>ID1</td>
67+
<td>Company 1 Primary Name / Name 1</td>
68+
<td>Name 2</td>
69+
<td>Name 3</td>
70+
<td>Name 4</td>
71+
<td>...</td>
72+
<td>Name X</td>
73+
</tr>
74+
</tbody>
75+
</table>
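
A sketch of how rows in this layout could be turned into firm records, assuming the two header rows have already been skipped and each row arrives as a plain sequence of cell values (as a spreadsheet reader would typically provide). The names `Firm` and `parse_firm_row` are hypothetical; the project's actual loader may work differently.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass
class Firm:
    firm_id: str
    primary_name: str
    alternative_names: List[str] = field(default_factory=list)


def parse_firm_row(row: Sequence[Optional[object]]) -> Optional[Firm]:
    """Build a Firm from one row: ID, primary name, then alternative names."""
    # Convert cells to stripped strings; empty or missing cells become "".
    cells = ["" if cell is None else str(cell).strip() for cell in row]
    if len(cells) < 2 or not cells[0] or not cells[1]:
        return None  # Skip rows without both an ID and a primary name.
    # Every non-empty cell after the second column is an alternative name.
    alternatives = [name for name in cells[2:] if name]
    return Firm(cells[0], cells[1], alternatives)
```

Empty cells between names are ignored, so firms with different numbers of alternative names can share one sheet.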
76+
77+
78+
## Database Structure
79+
80+
Here is an Entity Relationship Diagram (ERD) of the database structure.
81+
82+
![Entity Relationship Diagram (ERD) of the database structure](images/patents_view_table.png)
83+
84+
## Using SQL to Select Patents
85+
86+
The `sql` folder has some SQL scripts that might come in handy.
87+
As an example, here is a SQL query that selects patents between two dates:
88+
89+
```
90+
SELECT
91+
p.patent_number as "Patent Number",
92+
p.patent_title as "Patent Title",
93+
-- p.company_id as "Company ID",
94+
c.name as "Company Name",
95+
-- p.company_alternate_name_id as "Alternate Name ID",
96+
an.name as "Company Name Listed on Patent",
97+
p.year,
98+
p.grant_date as "Grant Date",
99+
p.uspc_class as "USPC Classes"
100+
FROM
101+
patents as p
102+
JOIN
103+
companies as c
104+
ON
105+
p.company_id = c.id
106+
LEFT JOIN
107+
alternate_company_names as an
108+
ON
109+
p.company_alternate_name_id = an.id
110+
WHERE
111+
p.grant_date > DATE('2006-01-03') AND
112+
p.grant_date < DATE('2010-06-13');
113+
```
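
The query above can also be run from Python with the dates passed as parameters instead of being edited into the SQL text. This is only a sketch: the simplified schema in `demo_connection`, the function names, and the assumption that `grant_date` is stored as ISO `YYYY-MM-DD` text are all illustrative guesses based on the column names in the query, not the project's actual code.

```python
import sqlite3

# Shortened version of the date-range query, with named parameters.
QUERY = """
SELECT
    p.patent_number AS "Patent Number",
    c.name          AS "Company Name",
    p.grant_date    AS "Grant Date"
FROM patents AS p
JOIN companies AS c ON p.company_id = c.id
WHERE p.grant_date > DATE(:start_date) AND p.grant_date < DATE(:end_date)
ORDER BY p.grant_date
"""


def patents_between(conn: sqlite3.Connection, start: str, end: str) -> list:
    """Return patents granted strictly between two ISO dates (YYYY-MM-DD)."""
    params = {"start_date": start, "end_date": end}
    return conn.execute(QUERY, params).fetchall()


def demo_connection() -> sqlite3.Connection:
    """Create a throwaway in-memory database with an assumed, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE patents (
            patent_number TEXT PRIMARY KEY,
            company_id INTEGER REFERENCES companies(id),
            grant_date TEXT
        );
        INSERT INTO companies VALUES (1, 'Example Co');
        INSERT INTO patents VALUES ('A1', 1, '2005-12-31');
        INSERT INTO patents VALUES ('B2', 1, '2008-07-01');
        INSERT INTO patents VALUES ('C3', 1, '2010-06-13');
    """)
    return conn
```

Because the comparison uses `>` and `<`, a patent granted exactly on either boundary date is excluded.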
114+
115+
## Software Applications
116+
117+
* [DbVisualizer](https://www.dbvis.com/) was used to generate the graphs
118+
* [DB Browser for SQLite](https://sqlitebrowser.org/) was used to look at the data and execute SQL queries

images/patents_view_table.png

22.4 KB
Lines changed: 9 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,9 @@
1+
/*
2+
* Use this script to find which cited patents need to be added to the Patents table.
3+
*/
4+
SELECT
5+
cited_patent_number
6+
FROM
7+
cited_patents
8+
WHERE
9+
cited_patent_number NOT IN (SELECT DISTINCT patent_number FROM patents);
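
A small Python sketch of how this script could be exercised against an in-memory SQLite database. The two-table schema below is an assumption inferred from the column names, the function names are hypothetical, and the outer `DISTINCT` and `ORDER BY` are additions for readable output that the script itself does not include.

```python
import sqlite3

# Same idea as the script above; DISTINCT and ORDER BY are added here.
MISSING_CITED_SQL = """
SELECT DISTINCT cited_patent_number
FROM cited_patents
WHERE cited_patent_number NOT IN (SELECT patent_number FROM patents)
ORDER BY cited_patent_number
"""


def missing_cited_patents(conn: sqlite3.Connection) -> list:
    """Return cited patent numbers that have no row in the patents table."""
    return [row[0] for row in conn.execute(MISSING_CITED_SQL)]


def demo_connection() -> sqlite3.Connection:
    """Create a throwaway in-memory database with an assumed, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE patents (patent_number TEXT PRIMARY KEY);
        CREATE TABLE cited_patents (
            citing_patent_number TEXT,
            cited_patent_number TEXT
        );
        INSERT INTO patents VALUES ('100'), ('200');
        INSERT INTO cited_patents VALUES
            ('100', '200'), ('100', '300'), ('200', '300'), ('200', '400');
    """)
    return conn
```

One caveat with `NOT IN`: if the subquery ever returns a `NULL` patent number, the condition matches no rows at all, so it is worth keeping `patent_number` non-null.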

sql/select_cited_patents.sql

Lines changed: 28 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,28 @@
1+
SELECT
2+
DISTINCT
3+
p.patent_number as "Citing Patent Number"
4+
,co.name as "Citing Company"
5+
,cp.cited_patent_number as "Cited Patent Number"
6+
,pp.patent_title as "Cited Patent Title"
7+
,pp.year as "Year"
8+
,pp.grant_date as "Grant Date"
9+
,pp.uspc_class as "USPC Class"
10+
FROM
11+
patents as p
12+
LEFT JOIN
13+
companies as co
14+
ON
15+
co.id = p.company_id
16+
JOIN
17+
cited_patents as cp
18+
ON
19+
p.patent_number = cp.citing_patent_number
20+
LEFT JOIN
21+
patents as pp
22+
ON
23+
cp.cited_patent_number = pp.patent_number
24+
-- Uncomment the following 2 lines if you want to filter by patent_number (or something else of your choosing)
25+
--WHERE
26+
-- p.patent_number = '10001497'
27+
ORDER BY
28+
p.patent_number ASC
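
As a sketch of the optional filter in the script above, the patent number can be bound as a parameter from Python rather than by uncommenting and editing the `WHERE` clause. The reduced column list, the schema in `demo_connection`, and the function names are assumptions for illustration only.

```python
import sqlite3

# Reduced version of the citation query with a bound patent number.
CITATIONS_SQL = """
SELECT DISTINCT
    p.patent_number        AS "Citing Patent Number",
    cp.cited_patent_number AS "Cited Patent Number",
    pp.patent_title        AS "Cited Patent Title"
FROM patents AS p
JOIN cited_patents AS cp ON p.patent_number = cp.citing_patent_number
LEFT JOIN patents AS pp ON cp.cited_patent_number = pp.patent_number
WHERE p.patent_number = ?
ORDER BY cp.cited_patent_number ASC
"""


def citations_for(conn: sqlite3.Connection, patent_number: str) -> list:
    """Return (citing, cited, cited title) rows for one citing patent."""
    return conn.execute(CITATIONS_SQL, (patent_number,)).fetchall()


def demo_connection() -> sqlite3.Connection:
    """Create a throwaway in-memory database with an assumed, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE patents (patent_number TEXT PRIMARY KEY, patent_title TEXT);
        CREATE TABLE cited_patents (
            citing_patent_number TEXT,
            cited_patent_number TEXT
        );
        INSERT INTO patents VALUES ('10001497', 'Citing example');
        INSERT INTO patents VALUES ('7000001', 'Known cited example');
        INSERT INTO cited_patents VALUES ('10001497', '7000001');
        INSERT INTO cited_patents VALUES ('10001497', '7000002');
    """)
    return conn
```

Because the cited patent is attached with a `LEFT JOIN`, a cited patent that is not yet in the `patents` table still appears in the result, with its title returned as `NULL` (`None` in Python).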
Lines changed: 27 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,27 @@
1+
/*
2+
* Use this SQL query to select patents between two dates.
3+
* Uncomment the lines to retrieve the company ID and the alternate name ID.
4+
*/
5+
SELECT
6+
p.patent_number as "Patent Number",
7+
p.patent_title as "Patent Title",
8+
-- p.company_id as "Company ID",
9+
c.name as "Company Name",
10+
-- p.company_alternate_name_id as "Alternate Name ID",
11+
an.name as "Company Name Listed on Patent",
12+
p.year,
13+
p.grant_date as "Grant Date",
14+
p.uspc_class as "USPC Classes"
15+
FROM
16+
patents as p
17+
JOIN
18+
companies as c
19+
ON
20+
p.company_id = c.id
21+
LEFT JOIN
22+
alternate_company_names as an
23+
ON
24+
p.company_alternate_name_id = an.id
25+
WHERE
26+
p.grant_date > DATE('2006-01-03') AND
27+
p.grant_date < DATE('2010-06-13');
