|
1 | | -# PatentsView API mining |
| 1 | +# PatentsView API wrapper |
2 | 2 |
|
3 | | -This project calls the PatentsView API. |
| 3 | +This project is a wrapper for the PatentsView API. |
4 | 4 |
|
5 | | -* [PatentsView Glossary](http://www.patentsview.org/api/glossary.html) provides a description of the variables. |
| 5 | +* [PatentsView Glossary](http://www.patentsview.org/api/glossary.html) provides a description of the variables. |
| 6 | + |
| 7 | +## Important Notes: |
| 8 | + |
| 9 | +* Use <https://dev.patentsview.org> and not <https://www.patentsview.org/>; the former is more lenient about the input it accepts. |
| 10 | +* Naming is finicky (more so on <https://www.patentsview.org/>): even spacing and other characters affect the search results (see below). |
| 11 | + |
| 12 | +## Remarks about the data |
| 13 | + |
| 14 | +* Patent Numbers are alphanumeric (they can include letters) |
| 15 | +* PatentsView only includes information about the patent as of its issue date. It does not include changes to patent information after the patent has been issued. |
| 16 | + * This means that if the company changes its name, the change won't be reflected in the patent. Example: if "International Business Machines" renames itself to "IBM", patents issued to "International Business Machines" will still be listed under "International Business Machines" (and not "IBM"). |
| 17 | + * As an example: `NETFLIX, INC.` has an `assignee_key_id` of `17594` and an `assignee_id` of `org_2lAuxOpAtNMvtTxhuLmX`; `NETFLIX.COM, INC.`, on the other hand, has an `assignee_key_id` of `363028` and an `assignee_id` of `org_UNHkzir8tY7NlQrOJKT4`. (This of course assumes that `NETFLIX, INC.` and `NETFLIX.COM, INC.` are the same company, which is highly probable.) |
| 18 | + * The same applies for acquisitions. Example: Company A has patent *X*; once company B acquires company A, patent *X* would still show that it is assigned to company *A*. |
| 19 | + * Probably the same thing holds if a company acquires certain patents of another company. |
| 20 | +* The patents can be assigned to organizations (as opposed to individuals). This is indicated by the 'assignee organization' field returned by the API. |
| 21 | +* The assignee organizations (i.e. companies) are distinguished by name. Each organization name is a 'separate' company. |
| 22 | + * This means that patents belonging to the same company can be assigned to "IBM", "IBM Inc.", or "International Business Machines". |
| 23 | + * Different organization names have different `assignee_id`s and `assignee_key_id`s (see `NETFLIX` example above). |
| 24 | +* **Different endpoints behave differently**: particularly <https://www.patentsview.org/> and <https://dev.patentsview.org> |
| 25 | + * **Naming is finicky on the first one**: If you search for `Abbott Laboratories` or for `ABBOTT LABORATORIES`, |
| 26 | + you will get the same results. If you search for `ABBOTT Laboratories`, `Abbott LABORATORIES`, |
| 27 | + or `abbott laboratories`, you will get nothing. |
| 28 | + * The second one seems to work better, but you still have to replace the carriage return and line break characters. |
| 29 | + |
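The naming remarks above can be sketched in Python. This is only an illustration under stated assumptions: the helper names `clean_company_name` and `case_variants` are hypothetical and not part of this project, and the only casings generated are the two that the remarks above report as working (the name as given and the all-uppercase form). Other whitespace is left untouched because spacing affects the search results.

```python
import re


def clean_company_name(name):
    """Replace carriage returns and line breaks with a single space.

    Hypothetical helper: inner spacing is otherwise preserved, since
    spacing is reported to affect search results.
    """
    return re.sub(r"[\r\n]+", " ", name).strip()


def case_variants(name):
    """Return the cleaned name plus its all-uppercase form (if different)."""
    cleaned = clean_company_name(name)
    variants = [cleaned]
    if cleaned.upper() != cleaned:
        variants.append(cleaned.upper())
    return variants


print(case_variants("Abbott\r\nLaboratories"))
# ['Abbott Laboratories', 'ABBOTT LABORATORIES']
```

Each variant could then be submitted as a separate search term, and the results merged afterwards.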
| 30 | +## Adding companies |
| 31 | + |
| 32 | +Create a Microsoft Excel spreadsheet (`.xlsx` file) with the following structure: |
| 33 | + |
| 34 | +<table> |
| 35 | + <thead> |
| 36 | + <tr> |
| 37 | + <th>Firm ID</th> |
| 38 | + <th>Firm Name</th> |
| 39 | + <th>Alternative names </th> |
| 40 | + <th></th> |
| 41 | + <th></th> |
| 42 | + <th></th> |
| 43 | + <th></th> |
| 44 | + </tr> |
| 45 | + <tr> |
| 46 | + <th>ID</th> |
| 47 | + <th>Name 1</th> |
| 48 | + <th>Name 2</th> |
| 49 | + <th>Name 3</th> |
| 50 | + <th>Name 4</th> |
| 51 | + <th>...</th> |
| 52 | + <th>Name X</th> |
| 53 | + </tr> |
| 54 | + </thead> |
| 55 | + <tbody> |
| 56 | + <tr> |
| 57 | + <td>ID2</td> |
| 58 | + <td>Company 2 Primary Name / Name 1</td> |
| 59 | + <td>Name 2</td> |
| 60 | + <td>Name 3</td> |
| 61 | + <td>Name 4</td> |
| 62 | + <td>...</td> |
| 63 | + <td>Name X</td> |
| 64 | + </tr> |
| 65 | + <tr> |
| 66 | + <td>ID1</td> |
| 67 | + <td>Company 1 Primary Name / Name 1</td> |
| 68 | + <td>Name 2</td> |
| 69 | + <td>Name 3</td> |
| 70 | + <td>Name 4</td> |
| 71 | + <td>...</td> |
| 72 | + <td>Name X</td> |
| 73 | + </tr> |
| 74 | + </tbody> |
| 75 | +</table> |
| 76 | + |
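To make the layout above concrete, here is a minimal Python sketch of how rows with that shape could be turned into a mapping from firm ID to names. It assumes the rows have already been read from the spreadsheet as tuples (the reading step is omitted), that the first two rows are headers as in the table, and that empty cells arrive as `None`. The function name `rows_to_companies` and the sample IDs are hypothetical.

```python
def rows_to_companies(rows, header_rows=2):
    """Map each firm ID to its primary name and alternative names."""
    companies = {}
    for row in list(rows)[header_rows:]:
        # Skip rows without a firm ID in the first column.
        if not row or row[0] is None:
            continue
        firm_id = str(row[0]).strip()
        # Remaining non-empty cells are names; the first one is the primary name.
        names = [str(cell).strip() for cell in row[1:]
                 if cell is not None and str(cell).strip()]
        if not firm_id or not names:
            continue
        companies[firm_id] = {"name": names[0], "alternative_names": names[1:]}
    return companies


# Illustrative rows only; the IDs are made up.
sample_rows = [
    ("Firm ID", "Firm Name", "Alternative names", None),
    ("ID", "Name 1", "Name 2", "Name 3"),
    ("ID1", "NETFLIX, INC.", "NETFLIX.COM, INC.", None),
    ("ID2", "International Business Machines", "IBM", "IBM Inc."),
]
companies = rows_to_companies(sample_rows)
print(companies["ID2"]["alternative_names"])
# ['IBM', 'IBM Inc.']
```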
| 77 | + |
| 78 | +## Database Structure |
| 79 | + |
| 80 | +Here is an Entity Relationship Diagram (ERD) of the database structure. |
| 81 | + |
| 82 | + |
| 83 | + |
| 84 | +## Using SQL to Select Patents |
| 85 | + |
| 86 | +The `sql` folder has some SQL scripts that might come in handy. |
| 87 | +As an example, here is a SQL query that selects patents between two dates: |
| 88 | + |
| 89 | +``` |
| 90 | +SELECT |
| 91 | + p.patent_number as "Patent Number", |
| 92 | + p.patent_title as "Patent Title", |
| 93 | + -- p.company_id as "Company ID", |
| 94 | + c.name as "Company Name", |
| 95 | + -- p.company_alternate_name_id as "Alternate Name ID", |
| 96 | + an.name as "Company Name Listed on Patent", |
| 97 | + p.year, |
| 98 | + p.grant_date as "Grant Date", |
| 99 | + p.uspc_class as "USPC Classes" |
| 100 | +FROM |
| 101 | + patents as p |
| 102 | +JOIN |
| 103 | + companies as c |
| 104 | +ON |
| 105 | + p.company_id = c.id |
| 106 | +LEFT JOIN |
| 107 | + alternate_company_names as an |
| 108 | +ON |
| 109 | + p.company_alternate_name_id = an.id |
| 110 | +WHERE |
| 111 | + p.grant_date > DATE('2006-01-03') AND |
| 112 | + p.grant_date < DATE('2010-06-13'); |
| 113 | +``` |
| 114 | + |
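The date-range query above can also be run from Python with bound parameters instead of hard-coded dates. The sketch below uses the standard-library `sqlite3` module against an in-memory database. The table definitions are an assumption inferred from the columns used in the query, not the project's actual schema; the inserted rows are made up for illustration; and `patents_between` is a hypothetical helper. It also assumes `grant_date` is stored as ISO `YYYY-MM-DD` text.

```python
import sqlite3

QUERY = """
SELECT p.patent_number, p.patent_title, c.name, an.name, p.grant_date
FROM patents AS p
JOIN companies AS c ON p.company_id = c.id
LEFT JOIN alternate_company_names AS an ON p.company_alternate_name_id = an.id
WHERE p.grant_date > DATE(?) AND p.grant_date < DATE(?)
ORDER BY p.grant_date;
"""


def patents_between(conn, start, end):
    """Return patents granted strictly between two ISO dates."""
    return conn.execute(QUERY, (start, end)).fetchall()


conn = sqlite3.connect(":memory:")
# Assumed schema and made-up rows, for illustration only.
conn.executescript("""
CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE alternate_company_names (
    id INTEGER PRIMARY KEY, company_id INTEGER, name TEXT);
CREATE TABLE patents (
    patent_number TEXT PRIMARY KEY, patent_title TEXT, company_id INTEGER,
    company_alternate_name_id INTEGER, year INTEGER, grant_date TEXT,
    uspc_class TEXT);
INSERT INTO companies VALUES (1, 'Example Co');
INSERT INTO alternate_company_names VALUES (1, 1, 'EXAMPLE COMPANY, INC.');
INSERT INTO patents VALUES
    ('D000001', 'Inside range', 1, 1, 2008, '2008-05-20', '1/1'),
    ('0000002', 'Outside range', 1, NULL, 2011, '2011-01-04', '1/1');
""")

rows = patents_between(conn, "2006-01-03", "2010-06-13")
print(rows)
```

Binding the dates with `?` placeholders avoids quoting issues and lets the same query be reused for any date range.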
| 115 | +## Software Applications |
| 116 | + |
| 117 | +* [DbVisualizer](https://www.dbvis.com/) was used to generate the graphs |
| 118 | +* [DB Browser for SQLite](https://sqlitebrowser.org/) was used to look at the data and execute SQL queries |