A Python library for scraping case information from the Oklahoma State Courts Network (OSCN).
- 12/16/2024: Updated to use Selectolax instead of BeautifulSoup. Most parsing functions now use Selectolax, which is faster and more memory-efficient. A few functions still use BeautifulSoup, and the old parsers remain in the codebase for testing.
oscn > Python package source that provides an API for retrieving and parsing case records.
scripts > Python scripts showing how to use the oscn package
- example.py: demonstrates requesting a Case and a CaseList
- retrieve-counts.py: saves a list of all counts for a list of counties and years
- find-counts.py: saves a list of counts passing a test for a list of counties and years
- soup_test.py: a stub for testing parsing attempts using BeautifulSoup
- parse_test.py: a stub for development with saved examples
- counties: Returns a list of counties.
- courts: Same as counties, but a more accurate name.
- judges: Returns a list of dicts formatted as {'name': 'Bond, James', 'number': '007'}
- types: Returns a dict of case type codes and descriptions
- type: Function that returns the description for a case type code. Usage: `oscn.type("AO")` returns `'CIVIL ADMINISTRATIVE'`
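Because `judges` provides plain dicts in the format above, indexing judges by number takes a single dict comprehension. A minimal sketch: the one entry below is the README's own illustrative example, not real OSCN data, and in practice the list would come from the oscn package.

```python
# Illustrative entry in the format documented for `judges`;
# real data would come from the oscn package.
judges = [{'name': 'Bond, James', 'number': '007'}]

# Index judge names by judge number for quick lookups.
judges_by_number = {judge['number']: judge['name'] for judge in judges}

print(judges_by_number['007'])  # Bond, James
```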
- Case: Returns a single case. Cases can be saved to files with Case.save() and reopened with Case.open().
- CaseList: Returns an iterator for retrieving cases for a county and year. A CaseList can be filtered with .find(); see scripts/example.py for details.
- Party: Returns information on parties available on OSCN.
- Docket: Returns the docket of cases for a given judge and date.
Parsers accept the HTML of an OSCN page and return Python objects.
- filed: returns a string of the filing date (e.g. 12/25/2017)
- closed: returns a string of the date the case was closed. Returns None if the case is not closed.
- counts: returns a list of count dicts found in a case. Each dict includes a 'description' of the count; 'violation' and 'disposed' keys are added when available.
- judge: returns a string of the judge's name
- parties: returns a list of dicts with these keys: id, name, type
- docket: returns a list of rows in a docket
- events: returns a list of dicts with these keys: event, party, docket, reporter, date, description. The date and description keys are cleaner versions of the event text. The event key will eventually be deprecated, so new projects should use date and description.
- attorneys: returns a list of dicts with these keys: name, address, and representing
- issues: returns a list of dicts with issue information. Each issue includes a list of dicts, one for each party
- name: returns 'Requested Party'
- alias: returns 'Alias or Alternate Names'
- profile: returns a dict of values in 'Personal Profile'
- birth_month: returns a string of 'Birth Month and Year'
- addresses: returns a list of dicts for each address
- cases: returns a list of case indexes
- tables: returns the HTML table for each case in the docket
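The case parsers above return plain strings, lists, and dicts, so their output combines easily with the standard library. A minimal sketch, assuming `closed` uses the same MM/DD/YYYY format shown for `filed`; `to_date`, `days_open`, and `counts_matching` are hypothetical helpers, not part of oscn, and the count descriptions are made up.

```python
from datetime import date, datetime


def to_date(value):
    """Convert an MM/DD/YYYY string to a date, passing None through.

    Assumes `closed` uses the same format as `filed` (e.g. 12/25/2017).
    """
    if value is None:
        return None
    return datetime.strptime(value, "%m/%d/%Y").date()


def days_open(filed, closed, today=None):
    """Days from filing to closing, or to `today` if the case is still open."""
    end = to_date(closed) if closed is not None else (today or date.today())
    return (end - to_date(filed)).days


def counts_matching(counts, keyword):
    """Return count dicts whose 'description' contains `keyword`, ignoring case."""
    keyword = keyword.lower()
    return [count for count in counts if keyword in count["description"].lower()]


# Made-up count dicts using only the documented 'description' key.
counts = [
    {"description": "EXAMPLE DRUG POSSESSION COUNT"},
    {"description": "EXAMPLE TRAFFIC COUNT"},
]

print(days_open("12/25/2017", "01/24/2018"))  # 30
print(len(counts_matching(counts, "drug")))   # 1
```

Passing `today` explicitly keeps results reproducible for open cases, since `closed` returns None until a case is closed.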
CaseIndexes returns an iterator of case indexes (e.g. tulsa-CF-2019-12).
Create one using these keyword arguments:
- county: defaults to all counties
- last_name: also use this for company or organization names
- first_name: optional
- middle_name: optional
- filed_after: More readable than FiledDateL
- filed_before: More readable than FiledDateH
- closed_after: More readable than ClosedDateL
- closed_before: More readable than ClosedDateH
- The % wildcard is added to every word in last_name, first_name, and middle_name
- Date arguments use MM/DD/YYYY strings.
If you are familiar with the OSCN search parameters, you can initialize CaseIndexes with them as keyword arguments: db, number, lname, fname, mname, DoBMin, DoBMax, partytype, apct, dcct, FiledDateL, FiledDateH, ClosedDateL, ClosedDateH, iLC, iLCType, iYear, iNumber, and citation.
These parameters override the corresponding readable keywords, such as first_name or filed_after.
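The keyword handling described above can be sketched as follows. This is a hypothetical illustration, not oscn's implementation: the readable-to-OSCN mapping (including county to `db` and last_name to `lname`) is inferred from the parameter names listed in this README, and appending % to the end of each word is an assumption about where the wildcard goes.

```python
from datetime import datetime

# Hypothetical mappings inferred from the parameter names in this README;
# oscn's actual implementation may differ.
NAME_PARAMS = {"last_name": "lname", "first_name": "fname", "middle_name": "mname"}
DATE_PARAMS = {
    "filed_after": "FiledDateL",
    "filed_before": "FiledDateH",
    "closed_after": "ClosedDateL",
    "closed_before": "ClosedDateH",
}


def add_wildcards(name):
    """Append the % wildcard to every word (assumed placement)."""
    return " ".join(f"{word}%" for word in name.split())


def build_search_params(county="all", **kwargs):
    """Translate readable keywords into OSCN search parameters."""
    params = {"db": county}  # assumes county maps to the `db` parameter
    raw_params = {}
    for key, value in kwargs.items():
        if key in NAME_PARAMS:
            params[NAME_PARAMS[key]] = add_wildcards(value)
        elif key in DATE_PARAMS:
            # strptime raises ValueError unless the value is MM/DD/YYYY.
            datetime.strptime(value, "%m/%d/%Y")
            params[DATE_PARAMS[key]] = value
        else:
            # Any other keyword is treated as a raw OSCN search parameter.
            raw_params[key] = value
    # Raw OSCN parameters override values derived from readable keywords.
    params.update(raw_params)
    return params


print(build_search_params(county="tulsa", last_name="smith", filed_after="01/01/2019"))
# {'db': 'tulsa', 'lname': 'smith%', 'FiledDateL': '01/01/2019'}
```

Collecting raw parameters separately and applying them last is one simple way to get the documented precedence: for example, passing both `first_name="john"` and `fname="jo%"` leaves `fname` as `jo%`.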
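To set up a development environment, create and activate a virtual environment:
- `python3 -m venv ~/your_path/oscn`
- `source ~/your_path/oscn/bin/activate`
Then clone the repository and install it in editable mode:
- `git clone git@github.com:codefortulsa/oscn.git`
- `cd oscn`
- `pip install -e .`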
Install with `pip install oscn`.
Script example:
- `import oscn`
- Request a single case: `oscn.request.Case(county='tulsa', year='2018', number=84)`
- or use case index notation: `oscn.request.Case('love-CF-2019-25')`
- Request a list of cases to iterate over: `oscn.request.CaseList(county='adair', year='2016')`
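A case can be identified either by keyword arguments or by a single case index string. The index examples in this README (love-CF-2019-25, tulsa-CF-2019-12) appear to follow a county-TYPE-YEAR-NUMBER pattern; the hypothetical helpers below, which are not part of oscn, convert between a string and its parts under that assumption. Their key names are illustrative and are not documented oscn keyword arguments.

```python
def split_case_index(index):
    """Split a case index such as 'tulsa-CF-2019-12' into its parts.

    rsplit with maxsplit=3 leaves any hyphens inside the county part intact.
    """
    county, case_type, year, number = index.rsplit("-", 3)
    return {"county": county, "case_type": case_type, "year": year, "number": int(number)}


def make_case_index(county, case_type, year, number):
    """Build a case index string in the county-TYPE-YEAR-NUMBER shape."""
    return f"{county}-{case_type}-{year}-{number}"


print(split_case_index("love-CF-2019-25"))
# {'county': 'love', 'case_type': 'CF', 'year': '2019', 'number': 25}
```

Splitting from the right means only the last three hyphens are treated as separators, so the year and number are found reliably even if a county name itself contains a hyphen.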
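Run the tests with:
- `pytest tests/`
To use ipdb (or see print output) during tests, disable output capturing with `-s`:
- `pytest -s tests/`
To run a specific test:
- `pytest -s tests/test_parse.py -k 'test_events'`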
To publish a release:
- Edit setup.py
- `python3 setup.py sdist bdist_wheel`
- `twine upload dist/*`
In some cases, requests need a custom User-Agent header. Setting an environment variable called OSCN_USER_AGENT overrides the default user agent.
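An override like this usually follows a common pattern: read the environment variable and fall back to a default. A minimal sketch of that pattern; `request_headers` and `DEFAULT_USER_AGENT` are hypothetical names, not oscn's code, and the fallback string is a placeholder.

```python
import os

# Hypothetical fallback value; oscn's real default user agent is not shown here.
DEFAULT_USER_AGENT = "example-default-agent"


def request_headers():
    """Build request headers, letting OSCN_USER_AGENT override the default."""
    user_agent = os.environ.get("OSCN_USER_AGENT", DEFAULT_USER_AGENT)
    return {"User-Agent": user_agent}
```

Because this README does not say when oscn reads the variable, setting it in the shell (or in `os.environ`) before importing oscn is the safest option.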