Open
103 commits
5c255b9
started specs for BCL
Oct 25, 2012
193a87b
hide these files
Oct 25, 2012
a8292a2
adding specs for BCL
Oct 25, 2012
c33d59e
new files for parsing BC Liquor site. pulling crawlkit out of lcbo b/…
Oct 25, 2012
b8d5611
added BC cities, stores, products list
Nov 14, 2012
68cdf1e
update specs and added region, ski, ups, volume, kosher, organic to B…
Dec 4, 2012
131bf1a
bump version
Dec 4, 2012
c97036a
git add type
Dec 18, 2012
0c8befc
scrape image url for bcd and bump version
Jan 13, 2013
9609480
support L in volume
Jan 25, 2013
b4bb246
updating scrapper to work with new site
May 11, 2013
c9b0ff9
version bump
May 11, 2013
14e5fd6
using json to grab the inventory is deprecated… now uses multiple req…
May 21, 2013
9b03f46
adding support for SAQ, product pages and inventory pages, still inco…
May 22, 2013
59ff674
Version bump and more support for SAQ
Jun 12, 2013
0d92b04
Version bump, support UTF-8
Jun 18, 2013
f677d2f
fix quotes
Jun 18, 2013
008e378
version bump
Jun 18, 2013
a792c84
version bump, update saq inventory uri
Jun 24, 2013
756f3da
fix bcd product_page_lists, consolidated version info
Jun 25, 2013
f18dcd6
version bump
Jun 25, 2013
57689cd
multiple regex for saq sku
Jun 26, 2013
43eeea6
rake tasks to update products and inventories
Jun 27, 2013
8859138
update typhoeus gem, fix broken specs, removed script to update winea…
Jul 3, 2013
20b9ee4
Version bump
Jul 3, 2013
ce3d4f2
rescue missing upc for bcl product, VERSION bump
Jul 4, 2013
4b9fe15
limit to 3 tries, change user agent, VERSION bump
Jul 5, 2013
6d52a53
update bcl inventory to scrape mobile site for stock, it has all stor…
Jul 5, 2013
0988709
VERSION bump
Jul 5, 2013
84ac981
saq::product emits varietal
Jul 10, 2013
ae5b023
FIX image grabbing for saq
Oct 2, 2013
5e1b519
use larger image for saq
Oct 3, 2013
dd89509
Version bump
Oct 3, 2013
e16d7a1
fix saq product totals
Oct 3, 2013
4454944
fix broken saq non-image
Oct 3, 2013
8f16f27
update SAQ stores and cities
Oct 29, 2013
1e189ea
fix broken image saq image scrape, saq inventory update
Nov 11, 2013
7136bad
update parsing for SAQ to handle sale prices, bump version
Mar 18, 2014
876b5a7
update gemspec dependancy for nokogiri
Mar 18, 2014
fb2a3a4
added support for SAQ LTO
Mar 25, 2014
71ff53b
update LTO for BCL and inventory scraper
Apr 7, 2014
94eca16
support thousands
May 6, 2014
d717e4b
update lcbo lib to scrape new site format
Jun 26, 2014
2e07064
lcbo inventory page scraping
Jun 27, 2014
1bd1386
support lot better
Jun 27, 2014
2cf292f
better parsing of form data
Jun 27, 2014
cdf68ff
fix bcl store scraper
Jul 9, 2014
2f27293
update bcl store_ids
Jul 13, 2014
921a91b
LCBO bottle scraper doesn’t give blank bottles
Aug 13, 2014
d3bd5fb
change BCLDB product page
Nov 27, 2014
bc3e820
update bcl inventory scraper
Jan 8, 2015
a5a2699
fix BCL store page
Feb 9, 2015
3d87ef2
change saq api_id parser
Mar 2, 2015
9ec0ea0
PROD FIX BCL inventory page has changed again !
Apr 21, 2015
ce92b1c
PROD FIX saq upcs regex
Jun 16, 2015
5531214
rescue missing description error for LCBO
lenard Jan 6, 2016
f29981d
do not throw error when lcbo inventory page returns 0 results
lenard Jan 6, 2016
b47a9f3
update LTO for lcbo
lenard Jan 13, 2016
ef4e91c
allow gift bottle size, it shouldn’t throw an error
lenard Jan 23, 2016
c62dee3
fix case where saw inventory is missing inventory element (rescues as 0)
lenard Jan 28, 2016
4240bf5
support lcbo prices > 1000
lenard Feb 10, 2016
a0f604b
support prices > $1000 in bc
lenard Feb 10, 2016
825fe36
add upc for lcbo
lenard Apr 26, 2016
8a07ab0
update lcbo gem to parse store pages
lenard Apr 26, 2016
ed798ae
update url used for parsing lcbo data
lenard Jul 24, 2016
ffe41be
html changes on lcbo.com site
lenard Aug 5, 2016
024feac
update for LTO on LCBO
lenard Aug 16, 2016
5f5b447
add 301 to valid response code for bottle image in lcbo
lenard Aug 23, 2016
8723c78
add support for generic bottle images for LCBO.com
lenard Aug 29, 2016
32f50cc
fix lcbo image scraper
lenard Sep 12, 2016
e3d314e
LCBO updated site html
lenard Oct 12, 2016
ce9b25e
Move to https
lenard Jun 23, 2017
47234c2
False positive BAMS
lenard Jul 18, 2017
7367ef8
fix for missing BAM info
lenard Jul 20, 2017
1d762b0
Bump for https
lenard Sep 28, 2017
b2adcce
Update saq product url
lenard Jan 6, 2018
8255474
LCBO updated their website :(
lenard Mar 2, 2019
c8ca5ca
Update scraping LTO
lenard Aug 26, 2019
db383a8
Update inventory scraping to use lcbo.com
lenard Jan 17, 2020
7bf0369
squeeze spaces in address
lenard Jun 17, 2020
823127b
Fix inventory missing throwing errors
lenard Feb 1, 2021
12cfc4d
#3831 lcbo online inventory scraping
lenard Jan 28, 2022
95daa93
support zero online inventory
lenard Feb 1, 2022
3f966a3
throw error if unable to parse page
lenard Mar 14, 2022
05a72cb
Update inventory scraper for new LCBO site
lenard Apr 8, 2022
1e6f36c
update product scraper for LCBO
lenard Apr 11, 2022
5512496
fix LCBO empty inventory page
lenard Apr 14, 2022
587daf1
remove code2 fron lcbo inventory update
lenard Apr 20, 2022
e0979d1
unable to scrape online inventory, fix image_url
lenard Apr 25, 2022
7a15aef
syntax error
lenard Apr 25, 2022
126ca6c
scrape product url
lenard Apr 26, 2022
0a7cb16
fail safely when wine is 404, exclude phone in inventory to prevent m…
lenard May 4, 2022
5a77598
scrape LTO data
lenard May 10, 2022
6642498
Update product_page.rb
lenard May 11, 2022
33ff1c4
fix sale price scraping
lenard May 11, 2022
6c4fd95
update wine image scraper
lenard May 19, 2022
afd978f
update online_available scraper
lenard Sep 6, 2024
3666078
init bump for graphql
lenard Sep 9, 2024
f6c5ff4
add graphql to scrape for data for online-inventory and upcs
lenard Sep 9, 2024
9e78774
path for schema
lenard Sep 9, 2024
8aa7777
add store_qty
lenard Sep 10, 2024
1a42d26
update product page to use graphQL api, inventory page still scrapes …
lenard May 26, 2025
210536d
raise error if SKU not found
lenard May 26, 2025
4 changes: 4 additions & 0 deletions .gitignore
@@ -1,3 +1,7 @@
Gemfile.lock
pkg
.DS_Store
.rvmrc
/spec/support

scrape.rb
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,9 @@
1.4.0
* added support for saq.com

1.3.0
* added support for bcliquorstores.com

Version 1.2.3

* Updated `ProductPage` to return `RedirectedError` for gift cards which now
5 changes: 4 additions & 1 deletion Gemfile
@@ -1,2 +1,5 @@
source :rubygems
source 'http://rubygems.org'

gem 'graphql-client'

gemspec
37 changes: 23 additions & 14 deletions Rakefile
@@ -12,7 +12,7 @@ task :default => :spec

desc 'Start an irb console'
task :console do
system 'irb -I lib -r lcbo'
system 'irb -I lib -r lcbo -r bcl -r saq'
end

desc 'Validates the gemspec'
@@ -43,19 +43,28 @@ task :package => :gemspec

Rake::TestTask.new(:spec) do |t|
t.libs += %w[lcbo spec]
t.test_files = FileList['spec/**/*.rb']
# t.test_files = FileList['spec/**/*.rb']
# t.test_files = FileList[ENV['SPEC'] || 'spec/**/*.rb']
t.test_files = FileList[
'spec/lcbo_spec.rb',
'spec/bcl_spec.rb',
'spec/saq_spec.rb'
]
t.verbose = true
end

desc 'Download all HTML indicated in YAML assertion files'
task :download_support do
require 'yaml'
require 'open-uri'
product_pages = YAML.load_file('./spec/support/product_pages.yml')
product_pages.each do |spec|
html = open(spec[:uri]).read
File.open("./spec/support/product_pages/#{spec[:file]}", ?w) { |file|
file.print(html)
}
end
end
# desc 'Download all HTML indicated in YAML assertion files'
# task :download_support do
# require 'yaml'
# require 'open-uri'
# pages = YAML.load_file('./spec/support/pages.yml')
# pages.each do |type, uris|
# uris.each_with_index do |uri, i|
# html = open(uri).read
# File.open("./spec/support/#{type}/#{i}.html", ?w) { |file|
# file.print(html)
# }
# end
# end
# end
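
Note on the commented-out `download_support` task: it reads a mapping of page type to URIs and writes each response to `spec/support/<type>/<index>.html`. A minimal sketch of that loop follows, assuming the `pages.yml` shape implied by the task. The `download_support` method and its `fetcher` argument are hypothetical names introduced here so the logic can run without network access; they are not part of the gem.

```ruby
require 'fileutils'
require 'tmpdir'

# pages:   { type => [uri, ...] }, the shape the commented-out task iterates over
# fetcher: any callable returning the body for a URI (hypothetical seam for testing)
def download_support(pages, out_dir, fetcher)
  written = []
  pages.each do |type, uris|
    dir = File.join(out_dir, type.to_s)
    FileUtils.mkdir_p(dir) # create spec/support/<type> if it is missing
    uris.each_with_index do |uri, i|
      path = File.join(dir, "#{i}.html")
      File.write(path, fetcher.call(uri))
      written << path
    end
  end
  written
end

# Stand-in fetcher and example.com URIs, used only for illustration.
fake_fetcher = ->(uri) { "<html><!-- #{uri} --></html>" }
pages = { 'product_pages' => ['http://example.com/a', 'http://example.com/b'] }

files = Dir.mktmpdir do |tmp|
  download_support(pages, tmp, fake_fetcher).map { |p| [File.basename(p), File.read(p)] }
end
```

In a real task the fetcher could be `->(uri) { URI.open(uri).read }` after `require 'open-uri'`. The bare `open(uri)` form used in the task no longer opens URLs on Ruby 3.0 and later.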

22 changes: 10 additions & 12 deletions lcbo.gemspec
@@ -1,22 +1,20 @@
# coding: utf-8
require File.expand_path("../lib/lcbo/version", __FILE__)
require File.expand_path("../lib/version", __FILE__)

Gem::Specification.new do |s|
s.name = 'lcbo'
s.version = LCBO::VERSION
s.version = VERSION
s.platform = Gem::Platform::RUBY
s.authors = ['Carsten Nielsen']
s.email = ['heycarsten@gmail.com']
s.homepage = 'http://github.com/heycarsten/lcbo'
s.summary = %q{A library for parsing HTML pages from http://lcbo.com}
s.authors = ['Carsten Nielsen', 'Lenard Andal', 'Ahmed El-Daly']
s.email = ['lenard@winealign.com']
s.summary = %q{A library for parsing HTML pages from http://lcbo.com, http://bcliquorstores.com, http://saq.com}
s.description = %q{Request and parse product, store, inventory, and product search pages directly from the official LCBO website.}

s.rubyforge_project = 'lcbo'

s.add_dependency 'typhoeus', '~> 0.3.3'
s.add_dependency 'nokogiri', '~> 1.5.0'
s.add_dependency 'unicode_utils', '~> 1.2.2'
s.add_dependency 'stringex', '~> 1.3.0'
s.add_dependency 'typhoeus'
s.add_dependency 'nokogiri'
s.add_dependency 'unicode_utils'
s.add_dependency 'stringex'
s.add_dependency 'graphql-client'

s.files = `git ls-files`.split(?\n)
s.test_files = `git ls-files -- {test,spec}/*`.split(?\n)
23 changes: 23 additions & 0 deletions lib/bcl.rb
@@ -0,0 +1,23 @@
module BCL

  DEFAULT_CONFIG = {
    :user_agent  => nil, # nil means the default User-Agent is used
    :max_retries => 3,   # Number of times to retry a request that fails
    :timeout     => 8    # Seconds to wait for a request before timing out
  }.freeze

  def self.config
    reset_config! unless @config
    @config
  end

  def self.reset_config!
    @config = DEFAULT_CONFIG.dup
  end

end

require 'ext'
require 'bcl/helpers'
require 'crawlkit'
require 'bcl/pages'
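
Note on the config pattern in `lib/bcl.rb`: `DEFAULT_CONFIG` is frozen, and `config` hands out a mutable copy that `reset_config!` can restore. A minimal sketch of that behaviour follows. `DemoBoard` is a hypothetical stand-in module so the snippet runs without the gem loaded.

```ruby
# Hypothetical stand-in for the BCL module; only the config logic is reproduced.
module DemoBoard
  DEFAULT_CONFIG = {
    :user_agent  => nil,
    :max_retries => 3,
    :timeout     => 8
  }.freeze

  # Lazily builds the working copy on first access.
  def self.config
    reset_config! unless @config
    @config
  end

  # dup returns an unfrozen copy, so the defaults themselves never change.
  def self.reset_config!
    @config = DEFAULT_CONFIG.dup
  end
end

DemoBoard.config[:timeout] = 20 # override, e.g. for a slow connection
slow = DemoBoard.config[:timeout]

DemoBoard.reset_config!         # discard overrides
restored = DemoBoard.config[:timeout]
```

Because the copy is shallow, this only protects top-level keys, which is enough for the three scalar settings shown in the diff.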
39 changes: 39 additions & 0 deletions lib/bcl/helpers.rb
@@ -0,0 +1,39 @@
module BCL

  PAGE_TYPES = {
    :product      => 'ProductPage',
    :product_list => 'ProductListPage',
    :store_list   => 'StoreListPage',
    :store        => 'StorePage',
    :inventory    => 'InventoryPage'
  }

  # Page classes are defined inside BCL, so resolve the name in this
  # namespace rather than on Object.
  def self.page(type)
    const_get(PAGE_TYPES[type.to_sym])
  end

  # page takes the type as an argument; it is not a hash to index into.
  def self.parse(page_type, response)
    page(page_type).parse(response)
  end

  def self.product(id)
    ProductPage.process(:id => id).as_hash
  end

  def self.store(id)
    StorePage.process(:id => id).as_hash
  end

  def self.inventory(product_id)
    InventoryPage.process(:product_id => product_id).as_hash
  end

  # beginIndex is a zero-based page number; it is converted to an item offset.
  def self.product_list(beginIndex=0)
    ProductListPage.process(:beginIndex => beginIndex*BCL::ProductListPage::PER_PAGE).as_hash
  end

  def self.store_list
    StoreListPage.process({}, {}).as_hash
  end

end
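
Note on the page lookup in `lib/bcl/helpers.rb`: the helper maps a symbol to a class name and resolves it with `const_get`. A minimal sketch of that dispatch follows, assuming the page classes live inside the module namespace. `DemoBoard` and its two page classes are hypothetical stand-ins, and their `parse` methods only echo their input.

```ruby
# Hypothetical stand-in module; the real page classes come from CrawlKit::Page.
module DemoBoard
  class ProductPage
    def self.parse(response)
      { :type => :product, :body => response }
    end
  end

  class StorePage
    def self.parse(response)
      { :type => :store, :body => response }
    end
  end

  PAGE_TYPES = {
    :product => 'ProductPage',
    :store   => 'StorePage'
  }.freeze

  # fetch raises KeyError for an unknown type instead of passing nil to const_get.
  def self.page(type)
    const_get(PAGE_TYPES.fetch(type.to_sym))
  end

  def self.parse(page_type, response)
    page(page_type).parse(response)
  end
end

result = DemoBoard.parse('store', '<html></html>')
```

Calling `const_get` on the module keeps the lookup inside its namespace, so a top-level class with the same name cannot be picked up by accident.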
6 changes: 6 additions & 0 deletions lib/bcl/pages.rb
@@ -0,0 +1,6 @@
require 'bcl/pages/inventory_page'
require 'bcl/pages/product_list_page'
require 'bcl/pages/product_page'
require 'bcl/pages/store_list_page'
require 'bcl/pages/store_page'
require 'bcl/pages/cities_list_page'
99 changes: 99 additions & 0 deletions lib/bcl/pages/cities_list_page.rb
@@ -0,0 +1,99 @@
require 'json'

module BCL
  class CitiesListPage

    include CrawlKit::Page

    uri 'http://www.bcliquorstores.com/store/locator/location_data_js'

    # default_body_params \
    #   :STOCK_TYPE_NAME => 'All',
    #   :ITEM_NAME => '',
    #   :KEYWORDS => '',
    #   :ITEM_NUMBER => '',
    #   :productListingType => '',
    #   :LIQUOR_TYPE_SHORT_ => '*',
    #   :CATEGORY_NAME => '*',
    #   :SUB_CATEGORY_NAME => '*',
    #   :PRODUCING_CNAME => '*',
    #   :PRODUCING_REGION_N => '*',
    #   :UNIT_VOLUME => '*',
    #   :SELLING_PRICE => '*',
    #   :LTO_SALES_CODE => 'N',
    #   :VQA_CODE => 'N',
    #   :KOSHER_CODE => 'N',
    #   :VINTAGES_CODE => 'N',
    #   :VALUE_ADD_SALES_CO => 'N',
    #   :AIR_MILES_SALES_CO => 'N',
    #   :language => 'EN',
    #   :style => 'LCBO.css',
    #   :sort => 'sortedProduct',
    #   :order => '1',
    #   :resultsPerPage => PER_PAGE.to_s,
    #   :page => '1',
    #   :action => 'result',
    #   :sortby => 'sortedProduct',
    #   :orderby => '',
    #   :numPerPage => PER_PAGE.to_s

    # emits :page do
    #   body_params[:page].to_i
    # end

    # emits :final_page do
    #   @final_page ||= begin
    #     count = total_products / PER_PAGE
    #     0 == (total_products % PER_PAGE) ? count : count + 1
    #   end
    # end

    # emits :next_page do
    #   @next_page ||= begin
    #     page < final_page ? page + 1 : nil
    #   end
    # end

    # emits :total_products do
    #   @total_products ||= begin
    #     doc.css('td[width="58%"] font.main_font b')[0].
    #       text.
    #       gsub(/\s+/, ' ').
    #       strip.
    #       to_i
    #   end
    # end

    emits :city_ids do
      city_hash.keys.sort
    end

    # Maps city_id to the city's attributes, with the nested store list removed.
    emits :city_hash do
      result = {}
      cities do |city_hash|
        city_hash.delete('stores')
        result[city_hash['city_id']] = city_hash
      end
      result
    end
    # alias_method :as_array, :product_ids

    # The endpoint returns JSON, so the document's text content is parsed directly.
    def data
      @data ||= JSON.parse(doc.content)
    end

    # Yields each region hash from the top-level JSON object.
    def regions
      data.each do |region_id, region_hash|
        yield region_hash
      end
    end

    # Yields each city hash across all regions.
    def cities
      regions do |region_hash|
        region_hash['cities'].each do |city_id, city_hash|
          yield city_hash
        end
      end
    end

  end
end
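
Note on `CitiesListPage`: the parser walks regions, then the `cities` hash inside each region, and indexes every city by `city_id` without its `stores` entry. A minimal sketch of that flattening follows. The payload is invented and only mirrors the keys the parser reads (`cities`, `city_id`, `stores`); the `name` fields and the `cities_by_id` helper are assumptions made for illustration.

```ruby
require 'json'

# Hypothetical payload, shaped after what the parser expects from the endpoint.
raw = <<~JSON
  {
    "1": {
      "name": "Region A",
      "cities": {
        "10": { "city_id": "10", "name": "Alpha", "stores": ["s1", "s2"] },
        "11": { "city_id": "11", "name": "Beta", "stores": [] }
      }
    },
    "2": {
      "name": "Region B",
      "cities": {
        "20": { "city_id": "20", "name": "Gamma", "stores": ["s3"] }
      }
    }
  }
JSON

# Builds { city_id => city attributes } across all regions.
def cities_by_id(data)
  data.each_value.with_object({}) do |region, result|
    region['cities'].each_value do |city|
      # Copy without the store list instead of deleting from the parsed data.
      result[city['city_id']] = city.reject { |key, _| key == 'stores' }
    end
  end
end

city_hash = cities_by_id(JSON.parse(raw))
city_ids  = city_hash.keys.sort
```

Unlike the version in the diff, this sketch leaves the parsed data untouched, which matters if the memoized `data` is read again after `city_hash` has run.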
80 changes: 80 additions & 0 deletions lib/bcl/pages/inventory_page.rb
@@ -0,0 +1,80 @@
require 'json'
require 'open-uri'

module BCL
  class InventoryPage

    include CrawlKit::Page

    # FIXME: this URI is duplicated in #parse below.
    # uri 'http://m.bcliquorstores.com/m/where_to_buy/{product_id}'
    uri 'http://m.bcliquorstores.com/m/api_stores/{product_id}/9999/'

    emits :product_id do
      query_params[:product_id].to_i
    end

    emits :inventory_count do
      @product_page[:total_units]
    end

    emits :nid do
      @nid
    end

    emits :board do
      "BCL"
    end

    emits :inventories do
      results = []
      # doc.css("#listStores li").each do |store_element|
      #   store_id = store_element.css('a.arrow').attribute('href').value.match(/\/m\/stores\/view\/(\d+)/)[1].to_i
      #   stock = store_element.css('a.arrow div:nth-of-type(1)')[0].content.strip.match(/Quantity: (\d+)/)[1].to_i
      #   results << {store_id: store_id, quantity: stock}
      # end

      json.each do |store_element|
        store_id = store_element['serial']
        stock = store_element['quantity'].to_i
        results << {store_id: store_id, quantity: stock}
      end

      results
    end

    # The API response is JSON; it is read from the text content of @doc.
    def json
      @data ||= JSON.parse(doc.content)
    end

    # emits :xdoc do
    #   @doc
    # end

    # Fetches the store inventory endpoint again and wraps the body in a
    # Nokogiri document so that #json can read it through doc.content.
    def parse
      return if is_parsed?
      return unless @html
      fire :before_parse
      inv_uri = "http://m.bcliquorstores.com/m/api_stores/#{product_id}/9999/"
      @doc = Nokogiri::HTML(CrawlKit::RequestPrototype.new(inv_uri).request.body)
      fire :after_parse
      self
    end

    # Loads the product page first so that total_units and nid are available.
    def request
      return if @html
      fire :before_request

      @product_page = BCL.product(product_id)
      # Fall back to the product id when the product page has no nid.
      @nid = @product_page[:nid] rescue product_id
      @query_params[:nid] = @nid

      @response = request_prototype.request(query_params, body_params)
      @html = @response.body

      fire :after_request
      self
    end

  end
end
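
Note on `InventoryPage`: the `inventories` emitter turns each element of the JSON array into a `store_id`/`quantity` pair, coercing the quantity with `to_i`. A minimal sketch of that mapping follows. The field names `serial` and `quantity` come from the parser in the diff; the sample values and the `inventories` helper are invented for illustration.

```ruby
require 'json'

# Hypothetical response body; only the two fields the parser reads are included.
raw = '[{"serial": 101, "quantity": "12"},' \
      ' {"serial": 102, "quantity": "0"},' \
      ' {"serial": 103, "quantity": null}]'

# Returns one { store_id, quantity } hash per store in the response.
def inventories(json_text)
  JSON.parse(json_text).map do |store|
    # to_i turns numeric strings into integers and nil into 0.
    { :store_id => store['serial'], :quantity => store['quantity'].to_i }
  end
end

rows  = inventories(raw)
total = rows.sum { |row| row[:quantity] }
```

The `total` here is only a cross-check. In the diff, `inventory_count` is taken from the product page's `total_units` rather than summed from the store rows.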