How To Scrape A Website Which Redirects For Some Time
Solution 1:
The page uses JavaScript to generate a value which is sent to https://koinex.in/cdn-cgi/l/chk_jschl to get the cookie cf_clearance, which the page then checks to skip the DDoS-protection page.
That value can be generated with different parameters and different methods on every request, so it is easier to use Selenium to get the data:
from selenium import webdriver
import time

driver = webdriver.Firefox()
driver.get('https://koinex.in/')

# wait for the DDoS check to finish and the page to render
# (see the explicit-wait sketch after the results below)
time.sleep(8)

tables = driver.find_elements_by_tag_name('table')
for item in tables:
    print(item.text)
    #print(item.get_attribute("value"))
Result
VOLUME PRICE/ETH
5.2310 64,300.00
0.0930 64,100.00
10.7670 64,025.01
0.0840 64,000.00
0.3300 63,800.00
0.2800 63,701.00
0.4880 63,700.00
0.7060 63,511.00
0.5020 63,501.00
0.1010 63,500.01
1.4850 63,500.00
1.0000 63,254.00
0.0300 63,253.00
VOLUME PRICE/ETH
1.0000 64,379.00
0.0940 64,380.00
0.9710 64,398.00
0.0350 64,399.00
0.7170 64,400.00
0.3000 64,479.00
5.1650 64,480.35
0.0020 64,495.00
0.2000 64,496.00
9.5630 64,500.00
0.4000 64,501.01
0.0400 64,550.00
0.5220 64,600.00
DATE VOLUME PRICE/ETH
31/12/2017, 12:19:29 0.2770 64,300.00
31/12/2017, 12:19:11 0.5000 64,300.00
31/12/2017, 12:18:28 0.3440 64,025.01
31/12/2017, 12:18:28 0.0750 64,026.00
31/12/2017, 12:17:50 0.0010 64,300.00
31/12/2017, 12:17:47 0.0150 64,300.00
31/12/2017, 12:15:45 0.6720 64,385.00
31/12/2017, 12:15:45 0.2000 64,300.00
31/12/2017, 12:15:45 0.0620 64,300.00
31/12/2017, 12:15:45 0.0650 64,199.97
31/12/2017, 12:15:45 0.0010 64,190.00
31/12/2017, 12:15:45 0.0030 64,190.00
31/12/2017, 12:15:25 0.0010 64,190.00
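If the fixed time.sleep(8) proves unreliable, an explicit wait for a table to appear could be used instead. A minimal sketch, assuming the tables are rendered as plain <table> elements once the check passes:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# wait up to 30 seconds for the DDoS check to finish and a table to be rendered
wait = WebDriverWait(driver, 30)
wait.until(EC.presence_of_element_located((By.TAG_NAME, 'table')))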
You can also get the HTML from Selenium and use it with BeautifulSoup:

soup = BeautifulSoup(driver.page_source, 'html.parser')

but Selenium can get data using XPath, CSS selectors and other methods, so mostly there is no need to use BeautifulSoup.

See documentation: 4. Locating Elements
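For example, the same data could be picked out with Selenium's locator methods instead of the tag name (a minimal sketch; the //table//tr XPath is illustrative and assumes the rows are plain <tr> elements):

# locate the tables with a CSS selector instead of the tag name
tables = driver.find_elements_by_css_selector('table')

# or grab every table row with XPath and print it
rows = driver.find_elements_by_xpath('//table//tr')
for row in rows:
    print(row.text)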
EDIT: this code uses the cookies from Selenium to load the page with requests, and it has no problem with the DDoS page. The problem is that the page uses JavaScript to display the tables, so you can't get them using requests + BeautifulSoup. But maybe you will find the URLs which the JavaScript uses to get the data for the tables, and then requests can be useful (see the sketch after the code below).
from selenium import webdriver
import time

# --- Selenium ---

url = 'https://koinex.in/'

driver = webdriver.Firefox()
driver.get(url)

time.sleep(8)

#tables = driver.find_elements_by_tag_name('table')
#for item in tables:
#    print(item.text)

# --- convert cookies/headers from Selenium to Requests ---

cookies = driver.get_cookies()

for item in cookies:
    print('name:', item['name'])
    print('value:', item['value'])
    print('path:', item['path'])
    print('domain:', item['domain'])
    print('expiry:', item.get('expiry'))  # session cookies may have no 'expiry' key
    print('secure:', item['secure'])
    print('httpOnly:', item['httpOnly'])
    print('----')

# convert list of dictionaries into a dictionary
cookies = {c['name']: c['value'] for c in cookies}

# it has to be the full `User-Agent` used by the browser/Selenium (a short 'Mozilla/5.0' is not enough)
headers = {'User-Agent': driver.execute_script('return navigator.userAgent')}

# --- requests + BeautifulSoup ---

import requests
from bs4 import BeautifulSoup

s = requests.Session()
s.headers.update(headers)
s.cookies.update(cookies)

r = s.get(url)
print(r.text)

soup = BeautifulSoup(r.text, 'html.parser')

tables = soup.find_all('table')
print('tables:', len(tables))
for item in tables:
    print(item.get_text())
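If you find the URL which the page's JavaScript calls to fetch the table data (for example in the browser's developer tools, network tab), the same requests session can fetch it directly. A minimal sketch; the endpoint below is purely hypothetical, not a real Koinex URL:

# purely hypothetical endpoint, used only to illustrate reusing the session;
# replace it with the real URL found in the browser's network tab
api_url = 'https://koinex.in/api/some/orders/endpoint'

r = s.get(api_url)
if r.ok:
    data = r.json()  # such endpoints usually return JSON
    print(data)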