
  Published Paper Details:

  Paper Title

DATA SCRAPING FROM VARIOUS DATA RESOURCES

  Authors

  Nimmala. Mrudula,  Moparthy. Rajya Lakshmi,  Makinaboina. Divya,  Manukonda. Sri Sai Lakshmi

  Keywords

TDBM, NoSQL, Big Data Analysis, Data Scraping, jsoup, Flume

  Abstract


Big data comprises data sets so voluminous and complex that traditional data processing applications are inadequate to deal with them. Its challenges include data capture, storage, analysis, search, sharing, transfer, visualization, querying, and updating. Five important dimensions of big data are volume, variety, velocity, veracity, and volatility. Big data is generated from multiple sources, such as social networking sites, sensors, and PDF documents, and is therefore heterogeneous in nature: it includes structured, semi-structured, and unstructured data. Structured data can be analyzed with Traditional Data Base Management (TDBM) systems, but traditional databases are inadequate for maintaining semi-structured and unstructured data. Semi-structured data can be managed with NoSQL ("Not only SQL") databases. Before analytics can be performed, the data must be captured from multiple sources, which is one of the major challenges. Data scraping is a technique that extracts the desired data from big data sources. Data can be extracted or scraped using web crawlers or jsoup, a third-party Java library. The scraped data can then be loaded automatically into Hive with the help of the Flume tool. In this project we capture big data from various big data sources and then perform analytics on the scraped data.
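
The scraping step described in the abstract can be illustrated with a minimal sketch. The paper names jsoup (Java) as its scraping library; the example below is not the authors' code and swaps in Python's standard-library `html.parser` so that it runs without third-party dependencies. The class `LinkAndTitleScraper`, the helpers `scrape` and `to_csv`, and the `SAMPLE_HTML` page are all hypothetical names introduced for illustration. The sketch extracts the page title and all hyperlinks from an HTML document and serializes them as delimited text, one plausible flat form for handing data to a downstream ingestion tool.

```python
from html.parser import HTMLParser
import csv
import io


class LinkAndTitleScraper(HTMLParser):
    """Collects the page title and every (href, link text) pair."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []            # list of (href, text) tuples
        self._in_title = False
        self._current_href = None  # set while inside an <a href=...> element
        self._current_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:  # anchors without an href are ignored
                self._current_href = href
                self._current_text = []

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._current_href is not None:
            self._current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a" and self._current_href is not None:
            text = "".join(self._current_text).strip()
            self.links.append((self._current_href, text))
            self._current_href = None


def scrape(html):
    """Parse an HTML string and return (title, links)."""
    parser = LinkAndTitleScraper()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.links


def to_csv(links):
    """Serialize scraped links as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["href", "text"])
    writer.writerows(links)
    return buf.getvalue()


# Hypothetical in-memory page, used instead of a live network request.
SAMPLE_HTML = """
<html><head><title>Sample feed</title></head>
<body>
  <a href="/posts/1">First post</a>
  <a href="/posts/2">Second <b>post</b></a>
  <a>no href, skipped</a>
</body></html>
"""

title, links = scrape(SAMPLE_HTML)
print(title)
print(to_csv(links))
```

In a real pipeline the HTML would come from a crawler rather than an inline string, and the CSV output would be written to a location that the ingestion tool watches; both of those steps are outside the scope of this sketch.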

  IJCRT's Publication Details

  Unique Identification Number - IJCRT1872391

  Paper ID - 184672

  Page Number(s) - 1440-1443

  Published in - Volume 6 | Issue 1 | March 2018

  DOI (Digital Object Identifier) -   

  Publisher Name - IJCRT | www.ijcrt.org | ISSN : 2320-2882

  E-ISSN Number - 2320-2882

  Cite this article

  Nimmala. Mrudula, Moparthy. Rajya Lakshmi, Makinaboina. Divya, Manukonda. Sri Sai Lakshmi, "DATA SCRAPING FROM VARIOUS DATA RESOURCES", International Journal of Creative Research Thoughts (IJCRT), ISSN: 2320-2882, Volume 6, Issue 1, pp. 1440-1443, March 2018. Available at: http://www.ijcrt.org/papers/IJCRT1872391.pdf
