{"id":932,"date":"2020-04-08T18:49:36","date_gmt":"2020-04-08T10:49:36","guid":{"rendered":"http:\/\/cleardatascience.com\/?p=932"},"modified":"2020-04-09T10:13:41","modified_gmt":"2020-04-09T02:13:41","slug":"build-your-own-best-fit-data-repository","status":"publish","type":"post","link":"https:\/\/cleardatascience.com\/en\/build-your-own-best-fit-data-repository\/","title":{"rendered":"Build Your Own \u201cBest-fit\u201d Data Repository"},"content":{"rendered":"<p>The CDS technical team always combines commercially licensed and open source software in the solutions and professional services we deliver to clients.<\/p>\n<p>With the rapid development of open source technology, there are now more choices available for building a data warehouse and \/ or a Big Data platform. Our technical team members have real-world application examples in each of the following areas:<\/p>\n<ol>\n<li>Modern Data Warehouse<\/li>\n<li>NoSQL Database for IoT<\/li>\n<li>Big Data Repository<\/li>\n<li>Data Lake<\/li>\n<\/ol>\n<p><strong>Modern Data Warehouse<\/strong><\/p>\n<p>This year, we used a PostgreSQL database to build a modern data warehouse for an asset management company. The database runs on a cluster with SSD storage, which provides the speed needed to store commodity prices and transaction history. 
For a modern data warehouse, it is advisable to keep the option of integrating with Apache Cassandra \/ Hadoop for semi-structured or even unstructured data, so that the solution is future-proof; details are given later in this article. PostgreSQL is a very good choice, and MariaDB is another possible candidate, with enterprise-class technical support subscriptions available. However, the latest version of MariaDB no longer officially supports a direct connection to the Apache Cassandra NoSQL database.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-940 size-full aligncenter\" src=\"https:\/\/cleardatascience.com\/wp-content\/uploads\/2020\/04\/cleardatascience-moderndata_warehouse.png\" alt=\"\" width=\"624\" height=\"351\" srcset=\"https:\/\/cleardatascience.com\/wp-content\/uploads\/2020\/04\/cleardatascience-moderndata_warehouse.png 624w, https:\/\/cleardatascience.com\/wp-content\/uploads\/2020\/04\/cleardatascience-moderndata_warehouse-300x169.png 300w, https:\/\/cleardatascience.com\/wp-content\/uploads\/2020\/04\/cleardatascience-moderndata_warehouse-150x84.png 150w\" sizes=\"auto, (max-width: 624px) 100vw, 624px\" \/><\/p>\n<p style=\"text-align: center;\">A Typical Architecture for a Data Warehouse<\/p>\n<p><strong>NoSQL Database for IoT<\/strong><\/p>\n<p>We are helping a courier company track its trucks on the road by fitting every truck with a GPS sensor module. The route of each truck is recorded and uploaded to an Apache Cassandra NoSQL database for further analysis of each driver\u2019s performance and of traffic conditions. On the NoSQL side, we also use Redis to store temperature and humidity readings for data center and computer lab owners; through system integration with their air-conditioning systems, these readings support air-conditioning monitoring and automatic adjustment.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full 
wp-image-938 aligncenter\" src=\"https:\/\/cleardatascience.com\/wp-content\/uploads\/2020\/04\/cleardatascience-GPS_sensor.png\" alt=\"\" width=\"246\" height=\"232\" srcset=\"https:\/\/cleardatascience.com\/wp-content\/uploads\/2020\/04\/cleardatascience-GPS_sensor.png 246w, https:\/\/cleardatascience.com\/wp-content\/uploads\/2020\/04\/cleardatascience-GPS_sensor-150x141.png 150w\" sizes=\"auto, (max-width: 246px) 100vw, 246px\" \/><\/p>\n<p style=\"text-align: center;\">GPS Sensor<\/p>\n<p><strong>Big Data Repository<\/strong><\/p>\n<p>To handle large amounts of unstructured data, one of the most efficient approaches is to store it in Apache Hadoop. We are helping several clients, mostly retailers, store logs from multiple social media channels so that they can apply sentiment analysis and improve customer service. They feed the unstructured social media data into HDFS and then stream the data with Spark for near real-time analytics. This enables prompt responses to customers both online &amp; offline, which in turn leads to better decisions in logistics and inventory management.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-936 size-full aligncenter\" src=\"https:\/\/cleardatascience.com\/wp-content\/uploads\/2020\/04\/cleardatascience-BigData_repository.png\" alt=\"\" width=\"705\" height=\"342\" srcset=\"https:\/\/cleardatascience.com\/wp-content\/uploads\/2020\/04\/cleardatascience-BigData_repository.png 705w, https:\/\/cleardatascience.com\/wp-content\/uploads\/2020\/04\/cleardatascience-BigData_repository-300x146.png 300w, https:\/\/cleardatascience.com\/wp-content\/uploads\/2020\/04\/cleardatascience-BigData_repository-150x73.png 150w, https:\/\/cleardatascience.com\/wp-content\/uploads\/2020\/04\/cleardatascience-BigData_repository-640x310.png 640w\" sizes=\"auto, (max-width: 705px) 100vw, 705px\" \/><\/p>\n<p style=\"text-align: center;\">Big Data Repository Design &#8211; 
available for near real-time analytics<\/p>\n<p><strong>Data Lake<\/strong><\/p>\n<p>A large corporation may have hundreds of systems and data silos with different owners, which makes it difficult to analyze its data as a whole. There are several options for building a data lake:<\/p>\n<ul>\n<li>Apache Hadoop \u2013 a Big Data platform that can answer everything, but the hard way<\/li>\n<li>Data Virtualization \u2013 fast answers without any fundamental changes to the original data silos<\/li>\n<li>Database \/ Big Data direct linkage \u2013 some databases, such as PostgreSQL, can connect directly to Apache Hadoop<\/li>\n<\/ul>\n<p>If you would like to save time and effort, Data Virtualization is the better choice. However, if there are only a few source systems (fewer than 3), it is not too hard to put Apache Hadoop in place. Another option is to run a database that can integrate with Hadoop or Cassandra, such as PostgreSQL. We are helping one of our clients in luxury trading build its Data Lake on top of Apache Hadoop. We have implemented all 3 options, depending on each client\u2019s unique requirements and environmental constraints: a Fortune 500 insurance corporation uses Apache Hadoop, large manufacturers use a Database \/ Big Data direct linkage, and a statutory body uses Data Virtualization.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-934 size-full aligncenter\" src=\"https:\/\/cleardatascience.com\/wp-content\/uploads\/2020\/04\/cleardatascience-ApacheHadoopEcosystem.png\" alt=\"\" width=\"624\" height=\"489\" srcset=\"https:\/\/cleardatascience.com\/wp-content\/uploads\/2020\/04\/cleardatascience-ApacheHadoopEcosystem.png 624w, https:\/\/cleardatascience.com\/wp-content\/uploads\/2020\/04\/cleardatascience-ApacheHadoopEcosystem-300x235.png 300w, 
https:\/\/cleardatascience.com\/wp-content\/uploads\/2020\/04\/cleardatascience-ApacheHadoopEcosystem-150x118.png 150w\" sizes=\"auto, (max-width: 624px) 100vw, 624px\" \/><\/p>\n<p style=\"text-align: center;\">Apache Hadoop (Ecosystem)<\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-944\" src=\"https:\/\/cleardatascience.com\/wp-content\/uploads\/2020\/04\/cleardatascience-DataVirtualization.png\" alt=\"\" width=\"807\" height=\"447\" srcset=\"https:\/\/cleardatascience.com\/wp-content\/uploads\/2020\/04\/cleardatascience-DataVirtualization.png 807w, https:\/\/cleardatascience.com\/wp-content\/uploads\/2020\/04\/cleardatascience-DataVirtualization-300x166.png 300w, https:\/\/cleardatascience.com\/wp-content\/uploads\/2020\/04\/cleardatascience-DataVirtualization-768x425.png 768w, https:\/\/cleardatascience.com\/wp-content\/uploads\/2020\/04\/cleardatascience-DataVirtualization-150x83.png 150w, https:\/\/cleardatascience.com\/wp-content\/uploads\/2020\/04\/cleardatascience-DataVirtualization-640x354.png 640w\" sizes=\"auto, (max-width: 807px) 100vw, 807px\" \/><\/p>\n<p style=\"text-align: center;\">Data Virtualization<\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-942 size-full\" src=\"https:\/\/cleardatascience.com\/wp-content\/uploads\/2020\/04\/cleardatascience-PostgreSQL_hadoop.png\" alt=\"\" width=\"557\" height=\"245\" srcset=\"https:\/\/cleardatascience.com\/wp-content\/uploads\/2020\/04\/cleardatascience-PostgreSQL_hadoop.png 557w, https:\/\/cleardatascience.com\/wp-content\/uploads\/2020\/04\/cleardatascience-PostgreSQL_hadoop-300x132.png 300w, https:\/\/cleardatascience.com\/wp-content\/uploads\/2020\/04\/cleardatascience-PostgreSQL_hadoop-150x66.png 150w\" sizes=\"auto, (max-width: 557px) 100vw, 557px\" \/><\/p>\n<p style=\"text-align: center;\">Database \/ Big Data direct linkage (PostgreSQL as an 
example)<\/p>\n<p><strong>Conclusion<\/strong><\/p>\n<p>There are some common misconceptions about handling and storing data. For instance, many people use Apache Hadoop everywhere as the repository for data analytics. With the examples above, we have shared different use cases together with best practices and appropriate ways of applying each technology. If one tool were fit for everything, it would make no sense for so many tools and technologies to exist and be so widely used. Another concern is on-premises versus cloud services. For data storage and data analytics, cloud services tend to be expensive and are extremely difficult to migrate away from. It is really important to carry out a comprehensive review before making the decision. For a data science platform at a reasonable production scale, we suggest a \u201cPrivate Cloud\u201d or on-premises deployment rather than any public cloud service.<\/p>\n<p>If you would like to raise a question or discuss your needs with us, please <a href=\"https:\/\/cleardatascience.com\/en\/contact-us\/\" target=\"_blank\" rel=\"noopener noreferrer\">contact us here<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The CDS technical team always combines commercially licensed and open source software in the solutions and 
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":934,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"nf_dc_page":"","site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","ast-disable-related-posts":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"footnotes":""},"categories":[6],"tags":[78,76,54,21,79,80,81,77],"class_list":["post-932","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-cds-news","tag-analytics","tag-apache-hadoop","tag-data-lake","tag-data-science","tag-data-warehouse","tag-iot","tag-nosql","tag-real-time"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.7 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Build Your Own \u201cBest-fit\u201d Data Repository - Clear Data Science Limited<\/title>\n<meta name=\"description\" content=\"With the rapid development of open source technology, it is possible to use open source platform to build data warehouse and \/ or Big data platform.\" \/>\n<meta name=\"robots\" 
content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/cleardatascience.com\/en\/build-your-own-best-fit-data-repository\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Build Your Own \u201cBest-fit\u201d Data Repository - Clear Data Science Limited\" \/>\n<meta property=\"og:description\" content=\"With the rapid development of open source technology, it is possible to use open source platform to build data warehouse and \/ or Big data platform.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/cleardatascience.com\/en\/build-your-own-best-fit-data-repository\/\" \/>\n<meta property=\"og:site_name\" content=\"Clear Data Science Limited\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/cleardatasciencelimited\/\" \/>\n<meta property=\"article:published_time\" content=\"2020-04-08T10:49:36+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2020-04-09T02:13:41+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/cleardatascience.com\/wp-content\/uploads\/2020\/04\/cleardatascience-ApacheHadoopEcosystem.png\" \/>\n\t<meta property=\"og:image:width\" content=\"624\" \/>\n\t<meta property=\"og:image:height\" content=\"489\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"admin\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"admin\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/cleardatascience.com\\\/en\\\/build-your-own-best-fit-data-repository\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/cleardatascience.com\\\/en\\\/build-your-own-best-fit-data-repository\\\/\"},\"author\":{\"name\":\"admin\",\"@id\":\"https:\\\/\\\/cleardatascience.com\\\/zh-hant\\\/#\\\/schema\\\/person\\\/06529788d7f95b5acac977cf15b07d89\"},\"headline\":\"Build Your Own \u201cBest-fit\u201d Data Repository\",\"datePublished\":\"2020-04-08T10:49:36+00:00\",\"dateModified\":\"2020-04-09T02:13:41+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/cleardatascience.com\\\/en\\\/build-your-own-best-fit-data-repository\\\/\"},\"wordCount\":765,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/cleardatascience.com\\\/zh-hant\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/cleardatascience.com\\\/en\\\/build-your-own-best-fit-data-repository\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/cleardatascience.com\\\/wp-content\\\/uploads\\\/2020\\\/04\\\/cleardatascience-ApacheHadoopEcosystem.png\",\"keywords\":[\"Analytics\",\"Apache Hadoop\",\"data lake\",\"Data Science\",\"Data Warehouse\",\"IoT\",\"NoSQL\",\"Real-time\"],\"articleSection\":[\"CDS News\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/cleardatascience.com\\\/en\\\/build-your-own-best-fit-data-repository\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/cleardatascience.com\\\/en\\\/build-your-own-best-fit-data-repository\\\/\",\"url\":\"https:\\\/\\\/cleardatascience.com\\\/en\\\/build-your-own-best-fit-data-repository\\\/\",\"name\":\"Build Your Own \u201cBest-fit\u201d Data Repository - Clear Data Science 
Limited\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/cleardatascience.com\\\/zh-hant\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/cleardatascience.com\\\/en\\\/build-your-own-best-fit-data-repository\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/cleardatascience.com\\\/en\\\/build-your-own-best-fit-data-repository\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/cleardatascience.com\\\/wp-content\\\/uploads\\\/2020\\\/04\\\/cleardatascience-ApacheHadoopEcosystem.png\",\"datePublished\":\"2020-04-08T10:49:36+00:00\",\"dateModified\":\"2020-04-09T02:13:41+00:00\",\"description\":\"With the rapid development of open source technology, it is possible to use open source platform to build data warehouse and \\\/ or Big data platform.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/cleardatascience.com\\\/en\\\/build-your-own-best-fit-data-repository\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/cleardatascience.com\\\/en\\\/build-your-own-best-fit-data-repository\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/cleardatascience.com\\\/en\\\/build-your-own-best-fit-data-repository\\\/#primaryimage\",\"url\":\"https:\\\/\\\/cleardatascience.com\\\/wp-content\\\/uploads\\\/2020\\\/04\\\/cleardatascience-ApacheHadoopEcosystem.png\",\"contentUrl\":\"https:\\\/\\\/cleardatascience.com\\\/wp-content\\\/uploads\\\/2020\\\/04\\\/cleardatascience-ApacheHadoopEcosystem.png\",\"width\":624,\"height\":489},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/cleardatascience.com\\\/en\\\/build-your-own-best-fit-data-repository\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/cleardatascience.com\\\/en\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Build Your Own \u201cBest-fit\u201d Data 
Repository\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/cleardatascience.com\\\/zh-hant\\\/#website\",\"url\":\"https:\\\/\\\/cleardatascience.com\\\/zh-hant\\\/\",\"name\":\"Clear Data Science Limited\",\"description\":\"Clear Data Clear Picture\",\"publisher\":{\"@id\":\"https:\\\/\\\/cleardatascience.com\\\/zh-hant\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/cleardatascience.com\\\/zh-hant\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/cleardatascience.com\\\/zh-hant\\\/#organization\",\"name\":\"Clear Data Science Limited\",\"url\":\"https:\\\/\\\/cleardatascience.com\\\/zh-hant\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/cleardatascience.com\\\/zh-hant\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/cleardatascience.com\\\/wp-content\\\/uploads\\\/2019\\\/03\\\/CDS-Logo-small-h02.png\",\"contentUrl\":\"https:\\\/\\\/cleardatascience.com\\\/wp-content\\\/uploads\\\/2019\\\/03\\\/CDS-Logo-small-h02.png\",\"width\":165,\"height\":45,\"caption\":\"Clear Data Science 
Limited\"},\"image\":{\"@id\":\"https:\\\/\\\/cleardatascience.com\\\/zh-hant\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/cleardatasciencelimited\\\/\",\"https:\\\/\\\/www.linkedin.com\\\/company\\\/16194855\",\"https:\\\/\\\/www.youtube.com\\\/channel\\\/UCS3jQw-3EZvmWkLr8ZyDHFw\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/cleardatascience.com\\\/zh-hant\\\/#\\\/schema\\\/person\\\/06529788d7f95b5acac977cf15b07d89\",\"name\":\"admin\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/c3aad60a0de4528a5822f5362ebf40a45345a6f6670b6222da23f3930722bf74?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/c3aad60a0de4528a5822f5362ebf40a45345a6f6670b6222da23f3930722bf74?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/c3aad60a0de4528a5822f5362ebf40a45345a6f6670b6222da23f3930722bf74?s=96&d=mm&r=g\",\"caption\":\"admin\"},\"url\":\"https:\\\/\\\/cleardatascience.com\\\/en\\\/author\\\/archsolutionltd\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Build Your Own \u201cBest-fit\u201d Data Repository - Clear Data Science Limited","description":"With the rapid development of open source technology, it is possible to use open source platform to build data warehouse and \/ or Big data platform.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/cleardatascience.com\/en\/build-your-own-best-fit-data-repository\/","og_locale":"en_US","og_type":"article","og_title":"Build Your Own \u201cBest-fit\u201d Data Repository - Clear Data Science Limited","og_description":"With the rapid development of open source technology, it is possible to use open source platform to build data warehouse and \/ or Big data platform.","og_url":"https:\/\/cleardatascience.com\/en\/build-your-own-best-fit-data-repository\/","og_site_name":"Clear Data Science Limited","article_publisher":"https:\/\/www.facebook.com\/cleardatasciencelimited\/","article_published_time":"2020-04-08T10:49:36+00:00","article_modified_time":"2020-04-09T02:13:41+00:00","og_image":[{"width":624,"height":489,"url":"https:\/\/cleardatascience.com\/wp-content\/uploads\/2020\/04\/cleardatascience-ApacheHadoopEcosystem.png","type":"image\/png"}],"author":"admin","twitter_card":"summary_large_image","twitter_misc":{"Written by":"admin","Est. 
reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/cleardatascience.com\/en\/build-your-own-best-fit-data-repository\/#article","isPartOf":{"@id":"https:\/\/cleardatascience.com\/en\/build-your-own-best-fit-data-repository\/"},"author":{"name":"admin","@id":"https:\/\/cleardatascience.com\/zh-hant\/#\/schema\/person\/06529788d7f95b5acac977cf15b07d89"},"headline":"Build Your Own \u201cBest-fit\u201d Data Repository","datePublished":"2020-04-08T10:49:36+00:00","dateModified":"2020-04-09T02:13:41+00:00","mainEntityOfPage":{"@id":"https:\/\/cleardatascience.com\/en\/build-your-own-best-fit-data-repository\/"},"wordCount":765,"commentCount":0,"publisher":{"@id":"https:\/\/cleardatascience.com\/zh-hant\/#organization"},"image":{"@id":"https:\/\/cleardatascience.com\/en\/build-your-own-best-fit-data-repository\/#primaryimage"},"thumbnailUrl":"https:\/\/cleardatascience.com\/wp-content\/uploads\/2020\/04\/cleardatascience-ApacheHadoopEcosystem.png","keywords":["Analytics","Apache Hadoop","data lake","Data Science","Data Warehouse","IoT","NoSQL","Real-time"],"articleSection":["CDS News"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/cleardatascience.com\/en\/build-your-own-best-fit-data-repository\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/cleardatascience.com\/en\/build-your-own-best-fit-data-repository\/","url":"https:\/\/cleardatascience.com\/en\/build-your-own-best-fit-data-repository\/","name":"Build Your Own \u201cBest-fit\u201d Data Repository - Clear Data Science 
Limited","isPartOf":{"@id":"https:\/\/cleardatascience.com\/zh-hant\/#website"},"primaryImageOfPage":{"@id":"https:\/\/cleardatascience.com\/en\/build-your-own-best-fit-data-repository\/#primaryimage"},"image":{"@id":"https:\/\/cleardatascience.com\/en\/build-your-own-best-fit-data-repository\/#primaryimage"},"thumbnailUrl":"https:\/\/cleardatascience.com\/wp-content\/uploads\/2020\/04\/cleardatascience-ApacheHadoopEcosystem.png","datePublished":"2020-04-08T10:49:36+00:00","dateModified":"2020-04-09T02:13:41+00:00","description":"With the rapid development of open source technology, it is possible to use open source platform to build data warehouse and \/ or Big data platform.","breadcrumb":{"@id":"https:\/\/cleardatascience.com\/en\/build-your-own-best-fit-data-repository\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/cleardatascience.com\/en\/build-your-own-best-fit-data-repository\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/cleardatascience.com\/en\/build-your-own-best-fit-data-repository\/#primaryimage","url":"https:\/\/cleardatascience.com\/wp-content\/uploads\/2020\/04\/cleardatascience-ApacheHadoopEcosystem.png","contentUrl":"https:\/\/cleardatascience.com\/wp-content\/uploads\/2020\/04\/cleardatascience-ApacheHadoopEcosystem.png","width":624,"height":489},{"@type":"BreadcrumbList","@id":"https:\/\/cleardatascience.com\/en\/build-your-own-best-fit-data-repository\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/cleardatascience.com\/en\/"},{"@type":"ListItem","position":2,"name":"Build Your Own \u201cBest-fit\u201d Data Repository"}]},{"@type":"WebSite","@id":"https:\/\/cleardatascience.com\/zh-hant\/#website","url":"https:\/\/cleardatascience.com\/zh-hant\/","name":"Clear Data Science Limited","description":"Clear Data Clear 
Picture","publisher":{"@id":"https:\/\/cleardatascience.com\/zh-hant\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/cleardatascience.com\/zh-hant\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/cleardatascience.com\/zh-hant\/#organization","name":"Clear Data Science Limited","url":"https:\/\/cleardatascience.com\/zh-hant\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/cleardatascience.com\/zh-hant\/#\/schema\/logo\/image\/","url":"https:\/\/cleardatascience.com\/wp-content\/uploads\/2019\/03\/CDS-Logo-small-h02.png","contentUrl":"https:\/\/cleardatascience.com\/wp-content\/uploads\/2019\/03\/CDS-Logo-small-h02.png","width":165,"height":45,"caption":"Clear Data Science Limited"},"image":{"@id":"https:\/\/cleardatascience.com\/zh-hant\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/cleardatasciencelimited\/","https:\/\/www.linkedin.com\/company\/16194855","https:\/\/www.youtube.com\/channel\/UCS3jQw-3EZvmWkLr8ZyDHFw"]},{"@type":"Person","@id":"https:\/\/cleardatascience.com\/zh-hant\/#\/schema\/person\/06529788d7f95b5acac977cf15b07d89","name":"admin","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/c3aad60a0de4528a5822f5362ebf40a45345a6f6670b6222da23f3930722bf74?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/c3aad60a0de4528a5822f5362ebf40a45345a6f6670b6222da23f3930722bf74?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/c3aad60a0de4528a5822f5362ebf40a45345a6f6670b6222da23f3930722bf74?s=96&d=mm&r=g","caption":"admin"},"url":"https:\/\/cleardatascience.com\/en\/author\/archsolutionltd\/"}]}},"_links":{"self":[{"href":"https:\/\/cleardatascience.com\/en\/wp-json\/wp\/v2\/posts\/932","targetHints":{"allow":["GET"]}}],"collection":[{"h
ref":"https:\/\/cleardatascience.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cleardatascience.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cleardatascience.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/cleardatascience.com\/en\/wp-json\/wp\/v2\/comments?post=932"}],"version-history":[{"count":4,"href":"https:\/\/cleardatascience.com\/en\/wp-json\/wp\/v2\/posts\/932\/revisions"}],"predecessor-version":[{"id":956,"href":"https:\/\/cleardatascience.com\/en\/wp-json\/wp\/v2\/posts\/932\/revisions\/956"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/cleardatascience.com\/en\/wp-json\/wp\/v2\/media\/934"}],"wp:attachment":[{"href":"https:\/\/cleardatascience.com\/en\/wp-json\/wp\/v2\/media?parent=932"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cleardatascience.com\/en\/wp-json\/wp\/v2\/categories?post=932"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cleardatascience.com\/en\/wp-json\/wp\/v2\/tags?post=932"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}