Building a search page over a large document dataset based on Elasticsearch

June 7, 2020
By

[This article was first published on 各式各样的东西, and kindly contributed to R-bloggers.]

During these weeks of lockdown, many people in data science have been looking for datasets to play with. In my case, I found ParlSpeech V2 [1]. ParlSpeech V2 contains complete full-text vectors of more than 6.3 million parliamentary speeches in the key legislative chambers of Austria, the Czech Republic, Germany, Denmark, the Netherlands, New Zealand, Spain, Sweden, and the United Kingdom, covering periods between 21 and 32 years. Meta-data include information on date, speaker, party, and partially the agenda item under which a speech was held. The accompanying release note provides a more detailed guide to the data.

This dataset reminded me of VERBA VOLANT from Civio, a beautiful data visualization in which you can search for words in a dataset containing the news broadcast on TVE, the Spanish public television.

Then I thought about doing something similar, and in their public repository I found that they use Elastic. Elastic is an open-source search engine for text documents. Elastic offers its own cloud service (free for the first 14 days), so I decided to get my hands dirty with Elastic and R.
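Search engines such as Elastic (which is built on Lucene) avoid scanning every document for every query by keeping an inverted index: for each term, the list of documents that contain it. The toy sketch below shows only that core idea in base R. The functions `build_inverted_index()` and `search_term()` are hypothetical names for illustration; the real engine adds language analyzers, relevance scoring, phrase positions and much more.

```r
# Toy inverted index: maps each lower-cased token to the ids of the documents
# that contain it. An illustration of the idea, not Elastic's implementation.
build_inverted_index <- function(docs) {
  index <- list()
  for (id in seq_along(docs)) {
    # crude tokenizer: lower-case, split on anything that is not a letter or digit
    tokens <- unique(strsplit(tolower(docs[[id]]), "[^[:alnum:]]+")[[1]])
    tokens <- tokens[nzchar(tokens)]
    for (tok in tokens) {
      # append this document id to the posting list of the token
      index[[tok]] <- c(index[[tok]], id)
    }
  }
  index
}

# Look up a single term; returns the ids of the matching documents
search_term <- function(index, term) {
  ids <- index[[tolower(term)]]
  if (is.null(ids)) integer(0) else ids
}

docs <- c("La tasa de paro baja",
          "Debate sobre la tasa de empleo",
          "Presupuestos generales del Estado")
idx <- build_inverted_index(docs)
search_term(idx, "tasa")
search_term(idx, "Estado")
```

A query then costs one lookup per term instead of a pass over the whole corpus, which is what makes searching millions of speeches interactive.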

Once I had set up my own Elastic Cloud account, I populated it using the elastic package in R:

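```r
library(elastic)

# Spanish Congreso de los Diputados corpus from ParlSpeech V2
discursos <- readRDS("Corp_Congreso_V2.rds")

# Connect to the Elastic Cloud deployment (password masked)
x <- connect(host = "4fa108dc0e82435d94e6b71b7723d0be.uksouth.azure.elastic-cloud.com",
             port = 9243,
             user = "elastic",
             pwd = "*****************",
             transport_schema = "https")

# Load every row of the data frame as a document in the "speechnumber" index
docs_bulk(x, discursos, index = "speechnumber")
```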
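`docs_bulk()` loads the data frame through Elasticsearch's Bulk API, sending the rows in chunks (controlled by its `chunk_size` argument). To see roughly what travels over the wire, the base-R sketch below builds a bulk body by hand: for each row, one action line naming the index, followed by the document itself as a JSON object, one object per line (NDJSON). The helpers `json_value()`, `to_bulk_ndjson()` and `chunk_rows()` are hypothetical and not part of the elastic package; the JSON encoding is deliberately minimal (it assumes plain ASCII text and no missing values), and the body `docs_bulk()` actually sends may differ in details such as document ids.

```r
# Hypothetical helper: encode one value as JSON. Numbers are written as-is;
# everything else becomes a quoted string. encodeString() escapes quotes,
# backslashes and newlines, but this is NOT a complete JSON encoder.
json_value <- function(v) {
  if (is.numeric(v)) format(v, scientific = FALSE)
  else encodeString(as.character(v), quote = '"')
}

# Hypothetical helper: turn a data frame into a bulk request body.
# Each row produces an action line followed by the document line.
to_bulk_ndjson <- function(df, index) {
  action <- sprintf('{"index":{"_index":"%s"}}', index)
  lines <- character(0)
  for (i in seq_len(nrow(df))) {
    fields <- vapply(names(df), function(col) {
      sprintf('"%s":%s', col, json_value(df[[col]][i]))
    }, character(1))
    doc <- paste0("{", paste(fields, collapse = ","), "}")
    lines <- c(lines, action, doc)
  }
  # the bulk body must end with a newline
  paste0(paste(lines, collapse = "\n"), "\n")
}

# Hypothetical helper: split row numbers into chunks, one bulk request each
chunk_rows <- function(n, chunk_size = 1000) {
  split(seq_len(n), ceiling(seq_len(n) / chunk_size))
}

speeches <- data.frame(speaker = c("A", "B"), speechnumber = c(1, 2),
                       stringsAsFactors = FALSE)
cat(to_bulk_ndjson(speeches, "speechnumber"))
```

Sending many rows per request instead of one request per speech is what makes loading a large corpus practical: with chunks of 1,000 rows, 2,500 speeches take three requests.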

Then I built a very simple Shiny app on top of Elastic (so simple that it's basically bare-bones):

ui.R

```r
library(shiny)
library(DT)

# Define the UI: a text box for the search term and a table for the results
shinyUI(fluidPage(
  # Application title
  # ("Search for terms in speeches of the Congreso de los Diputados with Elastic")
  titlePanel("Buscador de términos en discursos del Congreso de los Diputados con Elastic"),
  # Search box; its value reaches the server as input$termino
  textInput("termino", "Buscar", value = ""),
  # Results table, filled by output$table in server.R
  dataTableOutput("table")
))
```

server.R

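```r
library(shiny)
library(DT)
library(elastic)
library(dplyr)

# Connection to the Elastic Cloud deployment (password masked); it is created
# here because the Search() calls run in the server
host <- "4fa108dc0e82435d94e6b71b7723d0be.uksouth.azure.elastic-cloud.com"
port <- 9243
usuario <- "elastic"
password <- "**************************"
x <- connect(host = host, port = port, user = usuario, pwd = password,
             transport_schema = "https")

# Define server logic: query Elastic with the typed term and show the hits
shinyServer(function(input, output) {
  output$table <- renderDataTable({
    # Do nothing until the user has typed a term
    req(input$termino)
    res <- elastic::Search(conn = x, index = "speechnumber",
                           q = sprintf("text:%s", input$termino),
                           source = c("speaker", "date", "text"),
                           asdf = TRUE, size = 10000)
    # One row per hit; document fields arrive in columns prefixed "_source."
    df <- res$hits$hits
    df <- df %>% dplyr::select(starts_with("_source"))
    colnames(df) <- gsub(pattern = "^_source\\.", replacement = "", x = colnames(df))
    df
  })
})
```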
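The `q` argument is passed to Elastic as a Lucene query string, so whatever the user types is parsed as query syntax: spaces split the input into separate clauses, and characters such as `:`, `(`, `*` or `?` change the meaning of the query. One way to make the search box predictable is to quote the input as a phrase on the `text` field and escape only the characters that are special inside quotes, `"` and `\`. The helper `build_query()` below is a hypothetical sketch of that approach, not part of the elastic package.

```r
# Hypothetical helper: turn the typed term into a phrase query on one field.
# Inside double quotes only " and \ need escaping; returns NULL for empty input.
build_query <- function(term, field = "text") {
  if (is.null(term) || !nzchar(trimws(term))) return(NULL)
  # prefix every " or \ with a backslash
  escaped <- gsub('(["\\\\])', "\\\\\\1", trimws(term), perl = TRUE)
  sprintf('%s:"%s"', field, escaped)
}

build_query("tasa de paro")
build_query('dijo "basta"')
build_query("   ")
```

In `server.R` this would replace the `sprintf()` call, for example `q <- build_query(input$termino); req(q)` followed by `elastic::Search(conn = x, index = "speechnumber", q = q, ...)`. The trade-off is that users can no longer write their own Lucene syntax (wildcards, boolean operators), which seems acceptable for a simple term search.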

I've deployed my (very simple) application to shinyapps.io; you can find it here. I think Elastic and Kibana are amazing tools, and I've enjoyed the time I've spent learning a little bit about them.


  1. Rauh, Christian; Schwalbach, Jan, 2020, “The ParlSpeech V2 data set: Full-text corpora of 6.3 million parliamentary speeches in the key legislative chambers of nine representative democracies”, https://doi.org/10.7910/DVN/L4OAKN, Harvard Dataverse, V1
