Hey folks, in this article we continue working with APIs. This tutorial is about extracting movie information from Rotten Tomatoes and IMDb. We will build the application as a .NET Core WPF app and use the Google Custom Search API, HTML Agility Pack, and Newtonsoft JSON. The application contains two text boxes: in the first we enter the movie names, one per line, and in the second we see the ratings of those movies. Before starting, let’s see the demo of the application below:
Before writing the code, let’s take care of the prerequisites. We’ll need a Google Custom Search API key and two custom search engines. The first search engine will search Rotten Tomatoes, and the second will search IMDb. Let’s get the API key and create the search engines.
Head over to this link to get an API key for Google Custom Search. Scroll down a little and click on the “Get a Key” button as shown in the screenshot below. Keep in mind that you need to be logged in with a Google account for this:
A pop-up will appear. Choose to create a new project and click “NEXT”:
Enter any name as a project name, and click NEXT again:
Congratulations! You’ve got your Custom Search API key. Copy this key and save it somewhere:
Now, you need to create two custom search engines. Go to this link and click on the Add button as shown in the screenshot below:
Enter the web address of Rotten Tomatoes in the “sites to search” box and click on the “CREATE” button. See the screenshot below for reference:
The custom search engine will be created, and you’ll be greeted with a message. Now click on the “Control Panel” button:
On the next page, copy the “Search Engine ID” and save it somewhere:
We have created a custom search engine. Repeat these steps to create another custom search engine for IMDb. The web address of IMDb is https://www.imdb.com/.
After creating the second custom search engine, you are all set up. Let’s start writing the code now.
We will design the layout of the application first. Our layout will have three rows and two columns. Open the “MainWindow.xaml” file and design the layout using a Grid element. The code below creates the layout just described:
    <Grid.RowDefinitions>
        <RowDefinition Height="0.2*"/>
        <RowDefinition Height="0.6*"/>
        <RowDefinition Height="0.2*"/>
    </Grid.RowDefinitions>
    <Grid.ColumnDefinitions>
        <ColumnDefinition SharedSizeGroup="A" Width="*"/>
        <ColumnDefinition SharedSizeGroup="A" Width="*"/>
    </Grid.ColumnDefinitions>
Now, create the labels, the text boxes, and an action button. The following code will do that for us. Place it between the closing </Grid.ColumnDefinitions> tag and the closing </Grid> tag:
    <Label Grid.Row="0" Grid.Column="0" Content="Enter one movie in a line"
           HorizontalAlignment="Center" VerticalAlignment="Center"/>
    <TextBox Name="txtMovies" Grid.Row="1" Grid.Column="0" Height="300" Width="350"
             HorizontalAlignment="Center" VerticalAlignment="Center" AcceptsReturn="True"/>
    <Label Grid.Row="0" Grid.Column="1" Content="Movie ratings"
           HorizontalAlignment="Center" VerticalAlignment="Center"/>
    <TextBox Name="txtMovieRatings" Grid.Row="1" Grid.Column="1" Height="300" Width="350"
             HorizontalAlignment="Center" VerticalAlignment="Center" IsEnabled="False"
             VerticalScrollBarVisibility="Auto"/>
    <Button Name="btnExtract" Grid.Row="2" Grid.Column="0" Width="100" Height="25"
            Content="Extract" Click="btnExtract_Click"/>
See the complete code of “MainWindow.xaml” file:
    <Window x:Class="Extract_Movie_Ratings.MainWindow"
            xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
            xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
            xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
            xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
            xmlns:local="clr-namespace:Extract_Movie_Ratings"
            mc:Ignorable="d"
            Title="Movie ratings" Height="600" Width="800">
        <Grid>
            <Grid.RowDefinitions>
                <RowDefinition Height="0.2*"/>
                <RowDefinition Height="0.6*"/>
                <RowDefinition Height="0.2*"/>
            </Grid.RowDefinitions>
            <Grid.ColumnDefinitions>
                <ColumnDefinition SharedSizeGroup="A" Width="*"/>
                <ColumnDefinition SharedSizeGroup="A" Width="*"/>
            </Grid.ColumnDefinitions>
            <Label Grid.Row="0" Grid.Column="0" Content="Enter one movie in a line"
                   HorizontalAlignment="Center" VerticalAlignment="Center"/>
            <TextBox Name="txtMovies" Grid.Row="1" Grid.Column="0" Height="300" Width="350"
                     HorizontalAlignment="Center" VerticalAlignment="Center" AcceptsReturn="True"/>
            <Label Grid.Row="0" Grid.Column="1" Content="Movie ratings"
                   HorizontalAlignment="Center" VerticalAlignment="Center"/>
            <TextBox Name="txtMovieRatings" Grid.Row="1" Grid.Column="1" Height="300" Width="350"
                     HorizontalAlignment="Center" VerticalAlignment="Center" IsEnabled="False"
                     VerticalScrollBarVisibility="Auto"/>
            <Button Name="btnExtract" Grid.Row="2" Grid.Column="0" Width="100" Height="25"
                    Content="Extract" Click="btnExtract_Click"/>
        </Grid>
    </Window>
The design of our application will look like this:
Our design is ready. Now let’s move on to the back end. Before writing the code, here is the flow of the application, step by step:
1. Take the text from the left text box, where we entered the movie names line by line, and convert it to a string array, meaning one movie per index.
2. For each movie name:
   2.1. Call the Custom Search API, specifying the Rotten Tomatoes custom search engine we just created.
   2.2. Get the result from the API as JSON by using Newtonsoft JSON.
   2.3. Get the Rotten Tomatoes link of the movie.
   2.4. Pass the link of the movie to the HTML Agility Pack and ask it to download the link as a web page.
   2.5. Crawl the web page and extract the required information from specific elements of the page.
   2.6. Call the Custom Search API again, this time specifying the IMDb custom search engine we just created.
   2.7. Get the result from the API as JSON by using Newtonsoft JSON.
   2.8. Get the IMDb link of the movie.
   2.9. Pass the link of the movie to the HTML Agility Pack and ask it to download the link as a web page.
   2.10. Crawl the web page and extract the required information from specific elements of the page.
   2.11. Append the result to the movie rating text box.
Open the “MainWindow.xaml.cs” file and write down the Google Custom Search API key and the Search Engine IDs as string constants inside the public partial class MainWindow : Window:
    const string GOOGLE_CUSTOM_SEARCH_KEY = "AIzaSy********xmqL4";
    const string GOOGLE_CUSTOM_CX_RM = "010025********6735:zrcx*****42eg";
    const string GOOGLE_CUSTOM_CX_IMDB = "0100*******296735:smu*****1smd";
As you can see, the first line is our API key, the second line holds the Search Engine ID for Rotten Tomatoes, and the third line holds the Search Engine ID for IMDb.
Now, inside the click event handler of the “Extract” button, we need to split the movie names on the newline delimiter and save the result as a string array. The following single line will do this: (Step 1)
string[] movies = txtMovies.Text.Split('\n');
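One thing to watch for with the line above: a WPF text box usually separates lines with "\r\n", so splitting on '\n' alone can leave a trailing '\r' on each name, and blank lines become empty entries. Below is a minimal sketch of a more forgiving split. The class and method names (MovieInput, ParseMovieNames) are made up for illustration and are not part of the original project.

```csharp
using System;
using System.Linq;

public static class MovieInput
{
    // Hypothetical helper: turns raw text-box content into clean movie names.
    public static string[] ParseMovieNames(string text)
    {
        if (string.IsNullOrWhiteSpace(text))
        {
            return Array.Empty<string>();
        }

        // Split on both '\r' and '\n' so Windows and Unix line endings work,
        // and drop the empty entries produced by blank lines.
        return text
            .Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(line => line.Trim())       // remove stray spaces around a name
            .Where(line => line.Length > 0)    // skip lines that were only whitespace
            .ToArray();
    }
}
```

With a helper like this, the handler could call MovieInput.ParseMovieNames(txtMovies.Text) instead of splitting directly.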
Now, let’s declare the variables used to store the rating information of the movies:
    int rtTomatoMeter = 0;
    int rtPublicRating = 0;
    int rtNumberOfVotes = 0;
    double imdbRating = 0.0;
    int imdbRatingCount = 0;
Now, let’s call the search API. The search API has the following format:
https://www.googleapis.com/customsearch/v1?key=Google Search API KEY&cx=Search Engine ID&q=keywords to search
Calling the Search API (Step 2.1):
string query = String.Format("https://www.googleapis.com/customsearch/v1?key={0}&cx={1}&q={2}", GOOGLE_CUSTOM_SEARCH_KEY, GOOGLE_CUSTOM_CX_RM, movies[i]);
Since we repeat these steps for every movie, we will write the above code inside a for loop. Here movies[i] is the i-th movie name, used as the search keyword.
Now get the response of the search query: (Step 2.2)
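Movie names often contain spaces or characters such as "&" that have a special meaning in a URL, so it can be safer to escape the keyword before placing it in the query string. Here is a small sketch of that idea. The SearchUrl class and its Build method are hypothetical names, and the API key and engine ID are assumed to be URL-safe already.

```csharp
using System;

public static class SearchUrl
{
    // Hypothetical helper: builds a Custom Search request URL.
    // Only the keywords are escaped; key and engine ID are assumed to be URL-safe.
    public static string Build(string apiKey, string engineId, string keywords)
    {
        return string.Format(
            "https://www.googleapis.com/customsearch/v1?key={0}&cx={1}&q={2}",
            apiKey,
            engineId,
            Uri.EscapeDataString(keywords.Trim()));
    }
}
```

For example, the keyword "Fast & Furious" becomes "Fast%20%26%20Furious", so the "&" is no longer read as the start of a new query parameter.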
JObject response = JObject.Parse(new System.Net.WebClient().DownloadString(query));
The response will contain some links. Since we are calling the API with a custom search engine that only searches RottenTomatoes.com, the first result should be the link of the movie. The results reside inside a JSON array named “items”, so let’s get the link from the first “items” entry: (Step 2.3)
string rtLink = response.SelectToken("items[0].link").ToString();
The token “items[0].link” means: get the value of the “link” key from the first entry of “items”.
Now, we will pass the link to the HTML Agility Pack and ask it to download the page: (Step 2.4)
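If a search returns no results, there is no first “items” entry, and calling ToString() on the missing token would throw a NullReferenceException. The sketch below shows one way to guard against that, relying on the fact that SelectToken returns null when the path does not match. The SearchResponse class and FirstLink method are hypothetical names, and the JSON in the usage note is a hand-written sample, not a real API response.

```csharp
using Newtonsoft.Json.Linq;

public static class SearchResponse
{
    // Hypothetical helper: returns the first result link,
    // or null when the response contains no results.
    public static string FirstLink(string json)
    {
        JObject response = JObject.Parse(json);

        // SelectToken returns null instead of throwing when the path is not found.
        JToken link = response.SelectToken("items[0].link");

        return link?.ToString();
    }
}
```

Called with a sample such as {"items":[{"link":"https://example.com/m/some_movie"}]} it returns the link, and called with {} it returns null, so the caller can skip that movie instead of crashing.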
    HtmlWeb web = new HtmlWeb();
    HtmlDocument document = web.Load(rtLink);
Let’s pause here for a moment and view the page source of a movie on the Rotten Tomatoes website. The screenshot below shows the part of the page source that contains the information we want:
As you can see, the information we are interested in is placed inside specific HTML elements, so we can easily pick these elements out of the entire web page. The following code will extract the information for us: (Step 2.5)
    HtmlNode[] nodes = document.DocumentNode.SelectNodes("//small[contains(@class, 'mop-ratings-wrap__text--small')]").ToArray();
    rtTomatoMeter = Convert.ToInt32(nodes[0].InnerHtml.ToString());
    nodes = document.DocumentNode.SelectNodes("//span[contains(@class, 'mop-ratings-wrap__percentage')]").ToArray();
    rtPublicRating = Convert.ToInt32(nodes[1].InnerHtml.ToString().Trim().Replace("%", string.Empty));
    nodes = document.DocumentNode.SelectNodes("//strong[contains(@class, 'mop-ratings-wrap__text--small')]").ToArray();
    rtNumberOfVotes = Convert.ToInt32(nodes[1].InnerHtml.ToString().Replace("User Ratings: ", string.Empty).Replace(",", string.Empty));
In the code above, the XPath expression in the first statement translates to “get all elements named ‘small’ whose class contains ‘mop-ratings-wrap__text--small’”.
We have now extracted the rating information from the Rotten Tomatoes website. We will do the same to get the rating information from the IMDb website. The following code will do that for us: (Steps 2.6 to 2.10)
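Keep in mind that websites change their markup over time, and that SelectNodes in the HTML Agility Pack returns null when nothing matches the XPath expression. In that case, calling ToArray() on the result would throw. Below is a defensive sketch that works on an HTML string, so it can be tried without downloading anything. RatingParser and FirstInt are hypothetical names, and the HTML snippet in the usage note is a hand-written sample rather than real page source.

```csharp
using System.Globalization;
using HtmlAgilityPack;

public static class RatingParser
{
    // Hypothetical helper: reads an integer from the first node matching the XPath.
    // Returns null when no node matches or the text is not a number.
    public static int? FirstInt(string html, string xpath)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = document.DocumentNode.SelectNodes(xpath);
        if (nodes == null)
        {
            return null;
        }

        // Strip the decorations the pages put around the number.
        string text = nodes[0].InnerText
            .Replace(",", string.Empty)
            .Replace("%", string.Empty)
            .Trim();

        int value;
        bool parsed = int.TryParse(text, NumberStyles.Integer, CultureInfo.InvariantCulture, out value);
        return parsed ? value : (int?)null;
    }
}
```

For the sample <span class="mop-ratings-wrap__percentage"> 87% </span> and the XPath //span[contains(@class, 'mop-ratings-wrap__percentage')], the helper returns 87. For HTML without such an element it returns null.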
    query = String.Format("https://www.googleapis.com/customsearch/v1?key={0}&cx={1}&q={2}", GOOGLE_CUSTOM_SEARCH_KEY, GOOGLE_CUSTOM_CX_IMDB, movies[i]);
    response = JObject.Parse(new System.Net.WebClient().DownloadString(query));
    string imdbLink = response.SelectToken("items[0].link").ToString();
    web = new HtmlWeb();
    document = web.Load(imdbLink);
    nodes = document.DocumentNode.SelectNodes("//span[contains(@itemprop, 'ratingValue')]").ToArray();
    imdbRating = Convert.ToDouble(nodes[0].InnerHtml.ToString());
    nodes = document.DocumentNode.SelectNodes("//span[contains(@itemprop, 'ratingCount')]").ToArray();
    imdbRatingCount = Convert.ToInt32(nodes[0].InnerHtml.ToString().Replace(",", string.Empty));
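A small caveat about Convert.ToDouble: it parses using the current culture of the machine, so on a system whose culture uses a comma as the decimal separator, a rating such as "8.3" may be misread. The sketch below parses with the invariant culture instead. ImdbNumbers, ParseRating, and ParseCount are hypothetical names introduced only for this illustration.

```csharp
using System.Globalization;

public static class ImdbNumbers
{
    // Hypothetical helper: parses a rating such as "8.3" regardless of the machine's culture.
    public static double ParseRating(string text)
    {
        return double.Parse(text.Trim(), NumberStyles.Float, CultureInfo.InvariantCulture);
    }

    // Hypothetical helper: parses a count such as "1,234,567" with comma group separators.
    public static int ParseCount(string text)
    {
        return int.Parse(text.Trim(), NumberStyles.AllowThousands, CultureInfo.InvariantCulture);
    }
}
```

With these helpers, the two conversions above could become ImdbNumbers.ParseRating(nodes[0].InnerHtml) and ImdbNumbers.ParseCount(nodes[0].InnerHtml), which also makes the manual Replace(",", string.Empty) unnecessary.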
Now, let’s append the results for each movie to our movie rating textbox. The following line of code will do that: (Step 2.11)
txtMovieRatings.Text += movies[i] + "\n\tRotten Tomatoes\n\t\tTomato meter: " + rtTomatoMeter + "\n\t\tPublic rating: " + rtPublicRating + "%\n\t\tNumber of votes: " + rtNumberOfVotes + "\n\tIMDb\n\t\tRating: " + imdbRating + "\n\t\tRating count: " + imdbRatingCount + "\n";
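If you prefer to keep the formatting separate from the click handler, the same text can be produced by a small helper built on StringBuilder. This is only a sketch of an alternative, not the code used in the project. RatingReport and Format are hypothetical names, and the numbers in the usage note are arbitrary sample values.

```csharp
using System.Globalization;
using System.Text;

public static class RatingReport
{
    // Hypothetical helper: builds the same block of text that the handler appends per movie.
    public static string Format(string movie, int tomatoMeter, int publicRating,
                                int numberOfVotes, double imdbRating, int imdbRatingCount)
    {
        var sb = new StringBuilder();
        sb.Append(movie).Append('\n');
        sb.Append("\tRotten Tomatoes\n");
        sb.Append("\t\tTomato meter: ").Append(tomatoMeter).Append('\n');
        sb.Append("\t\tPublic rating: ").Append(publicRating).Append("%\n");
        sb.Append("\t\tNumber of votes: ").Append(numberOfVotes).Append('\n');
        sb.Append("\tIMDb\n");
        // Format the rating with the invariant culture so it always uses a dot.
        sb.Append("\t\tRating: ").Append(imdbRating.ToString(CultureInfo.InvariantCulture)).Append('\n');
        sb.Append("\t\tRating count: ").Append(imdbRatingCount).Append('\n');
        return sb.ToString();
    }
}
```

The handler could then write txtMovieRatings.Text += RatingReport.Format("Some Movie", 87, 91, 1000, 8.8, 2000); with the real values in place of these sample numbers.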
The complete code for “MainWindow.xaml.cs” is listed below:
    using HtmlAgilityPack;
    using Newtonsoft.Json.Linq;
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Threading.Tasks;
    using System.Windows;
    using System.Windows.Controls;
    using System.Windows.Data;
    using System.Windows.Documents;
    using System.Windows.Input;
    using System.Windows.Media;
    using System.Windows.Media.Imaging;
    using System.Windows.Navigation;
    using System.Windows.Shapes;

    namespace Extract_Movie_Ratings
    {
        /// <summary>
        /// Interaction logic for MainWindow.xaml
        /// </summary>
        public partial class MainWindow : Window
        {
            const string GOOGLE_CUSTOM_SEARCH_KEY = "AIzaSy********xmqL4";
            const string GOOGLE_CUSTOM_CX_RM = "010025********6735:zrcx*****42eg";
            const string GOOGLE_CUSTOM_CX_IMDB = "0100*******296735:smu*****1smd";

            public MainWindow()
            {
                InitializeComponent();
            }

            private void btnExtract_Click(object sender, RoutedEventArgs e)
            {
                // Step 1: one movie name per array index
                string[] movies = txtMovies.Text.Split('\n');

                for (int i = 0; i < movies.Length; i++)
                {
                    int rtTomatoMeter = 0;
                    int rtPublicRating = 0;
                    int rtNumberOfVotes = 0;
                    double imdbRating = 0.0;
                    int imdbRatingCount = 0;

                    // Extracting ratings from Rotten Tomatoes (Steps 2.1 to 2.5)
                    string query = String.Format("https://www.googleapis.com/customsearch/v1?key={0}&cx={1}&q={2}", GOOGLE_CUSTOM_SEARCH_KEY, GOOGLE_CUSTOM_CX_RM, movies[i]);
                    JObject response = JObject.Parse(new System.Net.WebClient().DownloadString(query));
                    string rtLink = response.SelectToken("items[0].link").ToString();
                    HtmlWeb web = new HtmlWeb();
                    HtmlDocument document = web.Load(rtLink);
                    HtmlNode[] nodes = document.DocumentNode.SelectNodes("//small[contains(@class, 'mop-ratings-wrap__text--small')]").ToArray();
                    rtTomatoMeter = Convert.ToInt32(nodes[0].InnerHtml.ToString());
                    nodes = document.DocumentNode.SelectNodes("//span[contains(@class, 'mop-ratings-wrap__percentage')]").ToArray();
                    rtPublicRating = Convert.ToInt32(nodes[1].InnerHtml.ToString().Trim().Replace("%", string.Empty));
                    nodes = document.DocumentNode.SelectNodes("//strong[contains(@class, 'mop-ratings-wrap__text--small')]").ToArray();
                    rtNumberOfVotes = Convert.ToInt32(nodes[1].InnerHtml.ToString().Replace("User Ratings: ", string.Empty).Replace(",", string.Empty));

                    // Extracting ratings from IMDb (Steps 2.6 to 2.10)
                    query = String.Format("https://www.googleapis.com/customsearch/v1?key={0}&cx={1}&q={2}", GOOGLE_CUSTOM_SEARCH_KEY, GOOGLE_CUSTOM_CX_IMDB, movies[i]);
                    response = JObject.Parse(new System.Net.WebClient().DownloadString(query));
                    string imdbLink = response.SelectToken("items[0].link").ToString();
                    web = new HtmlWeb();
                    document = web.Load(imdbLink);
                    nodes = document.DocumentNode.SelectNodes("//span[contains(@itemprop, 'ratingValue')]").ToArray();
                    imdbRating = Convert.ToDouble(nodes[0].InnerHtml.ToString());
                    nodes = document.DocumentNode.SelectNodes("//span[contains(@itemprop, 'ratingCount')]").ToArray();
                    imdbRatingCount = Convert.ToInt32(nodes[0].InnerHtml.ToString().Replace(",", string.Empty));

                    // Placing the result in the rating text box (Step 2.11)
                    txtMovieRatings.Text += movies[i] + "\n\tRotten Tomatoes\n\t\tTomato meter: " + rtTomatoMeter + "\n\t\tPublic rating: " + rtPublicRating + "%\n\t\tNumber of votes: " + rtNumberOfVotes + "\n\tIMDb\n\t\tRating: " + imdbRating + "\n\t\tRating count: " + imdbRatingCount + "\n";
                }
            }
        }
    }
That’s all for this tutorial. Please keep in mind that the API key and custom Search Engine IDs used in this article will not work for you; you have to get your own key and Search Engine IDs. You can download the entire source code here. See you in the next article.