- Data-mining functions: Here are three examples of data mining applications. Match each application to one of the three data-mining functions. Then, for each particular application, elaborate potential variables (features/attributes), techniques (algorithms/models) and evaluation criteria. [15 points]
 A. A credit card company tries to distinguish fraud transactions from thousands of normal transactions.
 B. A supermarket analyzes customers' transaction records to find items that are often purchased together.
 C. A furniture retailer tries to identify its target customers by segmenting the market into groups of similar people.
 Data-mining functions:
 Association mining: ()
 • Variables (features/attributes)?
 • Techniques (algorithms/models)? FP-Growth, Create Association Rules
 • Evaluation criteria? Support, confidence, lift
 Cluster analysis: ()
 • Variables (features/attributes)?
 • Techniques (algorithms/models)? Clustering
 • Evaluation criteria?
 Classification: ()
 • Variables (features/attributes)?
 • Techniques (algorithms/models)? Decision Tree, Naive Bayes(Kernel), Deep Learning
 • Evaluation criteria? Accuracy
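
As a minimal sketch of the association-mining evaluation criteria named above (support, confidence, lift), the following computes all three for a single rule. The toy baskets and the function names are hypothetical and chosen only for illustration. A real analysis would run FP-Growth or a similar algorithm over the full transaction log.

```python
# Hypothetical toy baskets; not data from any real supermarket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Estimate of P(consequent | antecedent) from the baskets."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)

def lift(antecedent, consequent, baskets):
    """Confidence relative to the consequent's baseline support.

    A value above 1 suggests the items co-occur more often than
    independence would predict; below 1 suggests the opposite.
    """
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)

rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
rule_lift = lift({"bread"}, {"milk"}, transactions)
```

In this toy data the rule bread -> milk has a lift below 1, so a high support alone would not justify treating the pair as a meaningful association.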
- Text crawling and scraping: We learned how to use regular expressions to define web crawling rules and how to use XPath to extract information from web pages. Suppose you are interested in studying trends in data mining techniques. https://www.kdnuggets.com/ is a good website that publishes news and opinions on data mining. You want to collect all news, opinions, tutorials, etc. published in 2020 on this website. https://www.kdnuggets.com/2020/index.html is a good starting point. [10 points]
 a. Use a regular expression to define your crawling rules. Please also explain the meaning of your regular expression.
 b. Design two XPath queries: one to extract the titles from the web pages and the other to extract the article bodies.
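
The following is a rough sketch of how a crawling rule and two XPath queries could fit together. Everything site-specific here is an assumption: the URL layout (`/2020/<month>/<slug>.html`), the `id="title"` heading, and the `class="post"` container are hypothetical and would have to be checked against the real pages. The sample page is a made-up, well-formed snippet so that the standard-library `xml.etree.ElementTree` (which supports only a subset of XPath) can parse it. Real HTML is rarely well-formed, so in practice an HTML-aware parser would likely be needed.

```python
import re
import xml.etree.ElementTree as ET

# Assumed article URL layout: /2020/<two-digit month>/<slug>.html
#   ^https://www\.kdnuggets\.com/  literal site prefix (dots escaped)
#   2020/                          only pages filed under 2020
#   (0[1-9]|1[0-2])/               a month from 01 to 12
#   [\w-]+\.html$                  a slug of word characters or hyphens, ending in .html
ARTICLE_URL = re.compile(
    r"^https://www\.kdnuggets\.com/2020/(0[1-9]|1[0-2])/[\w-]+\.html$"
)

def should_follow(url):
    """Return True if the URL matches the assumed 2020 article pattern."""
    return ARTICLE_URL.match(url) is not None

# Hypothetical, well-formed page used only to exercise the queries.
SAMPLE_PAGE = """<html><body>
<h1 id="title">Sample headline</h1>
<div class="post"><p>First paragraph.</p><p>Second paragraph.</p></div>
</body></html>"""

TITLE_XPATH = ".//h1[@id='title']"        # assumed title element
BODY_XPATH = ".//div[@class='post']/p"    # assumed article-body paragraphs

def extract(page):
    """Return (title, list of paragraph texts) from a well-formed page."""
    root = ET.fromstring(page)
    title = root.find(TITLE_XPATH).text
    paragraphs = [p.text for p in root.findall(BODY_XPATH)]
    return title, paragraphs

title, paragraphs = extract(SAMPLE_PAGE)
```

Note that this pattern deliberately rejects the yearly index page itself, so a real crawler would need a separate rule for the index pages it uses to discover article links.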
- Text representations: The following questions examine the text processing operations required for different text mining tasks. Consider the following three text-mining tasks. For each task, list the preprocessing operators (tf-idf vs. binary, stemming, stopwords, etc.) you would use and explain why you chose them. [15 points]
 a. Finding hot topics from news articles.
 b. Predicting the categories of news articles.
 c. Extracting biomedical relations (e.g., protein A activates protein B) from scientific literature.
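
To make the operators mentioned above concrete, here is a small sketch of stopword removal, binary term vectors, and tf-idf weighting in plain Python. The stopword list, the tokenizer, and the particular tf-idf variant (relative term frequency times `log(N / df)`) are illustrative assumptions. Libraries and textbooks use several slightly different formulas.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list; real lists are much longer.
STOPWORDS = {"the", "a", "of", "and", "is", "in"}

def tokenize(text, remove_stopwords=True):
    """Lowercase, split on non-alphanumerics, optionally drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return tokens

def binary_vector(tokens):
    """Presence/absence representation: every observed term gets weight 1."""
    return {t: 1 for t in set(tokens)}

def tfidf_vectors(docs_tokens):
    """tf-idf with tf = count / doc length and idf = log(N / document frequency)."""
    n_docs = len(docs_tokens)
    doc_freq = Counter(t for tokens in docs_tokens for t in set(tokens))
    vectors = []
    for tokens in docs_tokens:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

docs = ["The market is rising", "The market is falling and falling"]
vectors = tfidf_vectors([tokenize(d) for d in docs])

# Stopword removal can hurt relation extraction: here it drops the entity name "A".
kept = tokenize("Protein A activates protein B", remove_stopwords=False)
dropped = tokenize("Protein A activates protein B", remove_stopwords=True)
```

The term "market" appears in both toy documents, so its tf-idf weight is zero, while a binary vector would still mark it as present. The last two lines hint at why aggressive stopword removal may be a poor fit for task c.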
- Business applications: Suppose that you work for AT&T, which runs customer discussion groups on its website. Many discussions are active simultaneously, too many for the company to monitor them all.
 a. How can the company get a general understanding of what is being discussed, and how it changes from week to week? Please describe your text mining solution including choices of text preprocessing and data mining techniques (e.g., association rule, k-means, decision tree, etc.). [8 points]
 b. Each discussion page has slots for two ads. The company would like to select ads that are a good match for the page. Assume that there are many ads. How should the best two ads for a given page be selected? Please describe your text mining solution including choices of text preprocessing and data mining techniques (e.g., association rule, k-means, decision tree, etc.). [8 points]
 c. After observing the effectiveness of your solution for a while, the company realizes that advertising revenue could be improved if ad selection is tuned differently for people based on their primary interest in using the website. There are five types of primary interest: "phone hardware," "phone GUI," "phone apps," "coverage," and "price." For a particular user, how can you use the person's profile (e.g., age, gender) and behavior (e.g., posts, comments, reading history) to predict which type of user he or she is? Please describe your text mining solution including choices of text preprocessing and data mining techniques (e.g., association rule, k-means, decision tree, etc.). [8 points]
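
One possible approach to the ad-matching question in part b, sketched under simplifying assumptions, is to represent the page and each ad as term-count vectors and rank the ads by cosine similarity. The ad texts, the page text, and the function names below are hypothetical. Whitespace tokenization with raw counts stands in for whatever preprocessing (stopword removal, stemming, tf-idf) a full solution would choose.

```python
import math
from collections import Counter

def term_counts(text):
    """Bag-of-words counts using simple whitespace tokenization."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def top_ads(page_text, ads, k=2):
    """Return the names of the k ads most similar to the page text."""
    page_vec = term_counts(page_text)
    ranked = sorted(
        ads,
        key=lambda name: cosine(page_vec, term_counts(ads[name])),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical ads and discussion page, for illustration only.
ads = {
    "case": "phone case screen protector",
    "plan": "unlimited data plan price",
    "tv": "television streaming bundle",
}
page = "my phone screen cracked what is the price of a new case"
best_two = top_ads(page, ads)
```

The same similarity score could also serve as a feature for the classifier asked about in part c, though that is only one of several reasonable designs.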
Sample Solution
 Overall, high-end brands have also realised the importance of connecting with influential people in order to relate to their audience in a personal way. The only difference is that high-end brands like Burberry use relatable celebrities rather than bloggers and social media influencers. In this chapter, I will compare the use of social media by a high-end brand and a high-street brand to see how they are similar and where they differ.
 CHAPTER 3: Pretty Little Thing. Pretty Little Thing is an online clothing website that is known for its affordable and fashionable clothing. The site is known for its fast fashion and trend-capturing clothes; it is often the first to recreate a celebrity look, but at a price that suits its audience. The brand is also recognised as one that works closely with fashion bloggers and social media influencers. This is where the brand differs from Burberry: although it does have celebrity appearances, it focuses more on realistic and relatable bloggers and influencers. Over the past couple of years, I would say that brands and fashion houses have realised the impact and importance of social media when advertising their brand and products. Therefore, brands utilise the connections they have with reality stars, bloggers and social influencers, using them to promote products and almost making them the face of the brand. "The importance of fashion branding on social media is becoming even more pronounced as networks like Instagram are revolutionising this field." (Carter-Marley, 2015). In terms of the market, this strategy will increase sales to the brand's target audience; social media helps this happen because brands can approach the right people.
An example of this happened recently after the ITV programme "Love Island", where everyone who was on the programme came out with a huge following and was, not long after, seen promoting all sorts of brands on Instagram. Building relationships with a target audience is important so that the audience remains loyal and stays interested in the latest products available. Using the participants from a programme like Love Island to promote products is a clever idea because they know that