In the 2018-2019 postgraduate admission round, the Faculty of Mathematics, Mechanics and Informatics has two scholarships from FPT Corporation for PhD students specializing in Mathematical Foundations for Informatics, to carry out two of the three Artificial Intelligence research topics listed below.
Scholarship benefits:
- The PhD student receives a three-year scholarship of 15,000,000 VND (fifteen million dong) per month for the first year. The amount may be raised in the following two years depending on the quality of the work;
- The PhD student receives funding to present at reputable international conferences in the field;
- The PhD student is provided with the advanced technological equipment needed for the research.
Obligations:
- The PhD student fully complies with the doctoral training regulations of Vietnam National University, Hanoi, and meets the requirements for PhD students set by the University of Science and the Faculty of Mathematics, Mechanics and Informatics;
- Outside the time managed by the University of Science, the PhD student works full time at the Technology Department of FPT Corporation;
- The PhD student completes the research tasks required by the topic, under the guidance and direction of a supervisory team (one member assigned by the FPT Technology Department and one member from the University of Science).
Selection procedure:
- The candidate prepares a research proposal on one of the three topics below and submits a postgraduate application for the Mathematical Foundations for Informatics specialization, following the regulations of the University of Science (applications may be submitted in round 1 or round 2; details are available here);
- After the applications are submitted, the Faculty of Mathematics, Mechanics and Informatics and the FPT Technology Department will interview the candidates in person and select the two most outstanding candidates to receive the scholarships.
The three proposed topics are described below.
1. A study of Abstract Meaning Representation for building Vietnamese Semantic Bank and its applications in Artificial Intelligence
Introduction. AMR (Abstract Meaning Representation) [1] is a semantic representation language, designed initially together with an English semantic bank distributed via the LDC (Linguistic Data Consortium) catalog. Since the first publication on AMR in 2013, several related works for English as well as many other languages have been published [2].
In natural language processing (NLP), annotated corpora are essential linguistic resources for the development of automatic text analysis systems. An AMR bank is a corpus annotated at the semantic level with logical meanings. It is the latest resource in a suite of annotated corpora (POS-tagged corpus, treebank, propbank) and lexicons containing linguistic information at different levels: morphological, syntactic and semantic. AMR corpora make use of framesets from a lexicon used for building a propbank.
AMR banks are developed for applications related to natural language understanding and generation, such as chatbot systems. AMR corpora are also useful resources for information extraction, entity extraction, semantic role labeling, coreference resolution, etc.
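As a small illustration of the notation, the sketch below encodes an AMR for the English sentence "The boy wants to go", a standard example in the AMR literature, as a set of labelled edges and prints it in the usual bracketed text form. The Python names (instances, edges, to_penman) are hypothetical and chosen for illustration only, and the frameset sense numbers are illustrative. A Vietnamese AMR would rely on its own frameset lexicon, which is what this topic proposes to build.

```python
# Variable -> concept. Concepts such as "want-01" refer to framesets
# from a propbank-style lexicon (sense numbers here are illustrative).
instances = {"w": "want-01", "b": "boy", "g": "go-01"}

# Labelled edges of the graph: (source variable, role, target variable).
# The boy (b) is both the wanter and the goer, so "b" is reused.
edges = [
    ("w", ":ARG0", "b"),
    ("w", ":ARG1", "g"),
    ("g", ":ARG0", "b"),
]


def to_penman(root, instances, edges):
    """Serialize the rooted graph into bracketed AMR text notation."""
    seen = set()

    def render(var):
        if var in seen:
            # Re-entrancy: a node already written is referenced by variable.
            return var
        seen.add(var)
        parts = [f"({var} / {instances[var]}"]
        for src, role, tgt in edges:
            if src == var:
                parts.append(f"{role} {render(tgt)}")
        return " ".join(parts) + ")"

    return render(root)


if __name__ == "__main__":
    print(to_penman("w", instances, edges))
```

The reuse of the variable b is what makes an AMR a graph rather than a tree: a single node can fill roles of several predicates.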
Objectives.
- Study and propose a framework for building a semantic bank for Vietnamese, including not only an annotated corpus but also a frameset lexicon. The corpus should be annotated using an AMR model designed specifically for the Vietnamese language, while ensuring its compatibility with AMR models for other languages.
- Apply the AMR corpus in chatbot systems developed by FPT Corporation.
References
[1] L. Banarescu, C. Bonial, S. Cai, M. Georgescu, K. Griffitt, U. Hermjakob, K. Knight, P. Koehn, M. Palmer, and N. Schneider, Abstract Meaning Representation for Sembanking, Proc. Linguistic Annotation Workshop, 2013.
[2] https://amr.isi.edu/research.html
Requirements. The ideal candidates should have
- Experience in natural language processing and machine learning/data mining. Experience in linguistic resource construction is a plus;
- Good programming skills;
- Solid background in discrete mathematics;
- Ability to work both independently and in a collaborative environment.
Contacts. Nguyễn Thị Minh Huyền (huyenntm (at) hus.edu.vn), Lê Hồng Phương (phuonglh (at) vnu.edu.vn)
2. Dialogue Management Models for Task-Oriented Dialogue in Intelligent Dialogue Agents
Introduction. Intelligent dialogue systems allow users to interact with information systems in natural language, through either speech or text. With the growth of intelligent dialogue agents (IDAs), the demand for practical IDAs has been increasing at a fast pace. One core task in building IDAs is dialogue management, which models the system's action in response to a given user input.
The main topic of this research is to study, implement and evaluate different data-driven approaches to dialogue management. In particular, complex task-oriented dialogue management models, which utilize information from dialogue streams and a task stream to create spontaneous interventions for users, will be investigated. Prominent approaches include (1) reinforcement learning using Markov decision processes for learning dialogue policies from corpora, and (2) dialogue act classification for determining system dialogue moves.
One relevant reference for this topic is the following.
Eun Young Ha, Christopher M. Mitchell, Kristy Elizabeth Boyer, and James C. Lester, Learning Dialogue Management Models for Task-Oriented Dialogue with Parallel Dialogue and Task Streams, Proceedings of the SIGDIAL 2013 Conference, pages 204–213, Metz, France, 2013.
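To make the reinforcement learning approach concrete, the sketch below trains a dialogue policy with tabular Q-learning on a toy Markov decision process. Everything in it is an assumption made for illustration: the state is simply the number of information slots filled so far, the two system actions (ask, book) and the reward values are invented, and the simulated user always answers. It is not the model of the reference above, which learns from annotated corpora with parallel dialogue and task streams.

```python
import random

ACTIONS = ["ask", "book"]
N_SLOTS = 2  # state = number of slots filled so far (0, 1 or 2)


def step(state, action):
    """Toy environment: return (next_state, reward, done)."""
    if action == "book":
        # Booking ends the dialogue; it succeeds only if all slots are filled.
        reward = 20.0 if state == N_SLOTS else -10.0
        return state, reward, True
    # Asking costs one turn and fills one more slot, if any remain.
    return min(state + 1, N_SLOTS), -1.0, False


def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_SLOTS + 1) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(20):  # cap on dialogue length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            nxt, reward, done = step(state, action)
            # No bootstrapping from a terminal state.
            best_next = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            target = reward + gamma * best_next
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = nxt
            if done:
                break
    return q


def greedy_policy(q):
    """Pick the highest-valued system action in every state."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_SLOTS + 1)}


if __name__ == "__main__":
    print(greedy_policy(train()))
```

With these invented rewards, the learned greedy policy asks until both slots are filled and only then books. A realistic system would replace the slot counter with a richer dialogue state and the simulated environment with corpus data or a user simulator.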
Objectives. The objectives of this research include:
- Study and analyse existing approaches for data-driven dialogue management models;
- Adapt, implement and evaluate known approaches on real-world IDAs developed by FPT Corp. and partners;
- Develop new approaches and models suitable for given tasks.
Requirements. The ideal candidates should have
- Experience (at master level or equivalent) with Human-Robot Interaction (HRI), AI, Machine Learning, Natural Language Processing or related research areas
- Software development in Java and/or Scala and/or C++ and/or Python
- Collaborative research experience
- Prior research in the field of HRI
- Research experience with natural language processing methods in HRI
Contacts. Lê Hồng Phương (phuonglh (at) vnu.edu.vn), Đặng Hoàng Vũ (vudh5 (at) fpt.com.vn)
3. Representation learning for multimedia data
Introduction. The performance of machine learning (ML) models is largely determined by the computing infrastructure and the data representation methods. Now that computing technologies are highly advanced, representation remains the key factor for ML. Traditionally, the representations of data objects (i.e., features) are hand-crafted based on insights obtained from the data. While these features are well motivated and carefully designed, they require prior knowledge of the application domain. Moreover, model performance is limited by the incompleteness of the hand-crafted features. Recent work on learning-based representations has shown that such representations may significantly improve the performance of ML models in various tasks [1] while requiring little domain knowledge [2]. Efficient methods for representation learning have therefore become increasingly important for many ML applications.
Existing representation learning methods are based on either a statistical model or a multi-layer network (deep learning). The former are well studied, elegant, and interpretable, while the latter often obtain better results but are hard to interpret. Moreover, these existing methods are mostly specific to a single type of data [1]. They therefore do not fully leverage all the information about real-life objects, which often comes from multiple data sources and in different media (e.g., text, images, or transactions). Recent work has suggested that, by combining representations learnt from different data types, one can significantly improve performance in various tasks such as image tagging and recognition, document analysis, and recommendation [3, 4]. However, representation learning for multimedia data is still a challenging problem.
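One simple baseline for combining representations from different data types is to normalize the feature vector of each modality and concatenate the results. The sketch below shows only that baseline, with made-up vectors standing in for learnt text and image features. It is not one of the methods of [3, 4], and the function names are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are kept as is)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(text_vec, image_vec):
    """Fusion by concatenating per-modality normalized representations.

    Normalizing first keeps a modality with large raw values from
    dominating distances computed on the fused vector.
    """
    return l2_normalize(text_vec) + l2_normalize(image_vec)


if __name__ == "__main__":
    text_features = [3.0, 4.0]    # placeholder for a learnt text embedding
    image_features = [0.0, 2.0]   # placeholder for a learnt image embedding
    print(fuse(text_features, image_features))
```

Concatenation treats the modalities independently. Learning a unified representation, as this topic aims to do, would instead model the interactions between them.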
Objectives. In this research, we would like to investigate, adapt, and extend the state of the art in representation learning for multimedia data. We aim to propose efficient methods to learn unified representations of objects from their heterogeneous data sources. The proposed methods should combine the advantages of both the statistical approaches and the multi-layer network based approaches. We also want to deploy the proposed methods in applications of computer vision and natural language understanding.
References
[1] Bengio, Yoshua, Aaron Courville, and Pascal Vincent. "Representation learning: A review and new perspectives." IEEE transactions on pattern analysis and machine intelligence 35.8 (2013): 1798-1828.
[2] Collobert, Ronan, et al. "Natural language processing (almost) from scratch." Journal of Machine Learning Research 12.Aug (2011): 2493-2537.
[3] Karpathy, Andrej. "Connecting Images and Natural Language." PhD thesis, Stanford University, 2016.
[4] Wang, Hao, Naiyan Wang, and Dit-Yan Yeung. "Collaborative deep learning for recommender systems." Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2015.
Requirements. The ideal candidates should have
- Experience in data mining, machine learning, or related areas
- Excellent programming skills
- Strong motivation and high commitment to work
- Ability to work both independently and in a collaborative environment
- Research experience in image processing, computer vision and text mining is a plus
Contacts. Đỗ Thanh Hà (hadt_tct (at) vnu.edu.vn), Đặng Hoàng Vũ (vudh5 (at) fpt.com.vn)