Ah finally. A master plan for this master project involving my two loves in computer science: machine learning and software development.
I’m gonna be a bit extra and create a requirements document-type write-up, i.e. this blog post, because - I don’t know - I enjoy all parts of software engineering and that includes requirements analysis. But I won’t get too deep into that because then I’ll get PTSD from my second year in uni…
The moment of truth. Finally, a concrete project description, which came out of my discussions with the team. This is a service that is actually waiting to be developed (I believe). The service uses image models that are currently being developed/fine-tuned to power an API that can be queried for certain recognisable aspects of a hotel. For example, take “infinity pool”. This would be the input for the model, which picks the best photos relating to “infinity pool” along with the hotels associated with them.
This, of course, involves training the model for object recognition and the like, along with creating a robust backend API that can support requests for objects in photos and return the relevant responses. I’ll probably spend the most time on creating a clean, adaptable backend but will also create some sort of interface. How complex any of these parts gets depends on whether the main goals are achieved first.
How our data is being used is arguably unknown to many people, even those in our company. What do we do with customer reviews and photos and complaints? This is a bit of a large question mark despite these things being more or less integrated into the actual HolidayCheck website. What I want to do is to make the data team’s work more visible.
Stakeholders are basically the people/organisations/etc. that have an influence on or are affected in some way by the software.
One of the obvious groups is the developers working on this project, i.e. me and whoever helps me with the model (more probably than not the data team). We implement the software and also have an influence on how it is implemented. Well, not just a small influence but a fairly large one. Someone else who has a large influence on how the software will turn out is the head of data intelligence. Although not a part of the implementation process per se, he suggests requirements that we should meet.
There’s actually the company as a whole as well, as this is a product that I would eventually like to deploy into production, should it be deemed useful for the end-users, who are yet another group of stakeholders. They are arguably most affected by this product as they are the ones whom we would like to benefit from this software.
The end-users are mostly what I will be focusing on in terms of what they want/expect out of this product and for what reason. How will this piece of software benefit them?
Who are the end-users? Probably in terms of the API, other developers… But in the future a proper UI (and not the amateur one I’ll make) will be created and it will be used by the actual customers. Hopefully. If it’s good and people deem it necessary.
I’m probably missing some, but these are the main ones.
So far, here are the tasks that I need to complete. The ones marked with a ++ are done in collaboration with the data team. Otherwise, it’s my own work (except probably for when I need a bit of help here and there with Angular).
- Determine list of most important/used USPs that can be recognized from images ++
- Collect training data and train model with collected data ++
- REST API (backend)
  - GET: USP
  - POST: best n images based on USP
- UI (frontend)
  - Dropdown component
  - Images container
Of course, there are subtasks that are part of the main goal as well as some stretch goals, especially with regard to the model and UI, such as the number of images that can be supported and the number of components, respectively. These are lightly documented in the following section.
- User must be able to choose from predefined USPs in a dropdown
- There must be at least 5 predefined USPs
- User must receive at least 5 images related to the category
  - Stretch: at least 10 images
- Stretch: images must be ordered by aesthetic score
- Each image must be linked to the relevant hotel source
- Service must be adaptable and scalable for future development
- Service must be able to handle light traffic
- Service must have an acceptable runtime so as to reduce waiting time
I know the non-functional requirements are a bit vague but I honestly don’t know the exact numbers and it’s probably not necessary at this point.
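To make the image-related requirements a bit more tangible, here is a minimal sketch in Python of how "at least 5 images for a USP, each linked to its hotel, optionally ordered by aesthetic score" could look. Everything in it is an assumption for illustration: the `ScoredImage` fields, the `relevance` and `aesthetic` scores and the `best_images` function are hypothetical names, not the real model output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredImage:
    """One model result. All field names here are hypothetical."""
    image_url: str
    hotel_url: str    # link back to the hotel the photo belongs to
    usp: str          # the USP the model recognised in the photo
    relevance: float  # assumed model confidence for that USP
    aesthetic: float  # assumed aesthetic score (stretch goal only)


def best_images(images, usp, n=5, by_aesthetic=False):
    """Return up to n images for the given USP, best first."""
    matches = [img for img in images if img.usp == usp]
    if by_aesthetic:
        # Stretch goal: order by aesthetic score instead of relevance.
        matches.sort(key=lambda img: img.aesthetic, reverse=True)
    else:
        matches.sort(key=lambda img: img.relevance, reverse=True)
    return matches[:n]
```

Calling `best_images(images, "infinity pool", n=10, by_aesthetic=True)` would cover both stretch goals at once, assuming the model can actually provide such scores.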
To get more specific, I’ve listed some requirements (both functional and non-) of the REST API:
- REST API must:
  - communicate with the model through JSON
  - perform a GET request upon user input from the dropdown
  - input the USP into the model
    - Stretch: check first if USP results are already in database
  - perform a POST request including chosen images and hotel sources
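The API requirements can be sketched roughly like this, as plain Python without any web framework. It only shows the JSON in/JSON out flow and the stretch idea of checking for stored results before asking the model. All names are hypothetical (`UspImageService`, `get_usps`, `images_for`), the cache is an in-memory dict standing in for the database, and apart from “infinity pool” the USPs are made-up placeholders. How these functions map onto the actual GET and POST routes is left out.

```python
import json

# "infinity pool" comes from the project description; the rest are placeholders.
PREDEFINED_USPS = ["infinity pool", "beach", "spa", "rooftop bar", "gym"]


class UspImageService:
    def __init__(self, model):
        # model: any callable taking a USP string and returning a list of
        # JSON-serialisable dicts such as {"image": ..., "hotel": ...}
        self._model = model
        self._cache = {}  # in-memory stand-in for the database

    def get_usps(self):
        """Return the predefined USPs for the dropdown as JSON."""
        return json.dumps({"usps": PREDEFINED_USPS})

    def images_for(self, request_json):
        """Take a JSON request like {"usp": "..."} and return images as JSON."""
        usp = json.loads(request_json).get("usp")
        if usp not in PREDEFINED_USPS:
            return json.dumps({"error": "unknown USP"})
        # Stretch: only query the model if no results are stored yet.
        if usp not in self._cache:
            self._cache[usp] = self._model(usp)
        return json.dumps({"usp": usp, "images": self._cache[usp]})
```

Passing the model in as a callable keeps the backend testable with a fake model while the real one is still being trained.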
And truth be told, these requirements may be adapted, or more may be added, as the project goes on (especially in terms of the API), because in the real world requirements change all the time, whether due to new problems, depleted resources, an unreachable goal, etc. Nonetheless, I will keep it all updated.
As for risks: for one thing, training the model could take some time, as could gathering the necessary training data for the categories. Another question: how many images are too many? As in, how many images can the program support before the runtime suffers? But maybe this isn’t even a problem… Though training the model to the point where it is “precise enough” may well take some time.
I tell myself that given six weeks, I don’t need to have the perfect model, especially when it needs to be trained on different USPs, so it’s a semi non-risk risk. I guess more of the risk here comes from depending on another team and the possibility of them suddenly becoming unavailable. And I doubt I would be able to complete this without their help and input.
Another risk arises from not knowing the difficulty of integrating all of these different aspects of the project, for instance the model and the hotel API and so on and so on. These are things that I won’t know until I start working on them, which is why I said the requirements may change, as it’s difficult to determine how long each task will take me. How do I get around this? Start working as soon as possible.